
Manoj K. answered 04/10/21
Consultant with 25+ years of experience in databases and SQL
When we have data in “normalized” format, we expect to have data broken into multiple tables. This is to avoid redundancy. Example, let’s see sales order data.
-----------------------------------------------------------------------------------------------------------------------------------------------
SONO DATE CUSTNO NAME CITY STATE PRODCODE PRODNAME UOM QUANTITY RATE
-----------------------------------------------------------------------------------------------------------------------------------------------
1 4/1/2021 C001 ABC Inc Chicago IL P001 Prod P001 EACH 100 $20
2 4/1/2021 C001 ABC Inc Chicago IL P002 Prod P002 EACH 150 $25
3 4/2/2021 C002 BCD Inc Detroit MI P003 Prod P003 EACH 125 $30
4 4/3/2021 C003 CDE Inc San Jose CA P004 Prod P004 EACH 50 $45
5 4/4/2021 C002 BCD Inc Detroit MI P001 Prod P001 EACH 100 $20
------------------------------------------------------------------------------------------
Figure 1 - Denormalized data
Here in above figure, we can see customer name, city, state is repeating for each Sales Order. Similarly, product name, UOM and Rate is repeating for each sales order.
Customer ABC Inc, Chicago, IL is written 2 times for sales order 1 and 2. Also, Product P001, EACH, $20 is written 2 times for product P001 for sales order 1, 5.
This is one of the major design issues, for applications which are “transaction intensive”. If, customer C001 moves from Chicago, IL to Santa Clara, CA, we need to correct data at different rows. We can fix this issue by using “normalized” data model.
We can divide this 1 big table into 3 small tables.
1. Sales Order
2. Customer
3. Product
This way, repeating information is taken out and kept in separate table.
Following figure shows 3 separate tables. SalesOrder table has a sales order ( SONO = 6 ), which is not present in Figure 1. This is added purposely.
------------------------------------------------------------------------------------
Table - SalesOrder |
------------------------------------------------------------------------------------
SONO DATE CUSTNO PRODCODE QUANTITY RATE |
-----------------------------------------------------------------------------------|
1 4/1/2021 C001 P001 100 $20 |
2 4/1/2021 C001 P002 150 $25 |
3 4/2/2021 C002 P003 125 $30 |
4 4/3/2021 C003 P004 50 $45 |
5 4/4/2021 C002 P001 100 $20 |
6 4/4/2021 C004 P001 100 $10 |
------------------------------------------------------------------------------------
Table - Customer
------------------------------------------------------------
CUSTNO NAME CITY STATE
------------------------------------------------------------
C001 ABC Inc Chicago IL
C002 BCD Inc Detroit MI
C003 CDE Inc San Jose CA
------------------------------------------------------------
Table - Product
------------------------------------------------------------
PRODCODE PRODNAME UOM RATE
------------------------------------------------------------
P001 Prod P001 EACH $20
P002 Prod P002 EACH $25
P003 Prod P003 EACH $30
P004 Prod P004 EACH $45
------------------------------------------------------------
Figure 2 - Normalized tables
Now, if we want to generate data from 3 table, to look like as shown in Figure 1, we need to perform table join. There are several different types of join which are used.
1. Inner Join or Natural join – here we can join 2 tables on matching values in both tables.
2. Left Outer Join – unlike inner join (where only matching records are returned), outer join returns matching records (as inner join) + records with no-match.
Sample Inner Join statement
SELECT * FROM SalesOrder A INNER JOIN Customer B
ON A.CUSTNO = B.CUSTNO
------------------------------------------------------------------------------------------------------------------------------------------
Columns from SalesOrder table | Columns from Customer table |
SONO DATE CUSTNO PRODCODE QUANTITY RATE | CUSTNO NAME CITY STATE |
------------------------------------------------------------------------------------|-----------------------------------------------------|
1 4/1/2021 C001 P001 100 $20 | C001 ABC Inc Chicago IL |
2 4/1/2021 C001 P002 150 $25 | C001 ABC Inc Chicago IL |
3 4/2/2021 C002 P003 125 $30 | C002 BCD Inc Detroit MI |
4 4/3/2021 C003 P004 50 $45 | C003 CDE Inc San Jose CA |
5 4/4/2021 C002 P001 100 $20 | C002 BCD Inc Detroit MI |
------------------------------------------------------------------------------------------------------------------------------------------
Figure 3 - Output of inner join
We can see, record where SONO = 6 is not in output of Inner join (Figure 3). This is because, SalesOrder and Customer tables are joined on CUSTNO, and CUSTNO 'C004' is not present in CUSTOMER table.
Sample Left Outer Join statement
SELECT * FROM SalesOrder A LEFT OUTER JOIN Customer B
ON A.CUSTNO = B.CUSTNO
--------------------------------------------------------------------------------------------------------------------------------------
Columns from SalesOrder table | Columns from Customer table |
SONO DATE CUSTNO PRODCODE QUANTITY RATE | CUSTNO NAME CITY STATE |
-----------------------------------------------------------------------------------|---------------------------------------------------|
1 4/1/2021 C001 P001 100 $20 | C001 ABC Inc Chicago IL |
2 4/1/2021 C001 P002 150 $25 | C001 ABC Inc Chicago IL |
3 4/2/2021 C002 P003 125 $30 | C002 BCD Inc Detroit MI |
4 4/3/2021 C003 P004 50 $45 | C003 CDE Inc San Jose CA |
5 4/4/2021 C002 P001 100 $20 | C002 BCD Inc Detroit MI |
6 4/4/2021 C004 P001 100 $10 | NULL NULL NULL NULL |
---------------------------------------------------------------------------------------------------------------------------------------
Figure 4 - Output of left outer join
We can see, unlike data in Figure3, record where SONO = 6 is appearing in output of Left Outer join (Figure 4). But since Customer C004 does not exist in Customer table, customer name, city and state values are NULL.
Note : There are 4 operators in SQL which are not part of JOIN , but they perform functionality which is similar to JOIN. Those operators are
INTERSECT
UNION
UNION ALL
EXCEPT / MINUS