DOCUMENT #682
=======================================================================
WORKING WITH OUTER JOINS
=======================================================================
Product: R:BASE Version : 3.1 & Higher
=======================================================================
Area : DATA MANIPULATION Category: VIEWS
=======================================================================
When linking tables in QBE (Query-by-Example), you usually choose
to join them where the linking columns are equal, i.e. where you
have data that matches in both tables. Sometimes, however, you also
want to see the data where the linking column doesn't match. This
option is not available from the menus; you can't view both the rows
that match and the rows that don't match with a single query. You
can, however, create a query like this from the R> prompt. It's
called an "outer join."
Even though you don't see it, when you create a query using the
menus, R:BASE builds a SELECT command with a WHERE clause to link
the tables and identify the rows to display. By definition, an outer
join retrieves sets of completely different rows. A WHERE clause that
displays all the rows that match between two or more tables is mutually
exclusive from a WHERE clause that displays the rows that don't match.
This means you need two different WHERE clauses.
You might think you can see both sets of rows by creating a WHERE
clause with two conditions separated by an OR operator, but this
doesn't work. The right data is not retrieved. What you really
need are two separate SELECT commands with two separate WHERE clauses.
The menus won't let you do this. You need to use the UNION operator
of the SELECT command at the R> prompt. The purpose of the UNION
operator is to join two or more SELECT commands together to create
one table of data.
This article gives a few simple techniques to follow when creating
outer joins and shows how to use the techniques with an actual example
from the CONCOMP sample database.
Identify The Tables And Data Conditions
=======================================
The first step is to identify the tables to be joined and any
conditions needed to qualify the data. Once the tables and
conditions have been identified, sketch out the different select
commands needed. Be sure to include the linking columns. Here's
the basic structure of the complete SELECT command to create an
outer join of two tables:
SELECT data FROM table1 and table2 where the linking columns
are equal, these are the matching rows
UNION
SELECT data FROM table1 where there is no matching data in table2
UNION
SELECT data FROM table2 where there is no matching data in table1
You need three separate SELECT commands to cover all the possible
combinations of selecting data from two tables. But you may not
need all three of the SELECT commands. If you know that there is
never a row in table2 if there isn't a row in table1, you won't need
the third SELECT command.
It's important to sketch out the different select commands you need
before starting to write them because it becomes more complex with
each table that you add to the join. When joining two tables, there
can be up to three possible SELECT commands needed to retrieve all
the data. When joining three tables, there can be up to seven possible
SELECT commands. Look at the following table for a 3-table join.
"YES" means there could be data in the table.
| table 1 | table 2 | table 3
--------------------------------------------------------------------
SELECT1 | YES | YES | YES
SELECT2 | YES | YES | NO
SELECT3 | YES | NO | YES
SELECT4 | YES | NO | NO
SELECT5 | NO | YES | YES
SELECT6 | NO | YES | NO
SELECT7 | NO | NO | YES
By knowing your database structure and how your data is organized you
can reduce the number of SELECT commands needed to complete the outer
join and won't need to account for all possible combinations of data.
This is important because joining four tables for an outer join has
over 15 possible SELECT commands.
Write And Test Each SELECT Command By Itself
============================================
Once you have identified the tables and the conditions, and sketched
out the different SELECT commands needed, begin writing the individual
SELECT commands. Each SELECT command stands alone and is executed,
tested and optimized by itself. Then all the working SELECT commands
are combined into one using the UNION operator. When writing the
individual SELECT commands, remember that each must select the same
number of columns, and the columns must have the same datatypes and
be selected in the same order.
Start with the SELECT command that will retrieve the rows where the
linking column matches across all the tables. The easiest way to
write the SELECT command is to place each of its parts on a separate
line like this:
SELECT < the columns names you want to display data from >
FROM < the table names the data is in >
WHERE < put the linking column(s) here and any other data conditions >
Use this same technique to structure the other SELECT commands. For
the other SELECT commands, first fill out the FROM line. This
identifies the table(s) this particular SELECT command will retrieve
data from and keys both the SELECT and WHERE clauses. You need the
same number of values (either column names or constants) after the
keyword SELECT as in the first command. The FROM will determine
which values are constants and which are column names. Make sure
they are in the same order as the first command. The WHERE clause
will use a sub-SELECT to identify the rows that are NOT IN the other
table(s). A Sub-SELECT makes a list of data values to compare to.
An example
==========
Let's look at a specific example from the CONCOMP sample database.
The desired result is a report of transaction information for each
customer for the month of March. Not all customers had transactions
in March, but all customers must be listed and show $0.00 if they had
no transactions.
First, identify the tables and data conditions that are needed and
sketch out the SELECT commands. Transaction information is stored
in the Transmaster table. Customer information is in the Customer
table. The tables are linked by the custid column. The first SELECT
command selects the rows from the Transmaster and Customer tables
where the customer has a transaction record, i.e. the rows that match
between the two tables.
SELECT customer number (customer), company name (customer),
transaction number (transmaster), transaction date(transmaster),
transaction amount (transmaster),
FROM customer,transmaster
WHERE custid = custid and transdate in March
Notice that the actual column names aren't used. You can, but when
sketching out the SELECT commands it's often easier to not use the
actual names, but just put the description of the data you want to see.
Put the table name next to data description so it's easy to see which
table those items are from.
Next, sketch out the other SELECT commands to find the rows that don't
match. Data entry rules are defined in the database, so there can't
be a transaction (a row in the Transmaster table) if there is no
customer record. This leaves just one additional SELECT command, to
find the rows from the Customer table where the custid is not in the
Transmaster table.
SELECT customer number (customer), company name (customer),
transaction number (constant), transaction date (constant),
transaction amount (constant)
FROM customer
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
in March)
Here data is only selected from the Customer table, so the data
values in the first SELECT that were from the Transmaster table
become constants in this SELECT command.
Now write and test the actual SELECT commands. The first one, a
straightforward multi-table SELECT command, looks like this:
columns from the
columns from the Transmaster table
Customer table / \
/ \ / \
SELECT custid,company,transid,transdate,netamount +
FROM customer,transmaster +
WHERE customer.custid=transmaster.custid AND +
/ transmaster.transdate BETWEEN 3/1/89 AND 3/31/89
/ \
links the tables \
qualifies the rows
Notice that in the WHERE clause, in addition to linking the tables
to find the rows that match (customer.custid=transmaster.custid),
there is the additional condition to retrieve only the records from
March. This second condition is on a column (transdate) from the
Transmaster table only.
After this SELECT command works to retrieve the desired data, write
and test the next command. This is just a single table select.
Because there is no data from the Transmaster table no columns are
selected from that table. "Place holders" are used instead, either
a constant value or '' (quote quote) for null. There are no tables
to link together in the WHERE clause. Instead, use a Sub-SELECT to
select the rows from the Customer table that are not in the
Transaction table.
place holders to match
columns from the columns from the Transmaster table
Customer table / \
/ \ / \
SELECT custid,company,0,'',$0 +
FROM customer +
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE +
/ transdate BETWEEN 3/1/89 AND 3/31/89 )
/ \
finds the rows that \
don't match only compare to rows in the
right date range
Remember that in the first select there was a WHERE clause based on the
Transmaster table. The second select is from the Customer table only,
so there is no Transdate column to put a condition on. To make sure
the right rows are retrieved, that WHERE clause condition must be put
on the Sub-SELECT, which is from the Transmaster table. The data
includes only customers who didn't have transactions in March. If
there is no WHERE clause on the sub-SELECT to restrict the date to
March, this SELECT command would have returned customers who, in some
month, sometime, had no transactions; not the correct data.
Testing each SELECT command separately makes sure it is returning
correct data. If each of the individual SELECTs return correct data,
then the complete SELECT will return correct data.
Finally, join the two SELECT commands with the UNION operator:
SELECT custid,company,transid,transdate,netamount +
FROM customer,transmaster +
WHERE customer.custid=transmaster.custid AND +
transmaster.transdate BETWEEN 3/1/89 AND 3/31/89 +
UNION +
SELECT custid,company,0,'',$0 +
FROM customer +
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE +
transdate BETWEEN 3/1/89 AND 3/31/89 )
This is the resulting data:
custid company UNNAMED UNNAMED UNNAMED
--------------------------------------------------------------
100 PC Distribution Inc. 5080 03/20/89 $80000.00
101 Computer Distrib Inc. 5000 03/01/89 $29000.00
101 Computer Distrib Inc. 5060 03/08/89 $30000.00
102 Industrial Computers 5075 03/16/89 $155500.00
103 Computer Mountain Inc. 5015 03/06/89 $42000.00
104 Industrial Concepts 5050 03/07/89 $56250.00
104 Industrial Concepts 5070 03/15/89 $95500.00
106 Computer Warehouse 5065 03/14/89 $140300.00
107 Midtown Computer Co. 5010 03/03/89 $108750.00
107 Midtown Computer Co. 5085 03/19/8 $74250.00
110 Southwest Computers 0 -0- $0.00
105 PC Consult and Design 0 -0- $0.00
Each successive select that is joined on with the UNION operator
simply adds rows to the temporary table whose structure is defined
by the first SELECT command. Notice that three of the columns are
titled "UNNAMED". These are the columns where there is data only
from the first SELECT command, the second SELECT command isn't
selecting data from these columns.
Continue to combine SELECT commands with the UNION operator making
sure each one works by itself. Test and optimize each SELECT command
separately, then combine them. For information on optimizing the
SELECT commands see the articles "Optimizing Application code for
Speed" in the May/June 1992 Exchange (automated FAX server document
#641, 206-649-2789) and "Making SUB-SELECTS, Outer Joins, & Rules
Faster" in the July/August 1991 Exchange (FAX document #318).
Create the view
===============
Once the complete SELECT command is working, simply preface it with
CREATE VIEW viewname AS and a view is created. A view is simply a
saved SELECT command. But the view has "UNNAMED" columns, and
"UNNAMED" columns can't be referenced in a report. To access all
the columns in the view, give the columns names when the view is
created.
Using the above example,
CREATE VIEW test (custid,company,transid,transdate,netamount) AS +
SELECT custid,company,transid,transdate,netamount +
FROM customer,transmaster +
WHERE customer.custid=transmaster.custid AND transdate +
BETWEEN 3/1/89 AND 3/31/89 +
UNION +
SELECT custid,company,0,'',$0 +
FROM customer +
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate +
BETWEEN 3/1/89 AND 3/31/89 )
The column names can be the same names as the column names in the
tables or they can be unique to the view. Just be sure to name all
the columns being selected.
Once you have mastered the technique, outer joins are no longer a
mystery. Use the techniques described above to successfully join
any number of tables together.
Examples Using More Than Two Tables
===================================
Taking the example above, how would it change to include data from
the Employee table as well? Add the columns from the Employee table
to the first SELECT command and add an additional link to the WHERE
clause. The second SELECT command doesn't change except for adding
place holders for the employee information. There is no link between
the Customer and Employee tables.
SELECT custid,company,transid,transdate,netamount,EMPLNAME,EMPCITY
FROM customer,transmaster,EMPLOYEE
WHERE customer.custid=transmaster.custid AND
TRANSMASTER.EMPID=EMPLOYEE.EMPID AND transdate
BETWEEN 3/1/89 AND 3/31/89
UNION
SELECT custid,company,0,'',$0,'',''
FROM customer
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
BETWEEN 3/1/89 AND 3/31/89 )
This is the resulting data. Note that the additional columns are
UNNAMED because they are constants in the second SELECT command.
UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED
------------------------------------------------------------------------------
100 PC Distribution Inc. 5080 03/20/89 $80000.00 Coffin Duvall
101 Computer Distrib Inc. 5000 03/01/89 $29000.00 Wilson Seattle
101 Computer Distrib Inc. 5060 03/08/89 $30000.00 Wilson Seattle
107 Midtown Computer Co. 5085 03/19/89 $74250.00 Smith Seattle
.......
110 Southwest Computers 0 -0- $0.00 -0- -0-
105 PC Consult and Design 0 -0- $0.00 -0- -0-
What if information from the Contact table was added instead? Again,
the contact information is added and linked in using the first SELECT
command. There is a link between the Customer and Contact tables so
the second SELECT includes the columns from the Contact table also.
SELECT custid,company,transid,transdate,netamount,CONTLNAME
FROM customer,transmaster,CONTACT
WHERE customer.custid=transmaster.custid AND
CUSTOMER.CUSTID=CONTACT.CUSTID AND transdate
BETWEEN 3/1/89 AND 3/31/89
UNION
SELECT custid,company,0,'',$0,CONTLNAME
FROM customer,CONTACT
WHERE CUSTOMER.CUSTID=CONTACT.CUSTID AND custid
NOT IN (SELECT custid FROM transmaster WHERE transdate
BETWEEN 3/1/89 AND 3/31/89 )
This is the resulting data. Note the additional column for contlname.
It is not UNNAMED because it is selected in both SELECT commands.
custid company UNNAMED UNNAMED UNNAMED CONTLNAME
------------------------------------------------------------------------
100 PC Distribution Inc. 5080 03/20/89 $80000.00 Estwitz
100 PC Distribution Inc. 5080 03/20/89 $80000.00 Sabini
101 Computer Distrib Inc. 5000 03/01/89 $29000.00 Finnegan
101 Computer Distrib Inc. 5000 03/01/89 $29000.00 Preston
107 Midtown Computer Co. 5085 03/19/89 $74250.00 Brady
.......
110 Southwest Computers 0 -0- $0.00 Adams
105 PC Consult and Design 0 -0- $0.00 Chin
What would the SELECT command look like if we added the Employee
information and also wanted to see those employees who made no
transactions in March? This requires the addition of a third SELECT
command. The first SELECT command finds and displays information
about customers and employees with transactions in March; the second
SELECT displays the information about customers who didn't have
transactions in March; the third SELECT displays the information
about employees who didn't have transactions in March.
SELECT custid,company,transid,transdate,netamount,EMPLNAME,EMPCITY
FROM customer,transmaster,EMPLOYEE
WHERE customer.custid=transmaster.custid AND
TRANSMASTER.EMPID=EMPLOYEE.EMPID AND transdate
BETWEEN 3/1/89 AND 3/31/89
UNION
SELECT custid,company,0,'',$0,'',''
FROM customer
WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
BETWEEN 3/1/89 AND 3/31/89 )
UNION
SELECT 0,'',0,'',$0,EMPLNAME,EMPCITY
FROM EMPLOYEE
WHERE EMPID NOT IN (SELECT EMPID FROM transmaster WHERE
transdate BETWEEN 3/1/89 AND 3/31/89 )
This is the resulting data. Note that all columns are now UNNAMED.
UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED UNNAMED
-------------------------------------------------------------------------------
100 PC Distribution Inc. 5080 03/20/89 $80000.00 Coffin Duvall
101 Computer Distrib Inc. 5000 03/01/89 $29000.00 Wilson Seattle
101 Computer Distrib Inc. 5060 03/08/89 $30000.00 Wilson Seattle
102 Industrial Computers 5075 03/16/89 $155500.00 Hernandez Seattle
103 Computer Mountain 5015 03/06/89 $42000.00 Smith Seattle
104 Industrial Concepts 5050 03/07/89 $56250.00 Hernandez Seattle
104 Industrial Concepts 5070 03/15/89 $95500.00 Hernandez Seattle
106 Computer Warehouse 5065 03/14/89 $140300.00 Simpson Redmond
107 Midtown Computer Co. 5010 03/03/89 $108750.00 Smith Seattle
107 Midtown Computer Co. 5085 03/19/89 $74250.00 Smith Seattle
110 Southwest Computers 0 -0- $0.00 -0- -0-
105 PC Consult and Design 0 -0- $0.00 -0- -0-
0 -0- 0 -0- $0.00 Chou Woodinvi
0 -0- 0 -0- $0.00 Watson Redmond
0 -0- 0 -0- $0.00 Williams Seattle