DOCUMENT #682
     =======================================================================
     WORKING WITH OUTER JOINS
     =======================================================================
     Product:  R:BASE                Version :  3.1 & Higher
     =======================================================================
     Area   :  DATA MANIPULATION     Category:  VIEWS                 
     =======================================================================
 
 
     When linking tables in QBE (Query-by-Example), you usually choose 
     to join them where the linking columns are equal, i.e. where you 
     have data that matches in both tables.  Sometimes, however, you also 
     want to see the data where the linking column doesn't match.  This 
     option is not available from the menus; you can't view both the rows 
     that match and the rows that don't match with a single query.  You 
     can, however, create a query like this from the R> prompt.  It's 
     called an "outer join." 
 
     Even though you don't see it, when you create a query using the 
     menus, R:BASE builds a SELECT command with a WHERE clause to link 
     the tables and identify the rows to display.  By definition, an outer 
     join retrieves sets of completely different rows.  A WHERE clause that 
     displays all the rows that match between two or more tables is mutually 
     exclusive from a WHERE clause that displays the rows that don't match.  
     This means you need two different WHERE clauses.
 
     You might think you can see both sets of rows by creating a WHERE 
     clause with two conditions separated by an OR operator, but this 
     doesn't work.  The right data is not retrieved.  What you really 
     need are two separate SELECT commands with two separate WHERE clauses.  
 
     The menus won't let you do this.  You need to use the UNION operator 
     of the SELECT command at the R> prompt.  The purpose of the UNION 
     operator is to join two or more SELECT commands together to create 
     one table of data.
 
     This article gives a few simple techniques to follow when creating 
     outer joins and shows how to use the techniques with an actual example 
     from the CONCOMP sample database.
 
     
     Identify The Tables And Data Conditions
     =======================================
     The first step is to identify the tables to be joined and any 
     conditions needed to qualify the data.  Once the tables and 
     conditions have been identified, sketch out the different select 
     commands needed.  Be sure to include the linking columns.  Here's 
     the basic structure of the complete SELECT command to create an 
     outer join of two tables:
 
      SELECT data FROM table1 and table2 where the linking columns
       are equal, these are the matching rows
      UNION
      SELECT data FROM table1 where there is no matching data in table2
      UNION
      SELECT data FROM table2 where there is no matching data in table1
 
     You need three separate SELECT commands to cover all the possible 
     combinations of selecting data from two tables.  But you may not 
     need all three of the SELECT commands.  If you know that there is 
     never a row in table2 if there isn't a row in table1, you won't need 
     the third SELECT command.
 
     It's important to sketch out the different select commands you need 
     before starting to write them because it becomes more complex with 
     each table that you add to the join.  When joining two tables, there 
     can be up to three possible SELECT commands needed to retrieve all 
     the data.  When joining three tables, there can be up to seven possible 
     SELECT commands.  Look at the following table for a 3-table join.  
     "YES" means there could be data in the table.
 
                     |  table 1      |   table 2     |      table 3
      --------------------------------------------------------------------
       SELECT1       |    YES        |     YES       |      YES
       SELECT2       |    YES        |     YES       |      NO
       SELECT3       |    YES        |     NO        |      YES
       SELECT4       |    YES        |     NO        |      NO
       SELECT5       |    NO         |     YES       |      YES
       SELECT6       |    NO         |     YES       |      NO
       SELECT7       |    NO         |     NO        |      YES
                    
     By knowing your database structure and how your data is organized you 
     can reduce the number of SELECT commands needed to complete the outer 
     join and won't need to account for all possible combinations of data.  
     This is important because joining four tables for an outer join has 
     over 15 possible SELECT commands.
 
 
     Write And Test Each SELECT Command By Itself
     ============================================
     Once you have identified the tables and the conditions, and sketched 
     out the different SELECT commands needed, begin writing the individual 
     SELECT commands.  Each SELECT command stands alone and is executed, 
     tested and optimized by itself.  Then all the working SELECT commands 
     are combined into one using the UNION operator.  When writing the 
     individual SELECT commands, remember that each must select the same 
     number of columns, and the columns must have the same datatypes and 
     be selected in the same order. 
 
     Start with the SELECT command that will retrieve the rows where the 
     linking column matches across all the tables.  The easiest way to 
     write the SELECT command is to place each of its parts on a separate 
     line like this:
 
      SELECT < the columns names you want to display data from >
       FROM < the table names the data is in >
       WHERE < put the linking column(s) here and any other data conditions >
 
     Use this same technique to structure the other SELECT commands. For 
     the other SELECT commands, first fill out the FROM line.  This 
     identifies the table(s) this particular SELECT command will retrieve 
     data from and keys both the SELECT and WHERE clauses.  You need the 
     same number of values (either column names or constants) after the 
     keyword SELECT as in the first command.  The FROM will determine 
     which values are constants and which are column names.  Make sure 
     they are in the same order as the first command.  The WHERE clause 
     will use a sub-SELECT to identify the rows that are NOT IN the other 
     table(s).  A Sub-SELECT makes a list of data values to compare to.  
 
     An example
     ==========
     Let's look at a specific example from the CONCOMP sample database. 
     The desired result is a report of transaction information for each 
     customer for the month of March.  Not all customers had transactions 
     in March, but all customers must be listed and show $0.00 if they had 
     no transactions.  
 
     First, identify the tables and data conditions that are needed and 
     sketch out the SELECT commands.  Transaction information is stored 
     in the Transmaster table.  Customer information is in the Customer 
     table.  The tables are linked by the custid column.  The first SELECT 
     command selects the rows from the Transmaster and Customer tables 
     where the customer has a transaction record, i.e. the rows that match 
     between the two tables.
 
      SELECT customer number (customer), company name (customer),
       transaction number (transmaster), transaction date(transmaster), 
       transaction amount (transmaster),
       FROM customer,transmaster
       WHERE custid = custid and transdate in March
 
     Notice that the actual column names aren't used.  You can, but when 
     sketching out the SELECT commands it's often easier to not use the 
     actual names, but just put the description of the data you want to see.  
     Put the table name next to data description so it's easy to see which 
     table those items are from.
     
     Next, sketch out the other SELECT commands to find the rows that don't 
     match.  Data entry rules are defined in the database, so there can't 
     be a transaction (a row in the Transmaster table) if there is no 
     customer record.  This leaves just one additional SELECT command, to 
     find the rows from the Customer table where the custid is not in the 
     Transmaster table. 
 
     SELECT customer number (customer), company name (customer), 
      transaction number (constant), transaction date (constant), 
      transaction amount (constant) 
      FROM customer
      WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
      in March)
 
     Here data is only selected from the Customer table, so the data 
     values in the first SELECT that were from the Transmaster table 
     become constants in this SELECT command.  
 
     Now write and test the actual SELECT commands.  The first one, a 
     straightforward multi-table SELECT command, looks like this:
 
                                       columns from the
              columns from the        Transmaster table         
                  Customer table      /             \          
                       /      \       /               \       
      SELECT custid,company,transid,transdate,netamount +
       FROM customer,transmaster +
       WHERE customer.custid=transmaster.custid AND +
                         / transmaster.transdate BETWEEN 3/1/89 AND 3/31/89
                        /          \
                   links the tables \
                             qualifies the rows
        
     Notice that in the WHERE clause, in addition to linking the tables 
     to find the rows that match (customer.custid=transmaster.custid), 
     there is the additional condition to retrieve only the records from 
     March.  This second condition is on a column (transdate) from the 
     Transmaster table only.  
 
     After this SELECT command works to retrieve the desired data, write 
     and test the next command.  This is just a single table select.  
     Because there is no data from the Transmaster table no columns are 
     selected from that table.  "Place holders" are used instead, either 
     a constant value or '' (quote quote) for null. There are no tables 
     to link together in the WHERE clause.  Instead, use a Sub-SELECT to 
     select the rows from the Customer table that are not in the 
     Transaction table.
 
 
                               place holders to match                          
       columns from the       columns from the Transmaster table   
       Customer table             /   \    
                      /     \             /     \                          
      SELECT custid,company,0,'',$0 +
       FROM customer +
       WHERE custid NOT IN (SELECT custid FROM transmaster WHERE +
                       /             transdate BETWEEN 3/1/89 AND 3/31/89 )
                     /                   \
             finds the rows that          \
             don't match                 only compare to rows in the
                                          right date range
 
   Remember that in the first select there was a WHERE clause based on the 
   Transmaster table.  The second select is from the Customer table only, 
   so there is no Transdate column to put a condition on.  To make sure 
   the right rows are retrieved, that WHERE clause condition must be put 
   on the Sub-SELECT, which is from the Transmaster table.  The data 
   includes only customers who didn't have transactions in March.  If 
   there is no WHERE clause on the sub-SELECT to restrict the date to 
   March, this SELECT command would have returned customers who, in some 
   month, sometime, had no transactions; not the correct data.  
 
     Testing each SELECT command separately makes sure it is returning 
     correct data.  If each of the individual SELECTs return correct data, 
     then the complete SELECT will return correct data.
 
     Finally, join the two SELECT commands with the UNION operator:
 
      SELECT custid,company,transid,transdate,netamount +
       FROM customer,transmaster +
       WHERE customer.custid=transmaster.custid AND +
                 transmaster.transdate BETWEEN 3/1/89 AND 3/31/89 +
      UNION +
      SELECT custid,company,0,'',$0 +
       FROM customer +
       WHERE custid NOT IN (SELECT custid FROM transmaster WHERE +
                             transdate BETWEEN 3/1/89 AND 3/31/89 )
 
 
     This is the resulting data:
                                                 
      custid  company                 UNNAMED  UNNAMED   UNNAMED 
      --------------------------------------------------------------
      100     PC Distribution Inc.    5080     03/20/89  $80000.00     
      101     Computer Distrib Inc.   5000     03/01/89  $29000.00
      101     Computer Distrib Inc.   5060     03/08/89  $30000.00
      102     Industrial Computers    5075     03/16/89  $155500.00
      103     Computer Mountain Inc.  5015     03/06/89  $42000.00
      104     Industrial Concepts     5050     03/07/89  $56250.00
      104     Industrial Concepts     5070     03/15/89  $95500.00
      106     Computer Warehouse      5065     03/14/89  $140300.00
      107     Midtown Computer Co.    5010     03/03/89  $108750.00
      107     Midtown Computer Co.    5085     03/19/8   $74250.00
      110     Southwest Computers     0          -0-     $0.00
      105     PC Consult and Design   0          -0-     $0.00
 
 
     Each successive select that is joined on with the UNION operator 
     simply adds rows to the temporary table whose structure is defined 
     by the first SELECT command. Notice that three of the columns are 
     titled "UNNAMED".  These are the columns where there is data only 
     from the first SELECT command, the second SELECT command isn't 
     selecting data from these columns.
 
     Continue to combine SELECT commands with the UNION operator making 
     sure each one works by itself.  Test and optimize each SELECT command 
     separately, then combine them.  For information on optimizing the 
     SELECT commands see the articles "Optimizing Application code for 
     Speed" in the May/June 1992 Exchange (automated FAX server document 
     #641, 206-649-2789) and "Making SUB-SELECTS, Outer Joins, & Rules 
     Faster" in the July/August 1991 Exchange (FAX document #318).  
 
     Create the view
     ===============
     Once  the complete SELECT command is working, simply preface it with 
     CREATE VIEW viewname AS and a view is created.  A view is simply a 
     saved SELECT command.  But the view has "UNNAMED" columns, and 
     "UNNAMED" columns can't be referenced in a report.  To access all 
     the columns in the view, give the columns names when the view is 
     created.
 
     Using the above example,
  
      CREATE VIEW test (custid,company,transid,transdate,netamount) AS +
       SELECT custid,company,transid,transdate,netamount +
        FROM customer,transmaster +
        WHERE customer.custid=transmaster.custid AND transdate +
         BETWEEN 3/1/89 AND 3/31/89 +
       UNION +
        SELECT custid,company,0,'',$0 +
         FROM customer +
         WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate +
          BETWEEN 3/1/89 AND 3/31/89 )           
 
     The column names can be the same names as the column names in the 
     tables or they can be unique to the view.  Just be sure to name all 
     the columns being selected.
 
     Once you have mastered the technique, outer joins are no longer a 
     mystery.  Use the techniques described above to successfully join 
     any number of tables together.  
 
 
     Examples Using More Than Two Tables
     ===================================
     Taking the example above, how would it change to include data from 
     the Employee table as well? Add the columns from the Employee table 
     to the first SELECT command and add an additional link to the WHERE 
     clause.  The second SELECT command doesn't change except for adding 
     place holders for the employee information.  There is no link between 
     the Customer and Employee tables.
 
      SELECT custid,company,transid,transdate,netamount,EMPLNAME,EMPCITY 
       FROM customer,transmaster,EMPLOYEE 
       WHERE customer.custid=transmaster.custid AND
        TRANSMASTER.EMPID=EMPLOYEE.EMPID AND transdate 
        BETWEEN 3/1/89 AND 3/31/89
      UNION 
      SELECT custid,company,0,'',$0,'','' 
       FROM customer 
       WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
                            BETWEEN 3/1/89 AND 3/31/89 )           
 
     This is the resulting data.  Note that the additional columns are 
     UNNAMED because they are constants in the second SELECT command.
 
UNNAMED UNNAMED                UNNAMED UNNAMED   UNNAMED     UNNAMED  UNNAMED
------------------------------------------------------------------------------
100     PC Distribution Inc.   5080    03/20/89  $80000.00   Coffin   Duvall   
       
101     Computer Distrib Inc.  5000    03/01/89  $29000.00   Wilson   Seattle 
101     Computer Distrib Inc.  5060    03/08/89  $30000.00   Wilson   Seattle  
        
107     Midtown Computer Co.   5085    03/19/89  $74250.00   Smith    Seattle  
 
.......
110     Southwest Computers    0         -0-     $0.00        -0-      -0-     
     
105     PC Consult and Design  0         -0-     $0.00        -0-      -0-  
 
     What if information from the Contact table was added instead?  Again, 
     the contact information is added and linked in using the first SELECT 
     command.  There is a link between the Customer and Contact tables so 
     the second SELECT includes the columns from the Contact table also.
       
      SELECT custid,company,transid,transdate,netamount,CONTLNAME 
       FROM customer,transmaster,CONTACT
       WHERE customer.custid=transmaster.custid AND
       CUSTOMER.CUSTID=CONTACT.CUSTID AND transdate 
       BETWEEN 3/1/89 AND 3/31/89
      UNION 
      SELECT custid,company,0,'',$0,CONTLNAME 
       FROM customer,CONTACT 
       WHERE CUSTOMER.CUSTID=CONTACT.CUSTID AND custid 
       NOT IN (SELECT custid FROM transmaster WHERE transdate
                             BETWEEN 3/1/89 AND 3/31/89 )           
 
     This is the resulting data.  Note the additional column for contlname.  
     It is not UNNAMED because it is selected in both SELECT commands.
 
     custid company               UNNAMED  UNNAMED  UNNAMED     CONTLNAME      
            
     ------------------------------------------------------------------------
     100    PC Distribution Inc.  5080     03/20/89  $80000.00  Estwitz
     100    PC Distribution Inc.  5080     03/20/89  $80000.00  Sabini
     101   Computer Distrib Inc.  5000     03/01/89  $29000.00  Finnegan
     101   Computer Distrib Inc.  5000     03/01/89  $29000.00  Preston
     107   Midtown Computer Co.   5085     03/19/89  $74250.00  Brady
     .......
     110   Southwest Computers    0        -0-       $0.00      Adams
     105   PC Consult and Design  0        -0-       $0.00      Chin
 
 
     What would the SELECT command look like if we added the Employee 
     information and also wanted to see those employees who made no 
     transactions in March?  This requires the addition of a third SELECT 
     command.  The first SELECT command finds and displays information 
     about customers and employees with transactions in March; the second 
     SELECT displays the information about customers who didn't have 
     transactions in March; the third SELECT displays the information 
     about employees who didn't have transactions in March.
 
      SELECT custid,company,transid,transdate,netamount,EMPLNAME,EMPCITY 
       FROM customer,transmaster,EMPLOYEE 
       WHERE customer.custid=transmaster.custid AND
        TRANSMASTER.EMPID=EMPLOYEE.EMPID AND transdate 
        BETWEEN 3/1/89 AND 3/31/89
      UNION 
      SELECT custid,company,0,'',$0,'','' 
       FROM customer 
       WHERE custid NOT IN (SELECT custid FROM transmaster WHERE transdate
                            BETWEEN 3/1/89 AND 3/31/89 )
      UNION
      SELECT 0,'',0,'',$0,EMPLNAME,EMPCITY 
       FROM EMPLOYEE
       WHERE EMPID NOT IN (SELECT EMPID FROM transmaster WHERE 
                                     transdate BETWEEN 3/1/89 AND 3/31/89 )
 
This is the resulting data.  Note that all columns are now UNNAMED.
 
UNNAMED UNNAMED               UNNAMED UNNAMED   UNNAMED     UNNAMED    UNNAMED 
           
-------------------------------------------------------------------------------
 
100     PC Distribution Inc.  5080    03/20/89  $80000.00   Coffin     Duvall  
        
101     Computer Distrib Inc. 5000    03/01/89  $29000.00   Wilson     Seattle 
         
101     Computer Distrib Inc. 5060    03/08/89  $30000.00   Wilson     Seattle 
         
102     Industrial Computers  5075    03/16/89  $155500.00  Hernandez  Seattle 
         
103     Computer Mountain     5015    03/06/89  $42000.00   Smith      Seattle 
         
104     Industrial Concepts   5050    03/07/89  $56250.00   Hernandez  Seattle 
         
104     Industrial Concepts   5070    03/15/89  $95500.00   Hernandez  Seattle 
         
106     Computer Warehouse    5065    03/14/89  $140300.00  Simpson    Redmond 
         
107     Midtown Computer Co.  5010    03/03/89  $108750.00  Smith      Seattle 
         
107     Midtown Computer Co.  5085    03/19/89  $74250.00   Smith      Seattle 
         
110     Southwest Computers   0       -0-       $0.00       -0-        -0-     
     
105     PC Consult and Design 0       -0-       $0.00       -0-        -0-     
      
0       -0-                   0       -0-       $0.00       Chou       Woodinvi
          
0       -0-                   0       -0-       $0.00       Watson     Redmond 
           
0       -0-                   0       -0-       $0.00       Williams   Seattle