791.TXT
=====================================================================
Using SELECT
=====================================================================
PRODUCT: R:BASE VERSION: 3.1 or Higher
=====================================================================
CATALOG: Programming in R:BASE AREA : Select
=====================================================================
SELECT is a very powerful and useful command. It is used throughout
R:BASE, not just in commands at the R> prompt. For example, the
SELECT command is used when you save a view in QBE, create a rule,
define a cursor, and anytime R:BASE retrieves data. Understanding
the workings of the SELECT command improves your ability to use
R:BASE effectively.
SELECT Command Parts
Although the SELECT command syntax looks intimidating, you can master
it by breaking it down into its component parts. The SELECT command
is made up of six parts or clauses, each defined by a keyword:
SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY.
SELECT_defines what is displayed by the command. The SELECT
clause lists the columns, literal text, SELECT functions
(SUM, AVG, COUNT, MIN, and MAX), and expressions that are
manipulated and displayed by the SELECT command. The
columns listed in this clause must be in the tables listed
in the FROM clause. The SELECT clause is required in all
SELECT commands.
FROM_lists the tables from which data is to be displayed.
Tables listed in this clause must match the columns listed
in the SELECT clause. More than one table can be listed.
If more than one table is listed, a WHERE clause is required
to link the tables. The FROM clause is required in all
SELECT commands.
WHERE_limits the rows of data retrieved from the tables
listed in the FROM clause and links the tables in a multi-
table SELECT. Columns used in this clause must be in the
tables listed in the FROM clause, but do not need to be
included in the SELECT clause. You can put conditions on
columns that are not displayed. The WHERE clause is
optional unless more than one table is listed in the FROM
clause, then the WHERE clause is required.
GROUP BY_groups data for display and performs computations
using the SELECT functions. All columns in the SELECT clause
must be listed here unless they are used in one of the SELECT
functions. The GROUP BY clause is optional.
HAVING_assigns WHERE clause type criteria to each group defined
by a GROUP BY clause. The WHERE clause limits the rows of data
selected by the command based on tables in the FROM clause;
the HAVING clause limits the groups of data selected based on
columns in the GROUP BY clause.
The HAVING clause is optional.
ORDER BY_sorts the rows of data retrieved by the other clauses
of the command. The sorting is done on the final set of data
after all the clauses of the command have been completed. The
ORDER BY clause is optional.
These are the basic clauses of the SELECT command. A complete SELECT
command can have many clauses. One option of a WHERE clause, for
example, is a sub-SELECT, another entire SELECT command. To make a
SELECT command easy to read and understand, place each clause on a
separate line, using the continuation character (+) to tell R:BASE
that the entire command encompasses more than one line. For example,
SELECT empid, custid, COUNT(transid), SUM(netamount) +
FROM transmaster +
WHERE netamount < $100,000 +
GROUP BY empid, custid +
HAVING COUNT(transid) > 1 +
AND SUM(netamount) > $100,000 ORDER BY 4 DESC
By writing the command like this, it is easy to see that the columns
in the SELECT clause are in the GROUP BY, and the FROM tables match
the column list. This SELECT command is relatively simple even though
it uses all the clauses; only one table and a few columns are being
selected. The benefits are even more obvious with SELECT commands
that retrieve many columns from more than one table and have complex
WHERE clauses.
The SELECT clause
In addition to columns, variables, text strings and expressions can
be included in the SELECT clause part of the SELECT command. Any of
the R:BASE SuperMath functions can be used to perform calculations on
the data retrieved and columns can be concatenated for better display.
Often, when using variables or text strings in a SELECT, you also SET
HEADINGS OFF. This setting turns off the display of the default column
headings of the SELECT command. Instead, use the WRITE command to
generate your own headings. For example:
SET HEADINGS OFF
WRITE 'Employee Name Phone Number Ext'
WRITE '---------------------------- ------------ ---'
SELECT (empfname & emplname)=30, empphone=12, empext=3 +
FROM employee
Employee Name Phone Number Ext
---------------------------- ------------ ---
Peter Coffin 649-4567 305
Mary Simpson 649-4567 303
Sandi Watson 649-4567 292
Distinct
By default, the SELECT command returns all rows of data. The DISTINCT
option can be specified in the SELECT clause to return only unique
rows of data. The DISTINCT option is followed by a list of columns,
and other valid items and operates only on the columns listed. If two
columns are listed, the DISTINCT option returns one row for each set
of rows where both column values are the same, ignoring data that is
not selected. For example:
emptitle emplname empfname
-------- -------- --------
Manager Johnson Jerry
Manager Johnson Susan
Representative Johnson Michael
There are two rows where the values for emptitle and emplname are
identical. These rows compress into one row in the final result set
when the DISTINCT option is used and only the title and last name
columns are selected. If the first name was also selected, all three
rows would be returned since there are no duplicates.
SELECT DISTINCT emptitle, emplname FROM employee
emptitle emplname
-------- --------
Manager Johnson
Representative Johnson
Into
Data is selected directly from a table into variables by using the
INTO option, but you must include a WHERE clause to ensure that the
SELECT command finds only one row. The number of items in the SELECT
clause must match the number of variables in the INTO variable list.
The INTO variable list immediately follows the complete list of
columns. Don't put the corresponding variable next to the column.
For example, this syntax is wrong:
SELECT custid INTO vcustid, company INTO vcompany FROM customer
This syntax is correct:
SELECT custid, company INTO vcustid, vcompany FROM customer
The INTO variable list is often used with a GROUP BY clause to
retrieve summary data into variables.
The FROM clause
You can retrieve data from more than one table in the SELECT command.
To use more than one table, list all the columns you want to display
in the SELECT clause, list the tables in the FROM clause, and then
link the tables together in the WHERE clause. This is called a
multi-table SELECT.
Multi-table SELECT commands are widely used in relational databases.
Information is cross-referenced among many tables, but detailed,
descriptive information is only stored in one table. For example,
a customer number is referenced in an orders table, the customer name
and address information is stored separately in a customer table.
When retrieving order data, a multi-table select is used to display
the customer name along with the order data.
Multi-table SELECT commands commonly use alias names to identify the
tables. An alias name identifies which column goes with which table.
For example, T_1 is an alias name for the first table in the FROM
clause and T_2 is the alias name for the second table listed. Instead
of an alias name, you can use just the table name. These two commands
are equivalent:
SELECT T_1.empid, T_1.hiredate, T_2.salesbonus +
FROM employee T_1, salesbonus T_2 +
WHERE T_1.empid = T_2.empid
SELECT employee.empid, employee.hiredate,salesbonus.salesbonus +
FROM employee, salesbonus +
WHERE employee.empid = salesbonus.empid
Either the alias name or the table name explicitly identifies from
which table to retrieve the data. This is necessary when linking the
tables in the WHERE clause of the command. If an alias name or table
name is not specified, R:BASE retrieves the data from the first
table in the FROM clause that contains the column. For example, the
following command always retrieves data for the empid column from
the Employee table.
SELECT empid, hiredate, salesbonus +
FROM employee, salesbonus +
WHERE empid = empid
R:BASE never looks at the data for the empid column from the
Salesbonus table. The link in the WHERE clause doesn't work correctly
because it compares the empid value from Employee with itself. Change
the command to use either the table name or an alias name in the
WHERE clause and the correct data is retrieved.
SELECT empid, hiredate, salesbonus +
FROM employee, salesbonus +
WHERE employee.empid = salesbonus.empid
The WHERE clause
The WHERE clause of the SELECT command is used to put conditions on
the rows of data to be retrieved. Instead of getting all the rows,
you can specify conditions to identify a particular subset of data.
Multiple conditions are combined with AND or OR. The Command
Dictionary section in the R:BASE Reference Manual lists all of the
WHERE clause operators and conditions under the keyword WHERE.
In addition, the WHERE clause links the tables in a multi-table
SELECT command. Here's an example of a three-table SELECT command
displaying companies, order numbers, and order detail information.
The WHERE clause links the three tables together:
SELECT T_1.company, T_2.transid, T_3.model, T_3.units +
FROM customer T_1, transaction T_2, transdetail T_3 +
WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid
The WHERE clause can include conditions to qualify the rows as well
as linking tables. For example, to select just data from September,
the above command is modified to include a condition on the transdate
column.
SELECT T_1.company, T_2.transid, T_3.model, T_3.units +
FROM customer T_1, transaction T_2, transdetail T_3 +
WHERE T_1.custid = T_2.custid AND T_2.transid = T_3.transid+
AND transdate BETWEEN 9/1/94 AND 9/30/94
An easy way to structure the WHERE clause of a multi-table SELECT
command is to put the links first, and then add other conditions.
Sub-SELECTs
A special option of a WHERE clause is the sub-SELECT. A sub-SELECT
creates a list of values that qualify a row for inclusion in the
final result set. A sub-SELECT uses any of the six clauses of the
SELECT command except ORDER BY. A sub-SELECT is always enclosed in
parentheses. Only one column, expression, or SELECT function is used
in the SELECT clause of a sub-SELECT. Usually the IN or NOT IN
operator is used to compare a column with a sub-SELECT although
other operators can be used. The IN and NOT IN operators return rows
when the comparison column is in (or not in) the list of values
returned by the sub-SELECT; it is very easy to understand the result
of the comparison.
A sub-SELECT is often used to compare a column value to data in a
table not included in the FROM clause. For example, to see order
information about customers from California, a sub-SELECT is used
because the state information is not stored in the transmaster
table.
SELECT * +
FROM transmaster +
WHERE custid IN +
(SELECT custid +
FROM customer +
WHERE custstate = 'CA')
In many cases a sub-SELECT and multi-table SELECT are
interchangeable. Which one you use depends on the data values
displayed by the SELECT clause. In the above command, for example,
only data stored in the transmaster table is displayed. The company
name is not available, only the transmaster table is listed in the
FROM clause. To view the company name along with the transaction
data, a multi-table SELECT is used instead of a sub-SELECT and both
the transmaster and the customer tables are listed in the FROM clause.
A sub-SELECT is very useful when a GROUP BY clause is used to
restrict the data, but a number of columns needs to be displayed, not
all of which should be included in the GROUP BY clause. For example,
the following command returns a list of customer ID numbers where the
customer has more than one order, and the total of all their orders
is more than $100,000.00.
SELECT custid +
FROM transmaster +
WHERE netamount < $100,000 +
GROUP BY custid +
HAVING COUNT(*) > 1 +
AND SUM(netamount) > $100,000
Including other columns in the SELECT clause and thus in the GROUP BY
clause changes the grouping criteria. If you include the transaction
number, for example, no rows are returned because the transaction
number is unique and no single row meets the qualifying criteria of
both the WHERE and the HAVING clauses. A row becomes a group by
itself. Instead, use the SELECT with the GROUP BY clause in a sub-
SELECT to view all the data in the transmaster table for those
customers meeting the sub-SELECT criteria.
SELECT * +
FROM transmaster +
WHERE custid IN +
( SELECT custid +
FROM transmaster +
WHERE netamount < $100,000 +
GROUP BY custid +
HAVING COUNT(*) > 1 AND +
SUM(netamount) > $100,000)
The GROUP BY clause
The GROUP BY clause of the SELECT command is like a break point in a
report. This clause allows you to group the data, returning one row
of output for each different group of values. It's common to use
SELECT functions with the GROUP BY to get subtotals, averages,
minimums, maximums, and counts for the groups.
All items listed in the SELECT clause must also be included in the
GROUP BY clause unless you use a SELECT function on the column.
Forgetting this rule is the most common mistake made with the GROUP
BY clause. For example, the following command won't work; a column
is included in the SELECT clause that is not included in the GROUP
BY clause.
SELECT col1, col2 FROM tblname GROUP BY col1
Both of the following commands work:
SELECT col1, col2 FROM tblname GROUP BY col1, col2
SELECT col1, SUM(col2) FROM tblname GROUP BY col1
Either both columns are included in the GROUP BY clause, or the
second column is used in a SELECT function.
A GROUP BY clause is similar to using the DISTINCT option in the
SELECT clause. The result set has one row for each unique set of
values. One difference is that a GROUP BY clause allows you to use
the SELECT functions in the SELECT clause.
SELECT Functions
The SELECT functions (SUM, AVG, MIN, MAX, and COUNT) are placed in
the SELECT clause of the SELECT command, but once you add a SELECT
function, you can't specify any other columns unless you use the
GROUP BY clause. A SELECT function returns a single value, not a
list of values. You can return a single value for the entire table,
or a single value for each group of rows.
The following command is valid because it returns a single value, the
sum of all rows in the table.
SELECT SUM(netamount) FROM transmaster
This next command is invalid because the SUM function returns a
single value, but the transid column returns a list of values. There
is no way R:BASE can display the output.
SELECT transid, SUM(netamount) FROM transmaster
Adding a GROUP BY clause allows R:BASE to calculate and display a sum
for each unique transid value.
SELECT transid, SUM(netamount) FROM transmaster GROUP BY transid
The SELECT functions do not include NULL values in their computations.
Generally this is fine unless you want to count all the rows in a
table. COUNT (colname) excludes NULL values from the count; COUNT (*)
includes NULL values in the count. For example, the following command
counts all rows including null values:
SELECT COUNT(*) FROM tblname
In general, unless you want to specifically count the non-NULL data
values in a column, always use COUNT(*). A common technique using the
COUNT(*) is to emulate the results of a TALLY command. The following
commands are equivalent:
TALLY empid FROM transmaster
SELECT empid, COUNT(*) FROM transmaster GROUP BY empid
The advantage of using SELECT over TALLY is that the count can be
calculated for more than one column, or for data from more than one
table. Using SELECT, the employee name could be displayed with the
result, for example:
SELECT empid, emplname, COUNT(*) +
FROM transmaster, employee +
WHERE transmaster.empid = employee.empid +
GROUP BY empid, emplname
Note that two tables are now listed in the FROM clause. They are
linked in the WHERE clause and both columns displayed are listed in
the GROUP BY clause. The result set is much easier to read than a
TALLY, which just lists the employee ID number and the count.
empid emplname COUNT (*)
---------- ---------------- ----------
102 Wilson 6
129 Hernandez 2
131 Simpson 4
133 Coffin 2
165 Williams 5
167 Watson 1
The HAVING clause
The HAVING clause which is like a WHERE clause for the GROUP BY
clause is used to exclude certain groups from the final result set.
A WHERE clause excludes individual rows; a HAVING clause excludes
groups of rows_groups created by the GROUP BY clause.
The HAVING clause syntax is identical to the WHERE clause syntax
except that the HAVING clause allows SELECT functions to be used in
comparisons. For example, to retrieve only groups having a sum
greater than a certain amount use this command:
SELECT custid +
FROM transmaster +
GROUP BY custid +
HAVING SUM(netamount) > $100,000
A list of customers whose total transactions are more than $100,000
is returned.
The HAVING clause is often used with a GROUP BY clause to check for
duplicate entries. The SELECT syntax that emulates a TALLY command is
used with a HAVING clause to return only groups with more than one
value. For example, to check for duplicate invoice entries use the
command:
SELECT transid +
FROM transmaster +
GROUP BY transid +
HAVING COUNT(*) > 1
Only invoice numbers that occur more than once in the transmaster
table are returned. To view all the data about these invoices, use
the SELECT command in a sub-SELECT as in the following example:
EDIT * FROM transmaster +
WHERE transid IN +
(SELECT transid +
FROM transmaster +
GROUP BY transid +
HAVING COUNT(*) > 1)
The ORDER BY clause
The ORDER BY clause of the SELECT command is used to order the final
result set. After all the other clauses have executed, the resulting
data is ordered as specified.
You can use the position number from the SELECT clause in the ORDER
BY clause to order by an expression or a function.
For example, the following two commands are equivalent:
SELECT custid, custstate FROM customer ORDER BY 2
SELECT custid, custstate FROM customer ORDER BY custstate
The 2 in the ORDER BY clause represents the second item in the
SELECT clause of the command. This feature is used to sort by
expressions, SELECT functions, or literal text. For example, to sort
tally results in descending order, use the following SELECT command:
SELECT empid, emplname, COUNT(*) +
FROM transmaster, employee +
WHERE transmaster.empid = employee.empid +
GROUP BY empid, emplname +
ORDER BY 2 DESC
The COUNT function has no column name associated with it that can be
used in the ORDER BY clause. Instead, use its position in the SELECT
clause_2. The DESC indicates to put the largest number at the top of
the list.
empid emplname COUNT (*)
---------- ---------------- ----------
102 Wilson 6
165 Williams 5
131 Simpson 4
129 Hernandez 2
133 Coffin 2
167 Watson 1
UNION SELECT
The UNION operator in a SELECT command joins two distinct SELECT
commands. The SELECT commands can be from the same or different
tables. Generally, use UNION SELECT when you want to append the
results of one SELECT onto the bottom of the results of another
SELECT to select mutually exclusive groups of data. By default,
UNION is DISTINCT; duplicate rows are not displayed. If you want
to see duplicate rows, use UNION ALL. The SELECT clauses of the
commands that you join with the UNION operator must each contain the
same number of items, and the data types of the items must match.
The SELECT clauses can, however, contain constants and expressions as
well as column names.
The UNION operator joins the two SELECT command results into one
result set. A UNION SELECT can be used to perform an outer join, or
you can use the new outer join syntax of the SELECT command added
in R:BASE 4.5 Plus!. Refer to the article "Easy Outer Joins" in the
January/February 1994 Exchange for information about the new outer
join syntax and the article "Working With Outer Joins" in the
January/February 1993 Exchange (document #682 on the FAX server) for
more information on using UNION SELECT to create an outer join.
Using SELECT in other commands
Other commands also use SELECT to retrieve rows of data. The CREATE
VIEW, INSERT, and DECLARE CURSOR commands use SELECT to select the
rows of data used by that command. By using the #VALUES option on the
CHOOSE command, you can take advantage of some of the SELECT command
features to make better menus. The multi-table UPDATE command joins
the tables like a SELECT command. When you build and save a query
from QBE, you're actually building and saving a SELECT statement. The
SELECT command built by QBE is a subset of the SELECT command
available at the R> prompt.
By using the SELECT command, you create faster and more efficient
code. In addition, you can test your command first by using only the
SELECT command. If data is correctly retrieved, then you know the
view is correct, the appropriate rows are inserted, and the
operations performed on each row by the DECLARE CURSOR command
operate on the right rows. This is an easy and quick way to verify
data before modifying it.
The SELECT command is a language in itself. By using it, you'll write
faster and more efficient code, and often accomplish tasks with one
command instead of writing program code.