""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   INDEXES & IHASH ADD SPEED
   """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   PRODUCT   :  R:BASE                  VERSION      :  3.1
   CATEGORY  :  SPEED & PERFORMANCE     SUBCATEGORY  :  INDEXES & IHASH
   """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
   By indexing the right columns, you can speed up your applications.
   Processes that used to take 20 minutes or more may be completed in
   only a few seconds. An index on an R:BASE column speeds up access to a
   row of data in much the same way an index in a book speeds up access
   to a page.
 
   Get a good introduction to indexing from your R:BASE manuals and
   "Maximize R:BASE 3.0 Performance with Index Processing" in the
   May/June 1990 R:BASE EXCHANGE.
 
 
   Deciding What to Index
   """"""""""""""""""""""
   It's a good idea to put an index on key columns and on linking
   columns. A key column is a column that uniquely identifies rows. A
   linking or common column is a column that exists in two or more tables
   in order to establish a relationship between the tables.
 
   If you need to use more than one column to uniquely identify a row,
   it's a good idea to combine them into a computed column, and then
   index the computed column.
 
   Use CREATE INDEX to index a column or to rebuild existing indexes.
 
 
   Optimizing Indexes
   """"""""""""""""""
   Optimize indexes by paying attention to the data type of the indexed
   column and by minimizing the number of duplicate values in the column.
 
   The fastest, most efficient data types for indexed access are INTEGER,
   REAL, DATE, TIME, TEXT with a defined length of four characters or
   less. or NUMERIC with a defined precision of two or less. Indexes on
   other columns will be less effective.
 
   The more unique the column's values, the more effective the index will
   be. The best columns to index are those that don't have duplicate
   values.
 
 
   Hashing Long TEXT Values
   """"""""""""""""""""""""
   R:BASE has to hash (convert) TEXT values that contain more than four
   characters to create a four-byte index called a "hash value." Even
   though a TEXT column contains completely unique values, it can end up
   with duplicate hash values.
 
 
   How R:BASE Hashes TEXT
   """"""""""""""""""""""
   R:BASE uses the following technique to make the hash values as unique
   as possible:
 
     o  R:BASE uses the binary equivalents of the first and second
        characters in the TEXT string to fill the first and second bytes
        of the four-byte hash value.
     o  R:BASE sums the binary values of all the odd-numbered characters
        in the TEXT string beginning with the third. Then it divides the
        sum by 256 and uses the remainder to fill the third byte of the
        hash value.
     o  R:BASE sums the binary values of all the even-numbered characters
        in the TEXT string beginning with the fourth. Then it divides the
        sum by 256 and uses the remainder to fill the fourth byte of the
        hash value.
 
   As a result, two hash values, created by hashing two different TEXT
   strings, may be identical. In that case, R:BASE builds a "multiple
   occurrence table" to store the duplicate hash values.
 
   This is why TEXT strings that begin with the same two characters
   (often the case with part numbers) that are otherwise unique may have
   the same hash value, thereby reducing the effectiveness of the index.
   If you can ensure that the first two characters of each TEXT column
   value are unique, you can improve the performance of an index on that
   TEXT column because you'll eliminate all the "multiple occurrence
   tables."
 
   To make hash values as unique as possible, try to define indexed TEXT
   columns with as few characters as possible, and make the column values
   unique, especially with regard to the first two characters.
 
   If you must define an indexed TEXT column with a width greater than
   four and you can't ensure the uniqueness of the first two characters,
   that column may be a good candidate for use with the IHASH function
   added to R:BASE in R:BASE 3.1A (the first release of Upgrade
   "EXPRESS").
 
 
   IHASH to the Rescue
   """""""""""""""""""
   If the first two characters are identical, as is often true with part
   numbers (AB-100, AB-101, AB-102, and so on), use the new IHASH
   function and a computed column to make your index more unique.
 
   It's important to understand that IHASH doesn't guarantee a unique
   hash value. IHASH may hash two different TEXT values to the same hash
   value (index). IHASH makes hash values more unique but doesn't
   guarantee complete uniqueness.
 
   If you have R:SCOPE (available for purchase by calling 800-248-2001),
   you can check to see how unique the index values (hash values) are.
   Choose Indexes under Check, then choose the table and column you want
   to check. R:SCOPE will tell you how many unique and duplicate hash
   values there are.
 
 
   How to Use IHASH
   """"""""""""""""
   IHASH is a conversion function, not a command. It's syntax is simple:
 
     (IHASH(arg,n))
 
   The TEXT argument (ARG) can be a TEXT column, a dotted variable, or a
   value. The width (N) tells R:BASE how many characters, starting with
   the first, are needed to establish uniqueness in the string. If N is
   zero, R:BASE uses the entire length of the string.
 
   But, you need to know more than just the syntax to use IHASH
   correctly. First, you need to decide whether or not to use it. For
   example, IHASH may not improve performance when used on a TEXT column
   that holds unique text names. But it will improve performance on a
   TEXT column that has many values that have identical first and second
   characters. It all depends on whether IHASH can make the hash values
   for the column more unique. You can use R:SCOPE to check it out.
 
 
   IHASH Step by Step
   """"""""""""""""""
   As an example, you can use IHASH to speed up indexed access and joins
   by making a key, indexed TEXT column named (PARTID) faster. Follow
   these steps:
 
 
   STEP 1
   ======
   Define an INTEGER computed column including the IHASH function:
 
     ALTER TABLE tblname ADD hashid +
       = (IHASH(partid,0)) INTEGER
 
   If PARTID exists in other tables, you may want to add HASHID to them
   too. For example, if PARTID links a PARTS table with an INVOICE table,
   use these commands to add HASHID:
 
     ALTER TABLE parts ADD hashid +
       = (IHASH(partid,0)) INTEGER
     ALTER TABLE invoice ADD hashid
 
 
   STEP 2
   ======
   Drop the index from PARTID and index the computed column HASHID:
 
     DROP INDEX partid IN tblname
     CREATE INDEX ON tblname hashid
 
   Index HASHID in all the tables where it appears.
 
 
   STEP 3
   ======
   Use the newly indexed computed column in WHERE clauses and multi-table
   SELECT commands, but be sure to continue to include the actual TEXT
   column too. For example, to find a specific part number, instead of
   using this WHERE clause:
 
     Command ... WHERE partid = .vpartid
 
   use this SET VAR and WHERE clause:
 
     SET VAR vhash = (IHASH(.vpartid,0))
     Command ... WHERE hashid = .vhash +
       AND partid = .vpartid
 
   You must use the dotted variable (.VHASH) instead of the using the
   expression directly in the WHERE clause, because R:BASE doesn't use
   indexes if the WHERE clause uses an expression.
 
   You must include "AND partid = .vpartid" to ensure that you get the
   right value. Remember, there's no guarantee that the IHASH value will
   be unique.
 
   If you use the computed IHASH column (HASHID) in a multi-table join,
   you must also remember to include both connections in the WHERE
   clause. For example, the following rule checks for a unique value:
 
     RULES 'Value must be unique.' +
       FOR tblname SUCCEEDS +
       WHERE partid IS NOT NULL +
       AND NOT EXISTS +
       (SELECT partid FROM tblname t2 +
       WHERE t2.hashid = tblname.hashid +
       AND t2.partid = tblname.partid)
 
   It uses a correlated sub-SELECT (explained in "Making Sub-SELECTs,
   Outer Joins, & Rules Faster"). Correlated sub-SELECTs force R:BASE to
   do an internal join using indexes.
 
   Here's another example of a multi-table join using HASHID:
 
     SELECT t1.partid, SUM(t2.price) +
       FROM parts t1, invoices t2 +
       WHERE t2.hashid = t1.hashid +
       AND t2.partid = t1.partid +
       GROUP BY t1.partid
 
 
   Speed Up IHASH
   """"""""""""""
   When you use IHASH, try not to use zero as the N width. Zero causes
   R:BASE to use the entire string. If you're sure, for example, that the
   first 10 characters in a TEXT 50 column are enough to establish
   uniqueness, use 10 rather than zero in the IHASH.
 
   WHERE Clauses & Indexes
   """""""""""""""""""""""
   A WHERE clause can be in a command, a rule, or a lookup expression. If
   a WHERE clause has only one condition, R:BASE uses the index on the
   column if the conditional operator following the indexed column is EQ,
   =, BETWEEN, or IS NULL.
 
   The BETWEEN operator won't use the index on a data type that must be
   hashed, so if the operator is BETWEEN, the data type of the indexed
   column must be DATE, TIME, INTEGER, REAL, NUMERIC with a defined
   precision of two or less, or TEXT with defined length of four or less.
 
   R:BASE won't use the index if the WHERE clause contains a wildcard
   character or an expression. If you want R:BASE to use the index,
   replace expressions with constants or dotted variables, and get rid of
   the wildcards.
 
   If a WHERE clause contains more than one condition and all conditions
   are combined with AND, the clause must have at least one indexed
   column that uses EQ, =, BETWEEN, or IS NULL if you want R:BASE to use
   indexes. Under these conditions, R:BASE chooses the condition that
   places the greatest restriction on the WHERE clause for the indexed
   search. Here are the three levels of restriction:
 
     o  = or EQ is most restrictive.
     o  IS NULL is next most restrictive.
     o  BETWEEN is least restrictive.
 
   If there are two conditions and both are at the same level of
   restriction, R:BASE uses the first one. For optimum performance, make
   the most restrictive indexed condition the first condition in the
   WHERE clause.
 
 
   ORDER BY Clauses & Indexes
   """"""""""""""""""""""""""
   Under the following conditions, R:BASE uses indexes to sort quickly:
 
     o  The ORDER BY clause contains a single, indexed column that is
        DATE, TIME, INTEGER, REAL, NUMERIC with a defined precision of
        two or less, or TEXT with a defined length of four or less.
     o  The command doesn't use a view.
     o  If the command also includes a WHERE clause, the same indexed
        column used in the ORDER BY clause is also used in the WHERE
        clause along with one of the index supporting operators (=, IS
        NULL, or BETWEEN).