820.TXT
=====================================================================
Optimizing Application Code
=====================================================================
PRODUCT: R:BASE VERSION: 4.5+ and above
=====================================================================
CATALOG: General Inforation AREA : Memory Management
=====================================================================
Because the most obvious or logical approach to a programming task is
not necessarily the fastest, optimizing application code is a
necessary skill. This article gives some specific examples of
different ways to accomplish the same tasks and shows the speed
differences between them. It also provides some general techniques for
improving the performance of your applications.
How can you tell what is the fastest way to accomplish a programming
task? The only way to tell for sure is to try each command or set of
commands, time it, and compare the times. Often a sequence of commands
will be fast on one database but slow on another because the structure
(tables and columns) and data of the databases are different. Speed
can vary from database to database and from machine to machine.
Times given below for different programming techniques are for
comparison purposes only. Actual times that you will see using the
same methods will vary on the basis of your hardware and your
database.
Timing Code
Use the following commands to find the execution time for a section of
application code:
SET TIME FORMAT HH:MM:SS.sss
SET VAR vstart = .#time
-- commands to time go here
SET VAR vend = .#time
SET VAR vdiff = (.vend - .vstart)
SET VAR vtime = (RTIME(0,0,0,.vdiff))
The result, stored in the variable vtime, is in an easy to read hours,
minutes, and seconds format.
When checking the performance of a SELECT or a PRINT command, set the
output to a temporary file and set LINES to 0. This tests the time to
select all the data for the command. For example:
SET LINES 0
SET TIME FORMAT HH:MM:SS.sss
SET VAR vstart = .#time
OUTPUT temp.dat
PRINT invoices
OUTPUT SCREEN
SET VAR vend = .#time
SET VAR vdiff = (.vend - .vstart)
SET VAR vtime = (RTIME(0,0,0,.vdiff))
SET LINES 20
To test how long it takes to retrieve a single row of data, use the
WHERE clause, WHERE LIMIT = 1. The LIMIT keyword can be combined with
other WHERE clause conditions to limit the number of rows retrieved.
For example:
SELECT * FROM transmaster WHERE transdate +
BETWEEN 1/1/95 AND 6/31/95 AND LIMIT=1
This command retrieves the first row that meets the WHERE clause
condition.
Use the Most Current Version of R:BASE Microrim is continually
optimizing R:BASE and each new release is faster than the previous
one. In addition to performance enhancements included in a new
release, there are feature enhancements that enable you to modify
and reduce your program code for further performance increases.
An application written in R:BASE 3.1, for example, used DECLARE
CURSOR to update a column in one table with values from another
table. This process took 0:00:07.14.
DECLARE C1 CURSOR FOR SELECT invid,sum(tprice) FROM transx +
GROUP BY invid
OPEN c1
FETCH c1 INTO vinvid ind1,vprice ind2
WHILE sqlcode <> 100 THEN
UPDATE invoice SET invtotal = .vprice WHERE invid=.vinvid
FETCH c1 INTO vinvid ind1,vprice ind2
ENDW
R:BASE 4.0 added the multi-table option to the UPDATE command.
Instead of a DECLARE CURSOR, you now can do the update using a view
and one UPDATE command. By keeping abreast of feature enhancements in
new versions and updating application code, the process time was
reduced almost in half to 0:00:04.29.
CREATE VIEW v_trans (invid, amount) AS SELECT invid, SUM(tprice) +
FROM transx GROUP BY invid
UPDATE invoice SET invtotal = amount FROM invoice ,v_trans T2 +
WHERE invoice .invid = T2.invid
Think Globally Instead of Row-by-Row
One of the best ways to achieve significant speed improvements is to
use a global command rather than a row-by- row command. Instead of
doing row-by-row processing using a DECLARE CURSOR, for example, try
using a single UPDATE or INSERT command, which takes advantage of the
SQL select capabilities of R:BASE.
For example, suppose you want to create a table to hold information
about how many of each product a customer has purchased. The most
logical approach might be to set up a cursor to go through the detail
table and count the rows for each customer, product combination and
then insert that data into the new table. On a table with 3500 rows,
processing all the rows took 1:15:00 (one hour, 15 minutes). By
replacing the DECLARE CURSOR loop with a single insert command, the
time required to process all the rows was cut to 0:09:00 (nine
minutes).
Compare the two pieces of code:
1. The DECLARE CURSOR code. Commands are repeated for each customer,
product combination.
DECLARE c1 CURSOR FOR SELECT cust#, partno +
FROM cproddet GROUP BY cust# ,partno
OPEN c1
FETCH c1 INTO vcust ind1, vpartno ind2
WHILE SQLCODE <> 100 THEN
SELECT COUNT(*) INTO vcount FROM cproddet +
WHERE cust# = .vcust AND partno = .vpartno
INSERT INTO custprod (cust#, partno, copies) +
VALUES (.vcust, .vpartno, .vcount)
FETCH c1 INTO vcust ind1, vpartno ind2
ENDW
DROP CURSOR c1
2. The SQL code. A single command is used instead of a series of
repeated commands. One command is almost always faster than repeating
a group of commands.
INSERT INTO custprod (cust#, partno, copies) +
SELECT cust#, partno, count(*) +
FROM cproddet GROUP BY cust# ,partno
Reduce the Number of Expressions in Reports
Because all report expressions are processed for each row of data read
from the driving table or view of the report, reducing the number of
expressions increases the printing speed of a report. An alternative
to reducing the number of expressions is to use the section evaluation
feature added in R:BASE 4.5 Plus!. This feature allows you to indicate
the section, F1, for example, at which an expression is evaluated. The
expression is then not evaluated at every row, but only when that
section is processed. Either method can reduce printing time almost by
half.
If you have many lookups in a report, basing the report on a view
instead of on a table will usually be faster. A view combines all the
data just once; a lookup needs to find the data for each row in the
driving table, even if the lookup is printing only on a header line.
A sample report, based on a table with 8000 rows and containing 14
lookup expressions (24 total expressions), took 0:20:45.54 to print.
As an alternative, create a four-table view and base the report on the
view. Remove the lookup expressions from the report, leaving only 10
expressions to be processed each row. Printing the report by using a
view instead of lookup expressions takes only 0:15:16.98, a definite
performance improvement.
Using the report section evaluation feature introduced in 4.5 Plus! is
even better. The original report with 14 lookups prints in only
0:12:57.41 when the lookup expressions are specified to evaluate only
at the section (header or footer) where they are located. The report
based on the view does not have a significant performance increase
using the section evaluation as most of the expressions are summing
data and must be evaluated at the detail section. The performance
increase you see depends on your report design and data.
R:BASE does not require a global variable to be equated to a report
variable; you can locate global variables on the report or reference
them directly in an expression. For example, you don't need to
define RPTDATE = . #DATE and then locate RPTDATE; just locate #DATE
directly. Set constant variables outside the report. They are
evaluated only once instead of for every row.
Abbreviate Commands to Four Characters
R:BASE parses command lines into 4-byte tokens_the fewer tokens to
read, the faster the command executes. Abbreviating commands to
three characters does not provide any additional performance.
Combine Commands Whenever Possible
Many R:BASE commands allow access to more than one column or variable
and don't require using separate commands for each. This reduces the
number of times R:BASE must parse and read a command. The fewer
commands to read, the faster R:BASE executes. The most frequently used
commands that fit into this category are SET VAR, UPDATE, SELECT,
COMPUTE, and WRITE.
For example, assign data types to variables and set the variables to
values in one command, like this:
SET VAR v1 TEXT = 'abcd', v2 INTEGER = 100, +
v3 DATE = .#date
Make Cursors Faster
Sometimes a task can be accomplished only by stepping through each row
in a table. If you need to use a DECLARE CURSOR routine, make it as
fast as possible by using the following tips:
Include all columns and computations in the DECLARE CURSOR statement
so you can FETCH the values once instead of doing SET VAR commands
repeatedly in the WHILE loop. Take advantage of SQL SELECT when
doing your DECLARE CURSOR. This can reduce the number of times the
commands in the WHILE loop must be repeated, thus improving
performance. Compare the following two code excerpts:
1.This example using the SUM function directly in the DECLARE CURSOR
statement took about three minutes on an 8,000-row transaction table:
DECLARE C1 CURSOR FOR SELECT invid, SUM(tprice) +
FROM transx GROUP BY invid
OPEN c1
FETCH c1 INTO vinvid ind1, vprice ind2
WHILE sqlcode <> 100 THEN
UPDATE invoice SET invtotal = .vprice +
WHERE invid = .vinvid
FETCH c1 INTO vinvid ind1, vprice ind2
ENDW
2.By contrast, doing the SUM within the WHILE loop for each row took
about 12 minutes, or four times as long.
SET VAR vtotprice CURR = 0
SET VAR voldinv = invid IN transx WHERE LIMIT=1
DECLARE C1 CURSOR FOR SELECT invid, tprice +
FROM transx ORDER BY invid
OPEN c1
FETCH c1 INTO vinvid ind1, vprice ind2
WHILE sqlcode <> 100 THEN
SET VAR vtotprice = (.vtotprice + .vprice)
IF vinvid <> voldinv THEN
UPDATE invoice SET invtotal = .vtotprice
WHERE invid = .voldinv
SET VAR vtotprice = 0
ENDIF
SET VAR voldinv = .vinvid
FETCH c1 INTO vinvid ind1, vprice ind2
ENDW
Fetch the index columns into variables and then use WHERE
colname=.varname instead of WHERE CURRENT OF cursor. Using an
indexed where clause is slightly faster.
When nesting cursors, do all the DECLAREs at the top of the file.
Nest the OPEN and FETCH commands, not the DECLARE CURSOR.
When nesting cursors, use the RESET option on the OPEN cursor
command. R:BASE does not need to reprocess the entire query when
you use RESET, instead, R:BASE just re-evaluates the WHERE clause.
Try Both Sub-Selects and Multi-Table Selects
In R:BASE 4.5 and higher, a multi-table select uses indexes to join
the tables, a correlated sub-select uses indexes to restrict the list
of values to be compared within the sub-select, and a plain
sub-select uses a temporary index created by R:BASE. Because the
index is already created, a multi-table select or a correlated
sub-select is slightly faster than a plain sub-select. The amount of
data selected, number of matches, and number of unique matches all
affect the performance.
To illustrate, compare the results obtained by using the different
techniques on two tables, each with 1,000 unique rows (each row in
table1 has only one match in table2).
1.A multi-table select, shown below, took 0:00:01.54.
SELECT * FROM table1,table2 +
WHERE table1.linkcol=table2.linkcol
2.A correlated sub-select, which looked at every row in table1 but
needed to look at only one row in table2 for each row in table1, took
0:00:01.37. The code is shown below:
SELECT collist FROM table1 WHERE linkcol IN +
(SELECT linkcol FROM table2 +
WHERE table2.linkcol=table1.linkcol)
3.A plain sub-select, using a temporary index built by R:BASE when
the command executed took slightly longer, 0:00:2.31. This example
code, is shown below:
SELECT collist FROM table1 WHERE linkcol IN +
(SELECT linkcol FROM table2)
SELECT is a command in which your data can affect the performance.
The results you see and the method you find that works the best
depends on the number of rows and distribution of data in the tables
you are combining. Also, the different commands retrieve different
columns of data for display.
Use the MICRORIM variables
In R:BASE 4.5, optimization variables were added to allow application
developers more control over the R:BASE environment. The ones most
commonly used allow you to increase the default buffers for reading
in data and indexes. The variables MICRORIM_F2MAXBLK and
MICRORIM_TMPMAXBLK increase the number of buffers for reading in
data. The variables MICRORIM_F3MAXBLK and MICRORIM_BTMAXBLK increase
the number of buffers for reading in indexes.
To use the variables, set them to the desired number of buffers before
connecting to a database. Usually you set all four variables. For
example:
SET VAR MICRORIM_F2MAXBLK = 64, +
MICRORIM_TMPMAXBLK = 64, +
MICRORIM_F3MAXBLK = 256, +
MICRORIM_F2MAXBLK = 256
CONNECT concomp
In the above example of joining two 1,000 row tables, the performance
of the plain sub-select was significantly improved by setting the
MICRORIM variables. The multi-table select and the correlated
sub-select are already fairly well optimized with their use of
indexes and the performance improvement is not as great. Experiment
with using these variables to see if they will increase performance of
your application.
For more information about the MICRORIM variables refer to the R:BASE
4.5 Plus! Startup & New Features Guide, Inside R:BASE, or R:BASE 5.1
on-line Help (Reference Topics, Optimizing R:BASE 4.5 and Higher)
Databases, Cache Memory Buffers).
Practice Smart Indexing
In a WHERE clause, R:BASE uses only one index (except when joining
tables). If more than one indexed column is referenced in a WHERE
clause, R:BASE looks for what it considers the "best" index_the one
that places the greatest restriction on the number of rows returned.
An index on a column that contains unique values is always better
than an index on a column that can have duplicate values. R:BASE also
looks to see what operator is being used. R:BASE first looks for an
indexed column using =, then checks for IS NULL, and finally looks at
BETWEEN.
The following example uses the index on indexcol2 because it uses =
and R:BASE assumes that comparison restricts the query to fewer rows
than the BETWEEN:
WHERE indexcol1 BETWEEN .var1 AND .var2 +
AND indexcol2 = .var3
The next example uses the index on indexcol2 because both conditions
use = and indexcol2 is an index with unique data and indexcol1 is an
index on a column with duplicate data. The unique index is the "best"
index.
WHERE indexcol1 =.var1 AND indexcol2 = .var2
This example won't use indexes because the conditions are joined with
OR. R:BASE does not use indexes to retrieve data if conditions are
joined with OR, or if the indexed column is being compared to an
expression.
WHERE indexcol1 =.var1 OR indexcol2 = .var2
You can force R:BASE to use a particular index by placing the
comparison value in parentheses, making it an expression.
WHERE indexcol1 =.var1 AND indexcol2 = (.var2)
By knowing which indexed column R:BASE will use in a WHERE clause, you
can structure your conditions so that R:BASE uses the most unique
(and thus most helpful) index. Knowing your data helps you to modify
commands and gain speed.
Use Multi-Column Indexes
If you routinely use the same set of columns in the WHERE clause,
consider defining them as a multi-column index rather than as separate
indexes. A multi-column index is faster when a set of columns is used
in a sort clause or in a WHERE clause. While the columns in a
multi-column index can be used separately, the index is most
effective when all the columns in the index are used. For example, if
you always retrieve records using a job# and a po#, define a
multi-column index for the two columns, rather than two, single-column
indexes.
Drop Indexes to Load Data
If you are loading or inserting many rows of data at one time, it can
be faster to drop the indexes, load the data, and then rebuild the
indexes. It is much faster for R:BASE to build the indexes in one
chunk rather than having them updated with each row as it is loaded.
Be Creative
R:BASE almost always offers two or more ways of doing something. Look
for an alternative method for accomplishing a task_it just might be
considerably faster.
Review Your Code
Did you write that piece of code three months ago, six months ago, or
perhaps longer ago still? As you learn more about R:BASE, about your
data, and about how people are using the database and your
application, you'll easily recognize areas in your code that can be
modified for performance improvements. You'll also want to incorporate
new techniques and new features from the current version of R:BASE.
Have someone else review your code. Someone else might see areas for
improvement and suggest other techniques that you miss because you are
so familiar with your work. It is true that two heads are better than
one.
Other Sources
These are just some of the techniques you can use to optimize your
code to achieve the maximum speed potential of R:BASE. Some of these
techniques provide dramatic improvements in speed; others provide
smaller and less noticeable speed improvements. Taken together, these
techniques can significantly increase the overall speed of your
application. Remember that the times given here are relative_the
actual times you see vary on the basis of your database (structure
and data) and your hardware.
You can find more information about optimization techniques in Inside
R:BASE (available from the Microrim Sales Department, 1-800-628-6990),
in various issues of the R:BASE Exchange, in the R:BASE Reference
Manual, and in on-line Help.