783.TXT
=====================================================================
OPTIMIZING CURSORS
=====================================================================
PRODUCT: R:BASE VERSION: 4.5+ or Higher
=====================================================================
CATALOG: PROGRAMMING IN R:BASE AREA : Logic & Data Manipulation
=====================================================================
DECLARE CURSOR is not always the fastest way to accomplish a task,
particularly an UPDATE or an INSERT. If you can replace your DECLARE
CURSOR routine with a single SQL command, you will dramatically
improve performance. However, some tasks require a DECLARE CURSOR.
Let the cursor do the work
To improve the performance of a DECLARE CURSOR routine, do as much
work in the DECLARE CURSOR as possible. This is the single most
important factor in improving cursor performance. Do whatever work
can be done in the SELECT command part of the DECLARE CURSOR - select
as many columns of data as possible and also do calculations there if
you can. The DECLARE CURSOR does the operation only once; inside the
WHILE loop, the command is repeated for each row that is stepped
through.
To do actions for unique rows only, use SELECT DISTINCT in the cursor
definition instead of adding code to your WHILE loop to test the row
values to see if they are the same or different. Use the SELECT
functions to sum, average, count and so on in the cursor definition
instead of for each row in the WHILE loop. Select as many columns as
possible in the DECLARE CURSOR rather than retrieve the data each row
in the WHILE loop.
The fewer commands repeated in the WHILE loop, the faster your DECLARE
CURSOR will run. Remember that each command in the WHILE loop is
repeated for each row retrieved by the DECLARE CURSOR. Use optimized
variables in the WHILE loop -initialize each variable outside the
WHILE loop, and do not change the data type of variables in the loop.
Following are two examples showing progressive changes made to a
DECLARE CURSOR routine to improve performance.
Example 1 - posting. The task is to sum the extended price column in
the transaction detail, Transdetail, table for each transaction ID,
then update the transaction header, Transmaster, table with the sum.
An initial approach is to declare a cursor on the header table, then
step through all matching rows in the detail table. After all the
matching detail rows have been processed, the header table is
updated.
*(POST1.CMD -- the worst case)
-- nested declare cursors
-- strictly linear programming
SET VAR vtotal CURR
DECLARE c1 CURSOR FOR SELECT transid, netamount +
FROM transmaster
OPEN c1
FETCH c1 INTO vtransid vind1,vnetamount vind2
WHILE SQLCODE <> 100 THEN
DECLARE c2 CURSOR FOR SELECT extprice +
FROM transdetail WHERE transid = .vtransid
OPEN c2
FETCH c2 INTO vprice vind3
WHILE SQLCODE <> 100 THEN
SET VAR vtotal = (.vtotal + .vprice)
FETCH c2 INTO vprice vind3
ENDWHILE
DROP CURSOR c2
UPDATE transmaster SET netamount = .vtotal +
WHERE CURRENT OF c1
SET VAR vtotal = NULL
FETCH c1 INTO vtransid vind1,vnetamount vind2
ENDWHILE
DROP CURSOR c1
We can speed up this code by following the recommended structure for
nested cursors. If we move the second DECLARE CURSOR out of the WHILE
loop and reset the cursor instead of dropping it, this command file
will execute faster. However, the best way to improve this code is by
removing the second DECLARE CURSOR altogether. We don't need to step
through all the rows in the detail table - we can compute the sum
with a single SELECT command.
*(POST2.CMD - a little bit better)
-- use the SELECT or COMPUTE command
-- to calculate the sum instead of a nested cursor
SET VAR vprice CURR = NULL
DECLARE c1 CURSOR FOR SELECT transid, netamount +
FROM transmaster
OPEN c1
FETCH c1 INTO vtransid vind1,vamount vind2
WHILE SQLCODE <> 100 THEN
SELECT SUM(extprice) INTO vprice +
FROM transdetail WHERE transid = .vtransid
-- if no matching rows in the Transdetail table,
-- vprice is null
IF vprice IS NOT NULL THEN
UPDATE transmaster SET netamount = .vprice +
WHERE CURRENT OF c1
ENDIF
SET VAR vprice = NULL
FETCH c1 INTO vtransid vind1,vamount vind2
ENDWHILE
DROP CURSOR c1
This simple change reduced the number of commands in the program,
which in turn improved performance. All the commands inside the
WHILE loop still need to be executed for as many rows as are in the
Transmaster table, however. The Transmaster table has fewer rows
than the Transdetail table, so a valid assumption is to place the
cursor on the Transmaster table to repeat the WHILE loop the fewest
times.
However, if we place the cursor on the detail table instead of on the
header table, the sum can be calculated directly in the DECLARE
CURSOR. Because the command is grouped by the transaction ID, the same
number of rows is retrieved by the cursor. The only commands to repeat
in the WHILE loop are the UPDATE and the FETCH to get the next row.
At first this might seem backwards, but computing the sum in the
DECLARE CURSOR is much faster.
*(POST3.CMD - better yet)
-- declare the cursor on the detail table and
-- do the sum directly in the cursor definition
DECLARE c1 CURSOR FOR SELECT transid, SUM(extprice) +
FROM transdetail GROUP BY transid
OPEN c1
FETCH c1 INTO vtransid vind1,vprice vind2
WHILE SQLCODE <> 100 THEN
-- this is a non-updatable cursor so an explicit
-- WHERE clause is used
UPDATE transmaster SET netamount = .vprice +
WHERE transid = .vtransid
FETCH c1 INTO vtransid vind1,vprice vind2
ENDWHILE
DROP CURSOR c1
The number of commands has been reduced by over half from the first
program, and performance by more than that. In R:BASE 4.0, 4.5 and 4.5
Plus! the multi-table update command is actually the fastest way to
accomplish this task.
*(POST4.CMD - do a multi-table update if you can)
-- multi table update command, a view is used
-- to first calculate the sum and create a
-- one-one relationship
DROP VIEW v_trans
CREATE VIEW v_trans (transid, amount) AS +
SELECT transid, SUM(extprice) +
FROM transdetail GROUP BY transid
UPDATE transmaster SET netamount = amount +
FROM transmaster ,v_trans t2 +
WHERE transmaster.transid = t2.transid
Example 2 - a quick report. The task here is to create a quick report
of companies from the Customer table and their corresponding contact
names from the Contact table. Using nested cursors makes printing the
company information once followed by the many rows of contact
information easier.
*(CUSTREP1.CMD - the worst case)
-- nested cursors are used with the declare for
-- the second cursor inside the while loop of
-- the first cursor. Also, the data is retrieved
-- with a SELECT command instead of in the
-- cursor definition
-- Dropping a cursor before you declare it is a
-- technique used to guarantee that the cursor does
-- not exist in memory. The DROP CURSOR normally
-- returns an error message, that is ok.
DROP CURSOR c1
DROP CURSOR c2
-- Only the unique row identifier is specified in
-- the cursor definition
DECLARE c1 CURSOR FOR SELECT custid FROM customer +
ORDER BY custid
OPEN c1
FETCH c1 INTO vcustid ind1
WHILE SQLCODE <> 100 THEN
-- Retrieve and display the rest of the
-- data for a customer
SELECT company, custaddress, custcity, +
custstate, custzip, custphone INTO +
vcompany vi1, vaddress vi2, vcity vi3, +
vstate vi4, vzipcode vi5, vphone vi6 +
FROM customer WHERE custid = .vcustid
SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
WRITE .vcustid, .vcompany
WRITE .vaddress
WRITE .vcsz
-- Declare a cursor to identify matching contact rows
DECLARE c2 CURSOR FOR SELECT contfname, contlname +
FROM contact WHERE custid = .vcustid
OPEN c2
FETCH c2 INTO vfname i1, vlname i2
WHILE SQLCODE <> 100 THEN
SET VAR vfullname = (.vfname & .vlname)
WRITE .vfullname
FETCH c2 INTO vfname i1, vlname i2
ENDWHILE
DROP CURSOR c2
FETCH c1 INTO vcustid ind1
ENDWHILE
DROP CURSOR c1
The next code segment shows the recommended structure for nested
cursors. The second DECLARE CURSOR is moved to the top of the program,
and the second cursor is opened and closed, not declared and dropped.
Just this simple change improves performance.
*(CUSTREP2.CMD - move cursor out of WHILE loop)
DROP CURSOR c1
DROP CURSOR c2
SET VAR vcustid INTEGER
DECLARE c1 CURSOR FOR SELECT custid +
FROM customer ORDER BY custid
DECLARE c2 CURSOR FOR SELECT contfname, contlname +
FROM contact WHERE custid = .vcustid
-- Get the first row of data for a customer
OPEN c1
FETCH c1 INTO vcustid ind1
WHILE SQLCODE <> 100 THEN
-- Retrieve and display the rest of the
-- data for a customer
SELECT company, custaddress, custcity, +
custstate, custzip, custphone INTO +
vcompany vi1, vaddress vi2, vcity vi3, +
vstate vi4, vzipcode vi5, vphone vi6 +
FROM customer WHERE custid = .vcustid
SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
WRITE .vcustid, .vcompany
WRITE .vaddress
WRITE .vcsz
-- Open cursor c2, retrieve and display
-- the matching contact data
OPEN c2
FETCH c2 INTO vfname i1, vlname i2
WHILE SQLCODE <> 100 THEN
SET VAR vfullname = (.vfname & .vlname)
WRITE .vfullname
FETCH c2 INTO vfname i1, vlname i2
ENDWHILE
-- Close cursor c2 and get the next row of
-- customer data
CLOSE c2
FETCH c1 INTO vcustid ind1
ENDWHILE
DROP CURSOR c1
DROP CURSOR c2
Moving the data retrieval to the DECLARE CURSOR command
instead of using a separate SELECT command again improves
performance.
*(CUSTREP3.CMD
retrieve data through DECLARE CURSOR)
DROP CURSOR c1
DROP CURSOR c2
-- retrieve all the data through the DECLARE CURSOR
-- command instead of SELECT
SET VAR vcustid INTEGER
DECLARE c1 CURSOR FOR SELECT custid, company, +
custaddress, custcity ,custstate, custzip, +
custphone FROM customer ORDER BY custid
DECLARE c2 CURSOR FOR SELECT contfname, contlname +
FROM contact WHERE custid = .vcustid
OPEN c1
-- Get the first row of customer data
FETCH c1 INTO vcustid ind1, vcompany ind2,+
vaddress ind3, vcity ind4, vstate ind5, +
vzip ind6, vphone ind7
WHILE SQLCODE <> 100 THEN
-- Display the customer data and open cursor c2 to
-- retrieve the matching contact data
SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
WRITE .vcustid, .vcompany
WRITE .vaddress
WRITE .vcsz
OPEN c2
FETCH c2 INTO vfname i1, vlname i2
WHILE SQLCODE <> 100 THEN
SET VAR vfullname = (.vfname & .vlname)
WRITE .vfullname
FETCH c2 INTO vfname i1, vlname i2
ENDWHILE
-- Close cursor c2 and get the next row of
-- customer data
CLOSE c2
FETCH c1 INTO vcustid ind1, vcompany ind2,+
vaddress ind3, vcity ind4, vstate ind5, +
vzip ind6, vphone ind7
ENDWHILE
DROP CURSOR c1
DROP CURSOR c2
Another small change also improves performance - instead of using
SET VAR commands within the WHILE loops to concatenate city, state
and zipcode together, and first and last name together, the
concatenation operation can be done in the DECLARE CURSOR command.
The concatenation in the DECLARE CURSOR reduces the number of
commands that are repeated for each row and moves the work to the
DECLARE CURSOR command.
*(CUSTREP4.CMD add the concatenation to the DECLARE CURSOR)
DROP CURSOR c1
DROP CURSOR c2
SET VAR vcustid INTEGER
-- Replace SET VAR commands with expressions in
-- the DECLARE CURSOR
DECLARE c1 CURSOR FOR SELECT custid, company, +
custaddress, (custcity + ',' & custstate & custzip), +
custphone FROM customer ORDER BY custid
DECLARE c2 CURSOR FOR SELECT (contfname & contlname) +
FROM contact WHERE custid = .vcustid
OPEN c1
-- Retrieve and display the customer data
FETCH c1 INTO vcustid ind1, vcompany ind2, +
vaddress ind3, vcsz ind4, vphone ind5
WHILE SQLCODE <> 100 THEN
WRITE .vcustid, .vcompany
WRITE .vaddress
WRITE .vcsz
-- Retrieve and display the contact data
OPEN c2
FETCH c2 INTO vfullname i1
WHILE SQLCODE <> 100 THEN
WRITE .vfullname
FETCH c2 INTO vfullname i1
ENDWHILE
-- Close cursor c2 and get the next row of
-- customer data
CLOSE c2
FETCH c1 INTO vcustid ind1, vcompany ind2, +
vaddress ind3, vcsz ind4, vphone ind5
ENDWHILE
DROP CURSOR c1
DROP CURSOR c2
The final change to improve performance is to use the RESET option
on the OPEN c2 command instead of CLOSE c2. Overall, we have improved
performance on this small set of rows by a full second. On a larger
data set you can expect to see a greater performance improvement.
*(CUSTREP5.CMD
reset cursor 2 instead of close and open)
DROP CURSOR c1
DROP CURSOR c2
SET VAR vcustid INTEGER
DECLARE c1 CURSOR FOR SELECT custid, company, +
custaddress, (custcity + ',' & custstate & custzip), +
custphone FROM customer ORDER BY custid
DECLARE c2 CURSOR FOR SELECT (contfname & contlname) +
FROM contact WHERE custid = .vcustid
OPEN c1
FETCH c1 INTO vcustid ind1, vcompany ind2, +
vaddress ind3, vcsz ind4, vphone ind5
WHILE SQLCODE <> 100 THEN
WRITE .vcustid, .vcompany
WRITE .vaddress
WRITE .vcsz
-- Open cursor c2 with the RESET option,
-- no CLOSE command is needed
OPEN c2 RESET
FETCH c2 INTO vfullname i1
WHILE SQLCODE <> 100 THEN
WRITE .vfullname
FETCH c2 INTO vfullname i1
ENDWHILE
FETCH c1 INTO vcustid ind1, vcompany ind2, +
vaddress ind3, vcsz ind4, vphone ind5
ENDWHILE
DROP CURSOR c1
DROP CURSOR c2
As you can see from the above examples, maximizing the work of the
DECLARE CURSOR command provides significant performance improvements.
The changes were small and they didn't involve a lot of time or
programming effort, but these changes did result in definite
performance benefits.
Customize the environment
In addition to optimizing your programming code, you can improve
cursor performance by optimizing the environment. Obviously, code
runs faster in the 386/486 version of R:BASE than in the 286 version,
and a 486 computer runs programs faster than a 386. Outside of
upgrading your hardware, however, certain R:BASE environment settings
can be used to improve performance. These settings generally improve
overall performance as well as cursor performance.
In single-user mode, SET CLEAR OFF, which is really useful when
updating data because disk writes are buffered. This setting is not
available in multi-user mode.
In either single- or multi-user mode, use the Microrim variables; a
cursor is the same as any other query. MICRORIM_F2MAXBLK and
MICRORIM_TMPMAXBLK affect actual data retrieval; MICRORIM_F3MAXBLK
and MICRORIM_BTMAXBLK are used with index retrieval. A scrolling
cursor uses a temporary internal table and can be sped up by using
MICRORIM_TMPMAXBLK. Refer to Chapter 6 in the Startup and New Features
Guide for more information about using these variables. Note that you
may not see any performance improvement unless you are working with a
large amount of data. If you never retrieve more data than what fits
in the default buffers, you won't see a visible performance increase.
Look at the EXPLAIN.DAT file generated by the MICRORIM_EXPLAIN
variable to see the cursor query optimization. The OPEN command
actually executes the query. Each query executed in your program
puts an entry in EXPLAIN.DAT; for example, SELECT or UPDATE commands
in the WHILE loop are reflected. You might also see a query reference
to the SYS_RULES table, which is used for multi-user locking control.
By using EXPLAIN.DAT, you can easily see why using the RESET option on
OPEN is faster. Normally, each OPEN redoes the query. When RESET is
used, the query is only optimized once.
The EXPLAIN.DAT entries for the last two command files from Example
2 earlier in this article are shown here. The first entry shows
nested cursors using the OPEN and CLOSE commands. The second entry
shows using the RESET option on OPEN.
Cursor c1 on the Customer table is accessed sequentially, all rows in
the table are retrieved, and no WHERE clause is used. If an indexed
WHERE clause was used, EXPLAIN.DAT would show the index used. The
second cursor on the Contact table does use an indexed WHERE clause
to define the query. This query is redone each time the cursor is
opened with a different vcustid value.
D:\XCHNG\JULAUG\CODE\custrep4.cmd 417
05/20/94 14:56:37.580
SelectCost=1 (OptimizationTime=0ms)
customer Sequential
D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
05/20/94 14:56:37.970
SelectCost=0.06962963 (OptimizationTime=0ms)
contact (ColumnName=custid,Type=I) Random Dup=1.666667
Adj=1
D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
05/20/94 14:56:38.240
SelectCost=0.06962963 (OptimizationTime=50ms)
contact (ColumnName=custid,Type=I) Random Dup=1.666667
Adj=1
D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
05/20/94 14:56:38.460
SelectCost=0.06962963 (OptimizationTime=0ms)
contact (ColumnName=custid,Type=I) Random Dup=1.666667
Adj=1
............
D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
05/20/94 14:56:40.110
SelectCost=0.06962963 (OptimizationTime=0ms)
contact (ColumnName=custid,Type=I) Random Dup=1.666667
Adj=1
D:\XCHNG\JULAUG\CODE\custrep4.cmd 836
05/20/94 14:56:40.330
SelectCost=1 (OptimizationTime=0ms)
SYS_RULES Sequential
The following EXPLAIN.DAT entry uses OPEN c2 RESET. The same query
is used each time cursor c2 is accessed. The query does not need to
be reoptimized each time the cursor is opened.
D:\XCHNG\JULAUG\CODE\custrep5.cmd 1441
05/20/94 14:56:40.770
SelectCost=1 (OptimizationTime=60ms)
customer Sequential
D:\XCHNG\JULAUG\CODE\custrep5.cmd 1839
05/20/94 14:56:41.100
SelectCost=0.06962963 (OptimizationTime=60ms)
contact (ColumnName=custid,Type=I) Random Dup=1.666667
Adj=1
D:\XCHNG\JULAUG\CODE\custrep5.cmd 1867
05/20/94 14:56:42.530
SelectCost=1 (OptimizationTime=0ms)
SYS_RULES Sequential