783.TXT
     =====================================================================
     OPTIMIZING CURSORS   
     =====================================================================
     PRODUCT:  R:BASE                 VERSION:  4.5+ or Higher
     =====================================================================
     CATALOG:  PROGRAMMING IN R:BASE  AREA   :  Logic & Data Manipulation
     =====================================================================
 
 
     DECLARE CURSOR is not always the fastest way to accomplish a task, 
     particularly an UPDATE or an INSERT. If you can replace your DECLARE 
     CURSOR routine with a single SQL command, you will dramatically 
     improve performance. However, some tasks require a DECLARE CURSOR.
 
     Let the cursor do the work
     To improve the performance of a DECLARE CURSOR routine, do as much 
     work in the DECLARE CURSOR as possible. This is the single most 
     important factor in improving cursor performance. Do whatever work 
     can be done in the SELECT command part of the DECLARE CURSOR - select 
     as many columns of data as possible and also do calculations there if 
     you can. The DECLARE CURSOR does the operation only once; inside the 
     WHILE loop, the command is repeated for each row that is stepped 
     through.
 
     To do actions for unique rows only, use SELECT DISTINCT in the cursor 
     definition instead of adding code to your WHILE loop to test the row 
     values to see if they are the same or different. Use the SELECT 
     functions to sum, average, count and so on in the cursor definition 
     instead of for each row in the WHILE loop. Select as many columns as 
     possible in the DECLARE CURSOR rather than retrieve the data each row 
     in the WHILE loop.
 
     The fewer commands repeated in the WHILE loop, the faster your DECLARE 
     CURSOR will run. Remember that each command in the WHILE loop is 
     repeated for each row retrieved by the DECLARE CURSOR. Use optimized 
     variables in the WHILE loop -initialize each variable outside the 
     WHILE loop, and do not change the data type of variables in the loop.
 
     Following are two examples showing progressive changes made to a 
     DECLARE CURSOR routine to improve performance.
 
     Example 1 - posting. The task is to sum the extended price column in 
     the transaction detail, Transdetail, table for each transaction ID, 
     then update the transaction header, Transmaster, table with the sum. 
     An initial approach is to declare a cursor on the header table, then 
     step through all matching rows in the detail table. After all the 
     matching detail rows have been processed, the header table is
     updated.
 
     *(POST1.CMD -- the worst case)
 
     -- nested declare cursors
     -- strictly linear programming
     SET VAR vtotal CURR
     DECLARE c1 CURSOR FOR SELECT transid, netamount +
       FROM transmaster
     OPEN c1
     FETCH c1 INTO vtransid vind1,vnetamount vind2
     WHILE SQLCODE <> 100 THEN
       DECLARE c2 CURSOR FOR SELECT extprice +
        FROM transdetail WHERE transid = .vtransid
       OPEN c2
       FETCH c2 INTO vprice vind3
       WHILE SQLCODE <> 100 THEN
         SET VAR vtotal = (.vtotal + .vprice)
         FETCH c2 INTO vprice vind3
       ENDWHILE
       DROP CURSOR c2
       UPDATE transmaster SET netamount = .vtotal +
         WHERE CURRENT OF c1
       SET VAR vtotal = NULL
       FETCH c1 INTO vtransid vind1,vnetamount vind2
     ENDWHILE
     DROP CURSOR c1
 
     We can speed up this code by following the recommended structure for 
     nested cursors. If we move the second DECLARE CURSOR out of the WHILE 
     loop and reset the cursor instead of dropping it, this command file 
     will execute faster. However, the best way to improve this code is by 
     removing the second DECLARE CURSOR altogether. We don't need to step 
     through all the rows in the detail table - we can compute the sum 
     with a single SELECT command.
 
     *(POST2.CMD - a little bit better)
 
     -- use the SELECT or COMPUTE command
     -- to calculate the sum instead of a nested cursor
     SET VAR vprice CURR = NULL
     DECLARE c1 CURSOR FOR SELECT transid, netamount +
       FROM transmaster
     OPEN c1
     FETCH c1 INTO vtransid vind1,vamount vind2
     WHILE SQLCODE <> 100 THEN
       SELECT SUM(extprice) INTO vprice +
         FROM transdetail WHERE transid = .vtransid
     -- if no matching rows in the Transdetail table,
     -- vprice is null
       IF vprice IS NOT NULL THEN
         UPDATE transmaster SET netamount = .vprice +
           WHERE CURRENT OF c1
       ENDIF
       SET VAR vprice = NULL
       FETCH c1 INTO vtransid vind1,vamount vind2
     ENDWHILE
     DROP CURSOR c1
 
     This simple change reduced the number of commands in the program, 
     which in turn improved performance. All the commands inside the 
     WHILE loop still need to be executed for as many rows as are in the 
     Transmaster table, however. The Transmaster table has fewer rows 
     than the Transdetail table, so a valid assumption is to place the 
     cursor on the Transmaster table to repeat the WHILE loop the fewest 
     times.
 
     However, if we place the cursor on the detail table instead of on the 
     header table, the sum can be calculated directly in the DECLARE 
     CURSOR. Because the command is grouped by the transaction ID, the same 
     number of rows is retrieved by the cursor. The only commands to repeat 
     in the WHILE loop are the UPDATE and the FETCH to get the next row. 
     At first this might seem backwards, but computing the sum in the 
     DECLARE CURSOR is much faster.
 
     *(POST3.CMD - better yet)
     -- declare the cursor on the detail table and
     -- do the sum directly in the cursor definition
     DECLARE c1 CURSOR FOR SELECT transid, SUM(extprice) +
       FROM transdetail GROUP BY transid
     OPEN c1
     FETCH c1 INTO vtransid vind1,vprice vind2
     WHILE SQLCODE <> 100 THEN
     -- this is a non-updatable cursor so an explicit
     -- WHERE clause is used
       UPDATE transmaster SET netamount = .vprice +
         WHERE transid = .vtransid
       FETCH c1 INTO vtransid vind1,vprice vind2
     ENDWHILE
     DROP CURSOR c1
 
     The number of commands has been reduced by over half from the first 
     program, and performance by more than that. In R:BASE 4.0, 4.5 and 4.5 
     Plus! the multi-table update command is actually the fastest way to 
     accomplish this task.
 
     *(POST4.CMD - do a multi-table update if you can)
     -- multi table update command, a view is used
     -- to first calculate the sum and create a
     -- one-one relationship
     DROP VIEW v_trans
     CREATE VIEW v_trans (transid, amount) AS +
       SELECT transid, SUM(extprice) +
       FROM transdetail GROUP BY transid
     UPDATE transmaster SET netamount = amount +
       FROM transmaster ,v_trans t2 +
       WHERE transmaster.transid = t2.transid
 
     Example 2 - a quick report. The task here is to create a quick report 
     of companies from the Customer table and their corresponding contact 
     names from the Contact table. Using nested cursors makes printing the 
     company information once followed by the many rows of contact 
     information easier.
 
     *(CUSTREP1.CMD - the worst case)
     -- nested cursors are used with the declare for
     -- the second cursor inside the while loop of
     -- the first cursor. Also, the data is retrieved
     -- with a SELECT command instead of in the
     -- cursor definition
 
     -- Dropping a cursor before you declare it is a
     -- technique used to guarantee that the cursor does
     -- not exist in memory. The DROP CURSOR normally
     -- returns an error message, that is ok.
     DROP CURSOR c1
     DROP CURSOR c2
 
     -- Only the unique row identifier is specified in
     -- the cursor definition
     DECLARE c1 CURSOR FOR SELECT custid FROM customer +
       ORDER BY custid
     OPEN c1
     FETCH c1 INTO vcustid ind1
     WHILE SQLCODE <> 100 THEN
 
     -- Retrieve and display the rest of the
     -- data for a customer
       SELECT company, custaddress, custcity, +
        custstate, custzip, custphone INTO  +
        vcompany vi1, vaddress vi2, vcity vi3, +
        vstate vi4, vzipcode vi5, vphone vi6 +
        FROM customer WHERE custid = .vcustid
       SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
       WRITE .vcustid, .vcompany
       WRITE .vaddress
       WRITE .vcsz
 
     -- Declare a cursor to identify matching contact rows
       DECLARE c2 CURSOR FOR SELECT contfname, contlname +
         FROM contact WHERE custid = .vcustid
       OPEN c2
       FETCH c2 INTO vfname i1, vlname i2
       WHILE SQLCODE <> 100 THEN
         SET VAR vfullname = (.vfname & .vlname)
         WRITE .vfullname
         FETCH c2 INTO vfname i1, vlname i2
       ENDWHILE
       DROP CURSOR c2
       FETCH c1 INTO vcustid ind1
     ENDWHILE
     DROP CURSOR c1
 
     The next code segment shows the recommended structure for nested 
     cursors. The second DECLARE CURSOR is moved to the top of the program, 
     and the second cursor is opened and closed, not declared and dropped. 
     Just this simple change improves performance.
 
     *(CUSTREP2.CMD - move cursor out of WHILE loop)
     DROP CURSOR c1
     DROP CURSOR c2
     SET VAR vcustid INTEGER
     DECLARE c1 CURSOR FOR SELECT custid +
       FROM customer ORDER BY custid
     DECLARE c2 CURSOR FOR SELECT contfname, contlname +
       FROM contact WHERE custid = .vcustid
 
     -- Get the first row of data for a customer
     OPEN c1
     FETCH c1 INTO vcustid ind1
     WHILE SQLCODE <> 100 THEN
 
     -- Retrieve and display the rest of the
     -- data for a customer
       SELECT company, custaddress, custcity, +
        custstate, custzip, custphone INTO  +
        vcompany vi1, vaddress vi2, vcity vi3, +
        vstate vi4, vzipcode vi5, vphone vi6 +
        FROM customer WHERE custid = .vcustid
       SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
       WRITE .vcustid, .vcompany
       WRITE .vaddress
       WRITE .vcsz
 
     -- Open cursor c2, retrieve and display
     -- the matching contact data
       OPEN c2
       FETCH c2 INTO vfname i1, vlname i2
       WHILE SQLCODE <> 100 THEN
         SET VAR vfullname = (.vfname & .vlname)
         WRITE .vfullname
         FETCH c2 INTO vfname i1, vlname i2
       ENDWHILE
 
     -- Close cursor c2 and get the next row of
     -- customer data
       CLOSE c2
       FETCH c1 INTO vcustid ind1
     ENDWHILE
     DROP CURSOR c1
     DROP CURSOR c2
 
     Moving the data retrieval to the DECLARE CURSOR command
     instead of using a separate SELECT command again improves
     performance.
 
     *(CUSTREP3.CMD
       retrieve data through DECLARE CURSOR)
     DROP CURSOR c1
     DROP CURSOR c2
 
     -- retrieve all the data through the DECLARE CURSOR
     -- command instead of SELECT
     SET VAR vcustid INTEGER
     DECLARE c1 CURSOR FOR SELECT custid, company, +
       custaddress, custcity ,custstate, custzip, +
       custphone FROM customer ORDER BY custid
     DECLARE c2 CURSOR FOR SELECT contfname, contlname +
       FROM contact WHERE custid = .vcustid
     OPEN c1
 
     -- Get the first row of customer data
     FETCH c1 INTO vcustid ind1, vcompany ind2,+
        vaddress ind3, vcity ind4, vstate ind5, +
        vzip ind6, vphone ind7
     WHILE SQLCODE <> 100 THEN
 
     -- Display the customer data and open cursor c2 to
     -- retrieve the matching contact data
       SET VAR vcsz = (.vcity + ',' & .vstate & .vzipcode)
       WRITE .vcustid, .vcompany
       WRITE .vaddress
       WRITE .vcsz
       OPEN c2
       FETCH c2 INTO vfname i1, vlname i2
       WHILE SQLCODE <> 100 THEN
         SET VAR vfullname = (.vfname & .vlname)
         WRITE .vfullname
         FETCH c2 INTO vfname i1, vlname i2
       ENDWHILE
 
     -- Close cursor c2 and get the next row of
     -- customer data
       CLOSE c2
       FETCH c1 INTO vcustid ind1, vcompany ind2,+
        vaddress ind3, vcity ind4, vstate ind5, +
        vzip ind6, vphone ind7
     ENDWHILE
     DROP CURSOR c1
     DROP CURSOR c2
 
     Another small change also improves performance - instead of using 
     SET VAR commands within the WHILE loops to concatenate city, state 
     and zipcode together, and first and last name together, the 
     concatenation operation can be done in the DECLARE CURSOR command. 
     The concatenation in the DECLARE CURSOR reduces the number of 
     commands that are repeated for each row and moves the work to the 
     DECLARE CURSOR command.
 
     *(CUSTREP4.CMD add the concatenation to the DECLARE CURSOR)
     DROP CURSOR c1
     DROP CURSOR c2
     SET VAR vcustid INTEGER
 
     -- Replace SET VAR commands with expressions in
     -- the DECLARE CURSOR
     DECLARE c1 CURSOR FOR SELECT custid, company, +
       custaddress, (custcity + ',' & custstate & custzip), +
       custphone FROM customer ORDER BY custid
     DECLARE c2 CURSOR FOR SELECT (contfname & contlname) +
       FROM contact WHERE custid = .vcustid
     OPEN c1
 
     -- Retrieve and display the customer data
     FETCH c1 INTO vcustid ind1, vcompany ind2, +
       vaddress ind3, vcsz ind4, vphone ind5
     WHILE SQLCODE <> 100 THEN
       WRITE .vcustid, .vcompany
       WRITE .vaddress
       WRITE .vcsz
 
     -- Retrieve and display the contact data
       OPEN c2
       FETCH c2 INTO vfullname i1
       WHILE SQLCODE <> 100 THEN
         WRITE .vfullname
         FETCH c2 INTO vfullname i1
       ENDWHILE
 
     -- Close cursor c2 and get the next row of
     -- customer data
       CLOSE c2
       FETCH c1 INTO vcustid ind1, vcompany ind2, +
        vaddress ind3, vcsz ind4, vphone ind5
     ENDWHILE
     DROP CURSOR c1
     DROP CURSOR c2
 
     The final change to improve performance is to use the RESET option 
     on the OPEN c2 command instead of CLOSE c2. Overall, we have improved 
     performance on this small set of rows by a full second. On a larger 
     data set you can expect to see a greater performance improvement.
 
     *(CUSTREP5.CMD
       reset cursor 2 instead of close and open)
 
     DROP CURSOR c1
     DROP CURSOR c2
     SET VAR vcustid INTEGER
     DECLARE c1 CURSOR FOR SELECT custid, company, +
       custaddress, (custcity + ',' & custstate & custzip), +
       custphone FROM customer ORDER BY custid
     DECLARE c2 CURSOR FOR SELECT (contfname & contlname) +
       FROM contact WHERE custid = .vcustid
     OPEN c1
     FETCH c1 INTO vcustid ind1, vcompany ind2, +
       vaddress ind3, vcsz ind4, vphone ind5
     WHILE SQLCODE <> 100 THEN
       WRITE .vcustid, .vcompany
       WRITE .vaddress
       WRITE .vcsz
 
     -- Open cursor c2 with the RESET option,
     -- no CLOSE command is needed
       OPEN c2 RESET
       FETCH c2 INTO vfullname i1
       WHILE SQLCODE <> 100 THEN
         WRITE .vfullname
         FETCH c2 INTO vfullname i1
       ENDWHILE
     FETCH c1 INTO vcustid ind1, vcompany ind2, +
       vaddress ind3, vcsz ind4, vphone ind5
     ENDWHILE
     DROP CURSOR c1
     DROP CURSOR c2
 
 
     As you can see from the above examples, maximizing the work of the 
     DECLARE CURSOR command provides significant performance improvements. 
     The changes were small and they didn't involve a lot of time or 
     programming effort, but these changes did result in definite 
     performance benefits.
 
     Customize the environment
     In addition to optimizing your programming code, you can improve 
     cursor performance by optimizing the environment. Obviously, code 
     runs faster in the 386/486 version of R:BASE than in the 286 version, 
     and a 486 computer runs programs faster than a 386. Outside of 
     upgrading your hardware, however, certain R:BASE environment settings 
     can be used to improve performance. These settings generally improve
     overall performance as well as cursor performance.
 
     In single-user mode, SET CLEAR OFF, which is really useful when 
     updating data because disk writes are buffered. This setting is not 
     available in multi-user mode.
 
     In either single- or multi-user mode, use the Microrim variables; a 
     cursor is the same as any other query. MICRORIM_F2MAXBLK and 
     MICRORIM_TMPMAXBLK affect actual data retrieval; MICRORIM_F3MAXBLK 
     and MICRORIM_BTMAXBLK are used with index retrieval. A scrolling 
     cursor uses a temporary internal table and can be sped up by using
     MICRORIM_TMPMAXBLK. Refer to Chapter 6 in the Startup and New Features 
     Guide for more information about using these variables. Note that you 
     may not see any performance improvement unless you are working with a 
     large amount of data. If you never retrieve more data than what fits 
     in the default buffers, you won't see a visible performance increase.
 
     Look at the EXPLAIN.DAT file generated by the MICRORIM_EXPLAIN 
     variable to see the cursor query optimization. The OPEN command 
     actually executes the query. Each query executed in your program 
     puts an entry in EXPLAIN.DAT; for example, SELECT or UPDATE commands 
     in the WHILE loop are reflected. You might also see a query reference 
     to the SYS_RULES table, which is used for multi-user locking control.
 
     By using EXPLAIN.DAT, you can easily see why using the RESET option on 
     OPEN is faster. Normally, each OPEN redoes the query. When RESET is 
     used, the query is only optimized once.
 
     The EXPLAIN.DAT entries for the last two command files from Example 
     2 earlier in this article are shown here. The first entry shows 
     nested cursors using the OPEN and CLOSE commands. The second entry 
     shows using the RESET option on OPEN.
 
     Cursor c1 on the Customer table is accessed sequentially, all rows in 
     the table are retrieved, and no WHERE clause is used. If an indexed 
     WHERE clause was used, EXPLAIN.DAT would show the index used. The 
     second cursor on the Contact table does use an indexed WHERE clause 
     to define the query. This query is redone each time the cursor is 
     opened with a different vcustid value.
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 417
     05/20/94 14:56:37.580
     SelectCost=1  (OptimizationTime=0ms)
       customer Sequential
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
     05/20/94 14:56:37.970
     SelectCost=0.06962963  (OptimizationTime=0ms)
       contact (ColumnName=custid,Type=I) Random Dup=1.666667
     Adj=1
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
     05/20/94 14:56:38.240
     SelectCost=0.06962963  (OptimizationTime=50ms)
       contact (ColumnName=custid,Type=I) Random Dup=1.666667
     Adj=1
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
     05/20/94 14:56:38.460
     SelectCost=0.06962963  (OptimizationTime=0ms)
       contact (ColumnName=custid,Type=I) Random Dup=1.666667
     Adj=1
 
     ............
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 822
     05/20/94 14:56:40.110
     SelectCost=0.06962963  (OptimizationTime=0ms)
       contact (ColumnName=custid,Type=I) Random Dup=1.666667
     Adj=1
 
     D:\XCHNG\JULAUG\CODE\custrep4.cmd 836
     05/20/94 14:56:40.330
     SelectCost=1  (OptimizationTime=0ms)
       SYS_RULES Sequential
 
     The following EXPLAIN.DAT entry uses OPEN c2 RESET. The same query 
     is used each time cursor c2 is accessed. The query does not need to 
     be reoptimized each time the cursor is opened.
 
     D:\XCHNG\JULAUG\CODE\custrep5.cmd 1441
     05/20/94 14:56:40.770
     SelectCost=1  (OptimizationTime=60ms)
       customer Sequential
 
     D:\XCHNG\JULAUG\CODE\custrep5.cmd 1839
     05/20/94 14:56:41.100
     SelectCost=0.06962963  (OptimizationTime=60ms)
       contact (ColumnName=custid,Type=I) Random Dup=1.666667
     Adj=1
 
     D:\XCHNG\JULAUG\CODE\custrep5.cmd 1867
     05/20/94 14:56:42.530
     SelectCost=1  (OptimizationTime=0ms)
       SYS_RULES Sequential