DOCUMENT #680
=======================================================================
BASIC STATISTICS: COUNT, AVERAGE, SUM, MINIMUM, MAXIMUM
=======================================================================
PRODUCT: R:BASE VERSION : 3.1 or Higher
=======================================================================
CATALOG: Programming In R:BASE AREA : Other
=======================================================================
R:BASE contains over 70 SuperMath functions for computations in addition
to the simple addition, subtraction, multiplication and division oper-
ations. However, the SuperMath functions don't apply when doing a count,
average, sum, minimum, or maximum on a group of rows from a table.
The SuperMath functions LAVG, LMIN, and LMAX work across a row, i.e.
they compute the average, minimum and maximum column values for a
group of columns from a given row, they do not return the average,
minimum or maximum value for a group of rows from a given column. Use
the AVG, MIN or MAX function with the SELECT or COMPUTE command to
return the value for a column. The LAVG, LMIN and LMAX SuperMath
functions are not interchangeable with the AVG, MIN and MAX functions.
The SELECT and COMPUTE commands also have COUNT and SUM functions.
In addition to the functions used with SELECT and COMPUTE, count,
average, sum, minimum, and maximum are calculated using a row-by-row
processing method and the R:BASE programming language. A program does
row-by-row processing using DECLARE CURSOR or performs the calculations
with one command using SELECT or COMPUTE. Note that COMPUTE does provide
additional calculations not available with SELECT: standard deviation
and variance.
A report is processed row-by-row; expressions are evaluated with each
row that is read from the driving table or view. A report uses basically
the same procedure to calculate the count, average, sum, minimum, or m
aximum as in a program. Functions in the SELECT command can also be used
in reports, but are evaluated on every row, not just once. This can
slow down a report. The COMPUTE command functions are not available in
reports.
Before you start
================
Before you start, you need to know a little bit about your data. Does
the column you're doing computations on contain NULL values? A NULL
value is how R:BASE represents no data entered in a field. The SELECT
and COMPUTE functions do not include NULL values in the computation,
but with row-by-row processing methods you need to consider NULLs and
explicitly exclude them, if necessary, to return a correct result.
Also, when doing row-by-row processing, you must SET ZERO ON. The
setting of ZERO tells R:BASE how to treat NULLs in calculations. If
ZERO is OFF, then calculations involving NULLs return NULL. When ZERO
is ON, NULLs are treated as if they were 0. SET ZERO ON to ensure totals
accumulate correctly and comparisons evaluate properly.
The count, average, sum, minimum, and maximum described below work on
INTEGER, REAL, DOUBLE or CURRENCY datatype columns. They can be done on
all the rows in a table or on a group of rows by adding an appropriate
WHERE clause to the command. In a report, rows are grouped by using
breakpoints.
COUNT
=====
Using one command
-----------------
SELECT COUNT(*) FROM tblname
- counts all the rows in a table, including NULL values
- same as COMPUTE ROWS FROM tblname
- can be used in a report, is evaluated for every row
SELECT COUNT(colname) FROM tblname
- counts all the rows where the specified column is not NULL
- store the result in a variable
SELECT COUNT(colname) INTO varname FROM tblname
- same as COMPUTE COUNT colname FROM tblname
- can be used in a report, is evaluated for every row; format is similar
to a lookup expression
varname INTEGER = COUNT(colname) IN tblname WHERE linkcol = linkcol
Using row-by-row processing
---------------------------
SET VAR vcount INTEGER = (.vcount + 1)
- often called a counter variable
- used in a program in WHILE or other looping structure
- counts the number of rows processed
- increments itself by one as each row is read
- counts all rows, including NULLs
- must use SET ZERO ON to accumulate the count
- in a report is used without the SET VAR
vcount INTEGER = (.vcount + 1)
Conditional count
-----------------
SET VAR vcount INTEGER = (IFEQ(name,value,(.vcount+1),.vcount))
- compares name and value, if they are equal the count is incremented
by one; name and value can be columns (reports only), variables or
constants
- NULL values are considered equal
- function can be nested if name and value are non-TEXT datatypes
- use in a report without the SET VAR
vcount INTEGER = (IFEQ(name,value,(.vcount+1),.vcount))
SUM
===
Using one command
-----------------
SELECT SUM(colname) FROM tblname
- calculates the sum of all non-NULL values in the specified column
- store the result in a variable
SELECT SUM(colname) INTO varname FROM tblname
- variable is automatically datatyped based on the column
- same as COMPUTE SUM colname FROM tblname
- can be used in a report, is evaluated for every row, format is
similar to a lookup expression
varname = SUM(colname) IN tblname WHERE linkcol = linkcol
Using row-by-row processing
---------------------------
SET VAR vtotal = (.vtotal + value)
- must initialize the variables
- often called an accumulator variable or a running total
- value can be a column (reports only), variable or constant
- adds value to the variable vtotal on a row-by-row basis
- must use SET ZERO ON to accumulate the total
- use in a report without the SET VAR
vtotal = (.vtotal + value)
Conditional sum
---------------
SET VAR vtotal = (IFEQ(name, value1,(.vtotal + value2),
.vtotal))
- compares name and value1, if they are equal, adds value2 to the
variable vtotal
- name, value1 and value2 can be columns (reports only), variables or
constants
- function can be nested if name and value1 are non-TEXT datatypes
- use in a report without the SET VAR
vtotal = (IFEQ(name, value1,(.vtotal + value2),.vtotal))
AVERAGE
=======
Using one command
-----------------
SELECT AVG(colname) FROM tblname
- calculates the average of all non-NULL values in the specified column
- store the result in a variable
SELECT AVG(colname) INTO varname FROM tblname
- variable is automatically datatyped based on the column
- same as COMPUTE AVG colname FROM tblname
- can be used in a report, is evaluated for every row, format is similar
to a lookup expression
varname = AVG(colname) IN tblname WHERE linkcol = linkcol
Using row-by-row processing
---------------------------
An average is a sum divided by a count. First count the rows, then sum
the rows, then divide the sum by the count. The sum and the count can
be the result of SELECT functions, of row-by-row processing, or of a
conditional operation. If there are NULLs in the data, be sure to
exclude them from the count. If you don't, the average will be wrong.
They will be counted and included in the average as zero values.
SELECT COUNT(*) INTO var_count FROM tblname
SELECT SUM(colname) INTO var_sum FROM tblname
SET VAR var_avg = (.var_sum/.var_count)
MINIMUM
=======
Using one command
-----------------
SELECT MIN(colname) FROM tblname
- calculates the minimum value of all non-NULL values in the specified
column
- store the result in a variable
SELECT MIN(colname) INTO varname FROM tblname
- variable is automatically datatyped based on the column
- same as COMPUTE MIN colname FROM tblname
- can be used in a report, is evaluated for every row, format is similar
to a lookup expression
variable = MIN(colname) IN tblname WHERE linkcol = linkcol
Using row-by-row processing
---------------------------
In a program, you get the first row from the table, then compare that
value with each succeeding row. If the new value is less, it becomes
the minimum. For example,
SET VAR vmin = colname IN tblname WHERE LIMIT=1
DECLARE c1 CURSOR FOR SELECT colname FROM tblname
OPEN c1
FETCH c1 INTO vcol ind1
WHILE SQLCODE <> 100 THEN
SET VAR vmin = (IFLT(.vcol,.vmin,.vcol,.vmin))
FETCH c1 INTO vcol ind1
ENDW
In a report, the process is similar, but because you can only set
variables, the logic is a little bit different. Use a counter variable
to determine the initial value; if the counter is 1, it is the first
row. Then, compare each row, replacing the minimum if the new value is
smaller. If computing the minimum for each group, reset the variable
vcount at the breakpoint.
vcount INTEGER = (.vcount + 1)
vmin = (IFEQ(.vcount,1,colname,(IFLT(colname,.vmin,
colname,.vmin)) ))
MAXIMUM
=======
SELECT MAX(colname) FROM tblname
- calculates the maximum of all non-NULL values in the specified column
- store the result in a variable
SELECT MAX(colname) INTO varname FROM tblname
- variable is automatically datatyped based on the column
- same as COMPUTE MAX colname FROM tblname
- can be used in a report, is evaluated every row, format is similar to a
lookup expression
varname = MAX(colname) IN tblname WHERE linkcol = linkcol
Using row-by-row processing
---------------------------
In a program you get the first value from the table, then compare that
with each succeeding row. If the new value is greater, it becomes the
maximum For example,
SET VAR vmax = colname IN tblname WHERE LIMIT=1
DECLARE c1 CURSOR FOR SELECT colname FROM tblname
OPEN c1
FETCH c1 INTO vcol ind1
WHILE SQLCODE <> 100 THEN
SET VAR vmax = (IFGT(.vcol,.vmax,.vcol,.vmax))
FETCH c1 INTO vcol ind1
ENDW
In a report, the process is similar, but because you can only set
variables, the logic is a little bit different. Use a counter variable
to determine the initial value, if the counter is 1, it is the first
row. Then compare each row, replacing the maximum if the new value is
greater. If computing the maximum for each group, reset the variable
vcount at the breakpoint.
vcount INTEGER = (.vcount + 1)
vmax = (IFEQ(.vcount,1,colname,(IFGT(colname,.vmax,
colname,.vmax)) ))