825.TXT
=====================================================================
Finding The Top "n"
=====================================================================
PRODUCT: R:BASE VERSION: 4.5 or Higher
=====================================================================
CATALOG: Programming in R:BASE AREA : Data Manipulation
=====================================================================
Using views, the "top n" problem (you want to find the top 3 or the
top 5 records) is a snap. The view makes it easy to select and rank
the data, then display just the top records. Ranking the data is
easy, you just order the data. The difficult part of the problem is
displaying only the top "n" rows.
For example, to rank salespeople by sales amount, just select the
data and order in reverse order_the highest amount first:
SELECT empid, invoicetotal FROM transmaster ORDER BY invoicetotal+
DESC
empid invoicetotal
---------- ---------------
102 $192,016.00
160 $182,469.75
129 $169,650.50
131 $166,104.75
160 $153,067.30
131 $118,646.25
129 $104,190.50
102 $95,462.50
133 $87,280.00
131 $81,006.75
129 $68,733.00
129 $61,368.75
131 $45,822.00
102 $32,730.00
129 $31,775.37
102 $31,639.00
133 $29,457.00
102 $24,547.50
The salesperson with the highest sale is listed first. If you don't
have many rows of data, this command may be sufficient, but often you
want to see just the top salesperson, or just the top 3.
You can't use the WHERE clause conditions LIMIT and COUNT to retrieve
the top "n" records. They always work on the data before it has been
sorted; you need to retrieve the top "n" after the data is sorted.
Using either of these conditions returns just the first 2 rows from
the table.
SELECT empid, invoicetotal FROM transmaster WHERE LIMIT = 2 ORDER BY+
invoicetotal DESC
empid invoicetotal
---------- ---------------
133 $29,457.00
160 $10,910.00
SELECT empid, invoicetotal FROM transmaster WHERE COUNT <= 2 ORDER+
BY invoicetotal DESC
empid invoicetotal
---------- ---------------
133 $29,457.00
160 $10,910.00
There is a technique using the SELECT command that ranks the data and
selects the top "n", but only if there is no duplicate data in the
rows. If you have duplicate data values in the comparison column, then
this technique returns incorrect data.
SELECT empid,invoicetotal FROM transmaster t1 WHERE 2 >= (SELECT
COUNT(*) FROM transmaster t2 WHERE t1.invoicetotal <= t2.invoicetotal)
empid invoicetotal
---------- ---------------
102 $192,016.00
160 $182,469.75
The easy way to solve the problem is to create a view with the desired
data in sorted order. Since the data retrieved by the view is already
sorted, the LIMIT operator quickly retrieves the top "n" rows.
CREATE VIEW rank1 AS SELECT empid,invoicetotal FROM transmaster ORDER
BY invoicetotal DESC
SELECT * FROM rank1
empid invoicetotal
---------- ---------------
102 $192,016.00
160 $182,469.75
129 $169,650.50
131 $166,104.75
160 $153,067.30
131 $118,646.25
129 $104,190.50
102 $95,462.50
133 $87,280.00
131 $81,006.75
129 $68,733.00
129 $61,368.75
131 $45,822.00
102 $32,730.00
129 $31,775.37
102 $31,639.00
133 $29,457.00
102 $24,547.50
160 $10,910.00
SELECT * FROM rank1 WHERE LIMIT = 2
empid invoicetotal
---------- ---------------
102 $192,016.00
160 $182,469.75
You can also use a view to find the "top n" when you want the ranking
based on a sum or other calculation.
CREATE VIEW rank2 (empid, total) AS SELECT empid, SUM(invoicetotal)
FROM transmaster GROUP BY empid ORDER BY 2 DESC
The ORDER BY clause supports ordering by expressions, so we can order
by total sales in reverse order (DESC is for descending). And once
the data in the view is in the correct order, the LIMIT operator
retrieves just the specified rows.