DOCUMENT #690
=======================================================================
SPORTS STATISTICS
=======================================================================
PRODUCT: R:BASE VERSION : 3.1B or Higher
=======================================================================
CATALOG: Programming In R:BASE AREA : Statistical Analyst
=======================================================================
R:BASE quickly and easily stores data about team or individual sports
and computes statistics. Use forms to enter the data and then views and
reports to display statistical information. Following are examples for
both team (baseball) and individual (bicycling) sports.
Baseball
========
A common team statistic is won/lost percentages. With team sports, you
also want to compare players. Some of the individual player statistics
are batting average and on-base percentage. All of these can be quickly
calculated using views.
Formulas to compute these statistics are as follows:
won/lost percentage = (total games won)
---------------
(total games played)
batting average = (total hits)
----------
(total at bats)
on-base percentage = (total hits + total bases on balls)
---------------------------------
(total at bats + total bases on balls)
The word total in the above formulas indicates where to do a sum. The
rest is simple addition and division.
Create two tables, one to hold team data, one to hold the individual
player data. You'll have other tables to hold more information about
the players and teams. For example, league, division, player's nickname,
age, right or left handed, position etc. The team_stats table is loaded
with one row for each team per game. The Won and Lost columns are auto-
matically computed from the score. The player_stats table is loaded
with one row of information per player per game.
Table: team_stats
Column definitions
# Name Type Index Expression
1 Team_name TEXT 20
2 Team_score INTEGER
3 Opponent TEXT 20
4 Opp_score INTEGER
5 Won INTEGER (IFGT(team_score,opp_score,1,0))
6 Lost INTEGER (IFLT(team_score,opp_score,1,0))
8 Game_date DATE
Sample data from team_stats
Team_name Team_score Opponent Opp_score Won Lost Game_Date
---------------- ---------- ---------------- --------- --- ---- ---------
Oakland A's 12 NY Yankees 7 1 0 04/12/1993
NY Yankees 7 Oakland A's 12 0 1 04/12/1993
Oakland A's 2 Seattle Mariners 3 0 1 04/20/1993
Seattle Mariners 3 Oakland A's 2 1 0 04/20/1993
NY Yankees 2 Seattle Mariners 9 0 1 04/21/1993
Seattle Mariners 9 NY Yankees 2 1 0 04/21/1993
Texas Rangers 0 Seattle Mariners 1 0 1 05/01/1993
Seattle Mariners 1 Texas Rangers 0 1 0 05/01/1993
NY Yankees 2 Texas Rangers 3 0 1 04/15/1993
Texas Rangers 3 NY Yankees 2 1 0 04/15/1993
Texas Rangers 2 Oakland A's 0 1 0 04/17/1993
Oakland A's 0 Texas Rangers 2 0 1 04/17/1993
Table: player_stats
Column definitions
# Name Type Index Expression
1 Player_Name TEXT 20
2 AtBats INTEGER
3 Hits INTEGER
4 HomeRuns INTEGER
5 Triples INTEGER
6 Doubles INTEGER
7 Singles INTEGER
8 BaseOnBalls INTEGER
9 Gamedate DATE
Sample data from player_stats
Player_Name AtBats Hits HomeRuns Triples Doubles Singles BaseOnBalls Gamedt
-------------- ------ ---- -------- ------- ------- ------- ----------- ------
Ken Griffey Jr. 5 3 0 1 1 1 1 04/15/93
Mickey Mantle 3 1 0 0 0 1 0 04/15/93
Bobby Bonds 5 4 2 0 0 2 1 04/15/93
Willie Mays 3 3 0 1 2 0 2 04/17/93
Henry Aaron 5 4 2 1 1 0 1 04/17/93
Joe DiMaggio 4 3 0 0 0 3 2 04/17/93
Ken Griffey Jr. 4 3 2 1 0 0 2 04/21/93
Mickey Mantle 2 0 0 0 0 0 1 04/21/93
Bobby Bonds 3 1 0 0 0 1 0 04/21/93
Willie Mays 3 1 0 0 0 1 1 05/01/93
Henry Aaron 4 2 0 1 0 1 0 05/01/93
Joe DiMaggio 3 3 0 1 0 2 1 05/01/93
The statistics are calculated using functions available with the SELECT
command and stored as views. The FORMAT function is used to format
output for display, but can't have computations as arguments. So two
views are used; one to do the computations, one to format the output.
Use the following two views to look at the team statistics. After
creating the views, SELECT * FROM teams_2 displays the team standings.
CREATE VIEW teams_1 +
(team,won,lost,pct) AS SELECT +
team_name,SUM(won),SUM(lost), +
(SUM(won)/(COUNT(*))) FROM team_stats +
GROUP BY team_name ORDER BY 4 DESC
CREATE VIEW teams_2 +
(team,won,lost,pct) AS SELECT +
team,won,lost,(FORMAT(pct,'.900')) +
FROM teams_1
team won lost pct
---------------- --- ---- -----
Seattle Mariners 3 0 1.000
Texas Rangers 2 1 0.666
Oakland A's 1 2 0.333
NY Yankees 0 3 0.000
Use the next two views to look at the player statistics. After creating
the views, SELECT or BROWSE from player_2 to display data about all the
players.
CREATE VIEW player_1 +
(PlayerName, BatAvg, OnBase, AtBats, +
HR, Triples,Doubles, Singles, Walks) +
AS SELECT +
player_name,(SUM(hits)/SUM(at_bats)), +
((SUM(hits)+SUM(base_on_balls))/ +
(SUM(at_bats)+SUM(base_on_balls))), +
SUM (at_bats),SUM (home_runs), +
SUM (triples),SUM (doubles), +
SUM(singles),SUM (base_on_balls) +
FROM player_stats GROUP BY player_name
CREATE VIEW player_2 +
(PlayerName, BatAvg, OnBase, AtBats, +
HR, Triples,Doubles, Singles, Walks) +
AS SELECT +
PlayerName,(FORMAT(BatAvg,'.900')),+
(FORMAT(OnBase,'.900')), AtBats,HR, +
Triples,Doubles,Singles,Walks +
FROM player_1 ORDER BY 2 DESC
PlayerName BatAvg OnBase AtBats HR Triples Doubles Singles Walks
--------------- ------ ------ ------ -- ------- ------- ------- -----
Willie Mays .727 .800 11 1 2 3 2 4
Joe DiMaggio .692 .764 13 0 1 2 6 4
Bobby Bonds .666 .700 18 3 0 2 7 2
Ken Griffey Jr. .666 .736 15 2 3 3 2 4
Henry Aaron .666 .687 15 3 3 2 2 1
Mickey Mantle .363 .500 11 0 0 2 2 3
The view Player_2 automatically sorts by batting average. The player
with the highest batting average is listed first. To see the players
ordered by on base percentage use the following command:
SELECT PlayerName,BatAvg,OnBase FROM player_2 ORDER BY OnBase DESC
PlayerName BatAvg OnBase
--------------- ------ ------
Willie Mays .727 .800
Joe DiMaggio .692 .764
Ken Griffey Jr. .666 .736
Bobby Bonds .666 .700
Henry Aaron .666 .687
Mickey Mantle .363 .500
Bicycling
---------
Statistics for individual sports are different from team sports.
Instead of comparing player to player, the comparison is yourself
against the clock, and how well did you do this week compared to last
week. By tracking the distance traveled and the time it took, R:BASE
reports can compute your speed. The reports are printed for a specified
time period to track progress and performance.
Reports are used instead of views because a sequence of expressions are
needed to do the computations. The time must be converted to seconds
and then hours in order to calculate the speed (miles per hour). This
is easier to do in a report than in a SELECT command in a view. A report
also lets you do multiple levels of totals.
Create a table to hold information about each outing. Record the
distance traveled, the time it took and the date. Add columns to hold
other relevant information as desired, for example, route, weather etc.
Table: CYCLE_LOG
Column definitions
# Name Type Index Expression
1 Ride_Date DATE *
2 Mileage INTEGER
2 Ride_Time TIME
4 Route TEXT 20
5 Ride_Notes NOTE
Create a report with the following layout and expressions:
RH
PH
PH DATE ROUTE MILEAGE TIME SPEED
PH ------- --------------------- --------- ------ -------
D ride_date route mileage ride_time vspeed MpH
PF
RF TOTALS: vmileage vavg_speed Mph
1: INTEGER vsec = (RIDE_TIME - 00:00:00)
2: DOUBLE vhours = (.vsec / 3600)
3: DOUBLE vspeed = (MILEAGE / .vhours)
4: INTEGER vmileage = SUM OF MILEAGE
5: INTEGER vtotal_sec = (.vtotal_sec + .vsec)
6: DOUBLE vtotal_hours = (.vtotal_sec / 3600)
7: DOUBLE vavg_speed = (.vmileage / .vtotal_hours)
The variable vsec converts the time to seconds, then the seconds are
converted to hours (vhours) to compute the speed in miles per hour
(vspeed). Variables vmileage, vtotal_sec, vtotal_hours are used to
compute the average speed, vavg_speed, for all rides.
Output from the report looks like this:
DATE ROUTE MILEAGE TIME SPEED
-------- ---------------------- ------- ------- --------
01/11/92 Vashon Island 25 1:22:00 18.2 MpH
03/01/92 Bainbridge Island 26 1:31:00 17.1 MpH
03/29/92 Vashon Island 25 1:17:22 19.3 MpH
06/06/92 Whidbey Island 126 6:55:00 18.2 MpH
06/20/92 Vashon Island 25 1:14:30 20.1 MpH
06/27/92 STP 195 11:45:00 16.5 MpH
07/03/92 Southworth 60 3:06:00 9.3 MpH
07/15/92 Vashon Island 25 1:13:42 20.3 MpH
07/23/92 Vashon Island 25 1:15:26 19.8 MpH
07/30/92 RAMROD 154 9:23:00 16.4 MpH
08/02/92 Vashon Island 25 1:16:50 19.5 MpH
08/25/92 Vashon Island 25 1:08:47 21.8 MpH
09/13/92 Vashon Island 25 1:15:23 19.8 MpH
10/03/92 West Seattle/I-90 30 3:00:00 10.0 MpH
10/04/92 West Seattle/I-90 15 1:30:00 10.0 MpH
TOTALS: 806 17.0 Mph
To do a summary report by month, add an expression to find the month,
vmonth = (TMON(ride_date)), and break on the variable. Change the Detail
line to a Break Footer and reset the accumulator variables, vmileage and
vtotal_sec, at the break. Add a second group of expressions to do report
totals. The report generates summary data to track monthly performance.
Expressions for summary report:
1: INTEGER vsec = (RIDE_TIME - 00:00:00)
2: INTEGER vtotal_sec = (.vtotal_sec + .vsec)
3: DOUBLE vhours = (.vtotal_sec / 3600)
4: INTEGER vmileage = SUM OF MILEAGE
5: DOUBLE vspeed = (.vmileage / .vhours)
6: TEXT vmonth = (TMON(ride_date))
7: INTEGER vtotal_miles = SUM OF MILEAGE
8: INTEGER vrtotal_sec = (.vrtotal_sec + .vsec)
9: DOUBLE vtotal_hours = (.vrtotal_sec / 3600)
10: DOUBLE vavg_speed = (.vtotal_miles/.vtotal_hours)
When printing the monthly summary report, order by the date so it will
break correctly on month.
R>PRINT mthcycle ORDER BY Ride_Date
MONTH MILEAGE SPEED
--------- --------- -------
January 25 18.2 MpH
March 51 18.1 MpH
June 346 17.3 MpH
July 264 17.6 MpH
August 50 20.6 MpH
September 25 19.8 MpH
October 45 10.0 MpH
Totals: 806 17.0 Mph
This structure can easily be adapted to other individual sports such as
running, swimming, kyaking etc.