Data Don'ts: When You Shouldn't Average Averages
Working with data can sometimes cause confusion. One common mistake is averaging averages, which often happens when someone tries to build a regional figure from county data. This article looks at a few popular data sets where averaging distorts the result. Thankfully, STATS Indiana (www.stats.indiana.edu) has built-in functionality to help you avoid that mistake.
The unemployment rate is a good example of a figure you should not average across areas, because the rate is itself a calculation (unemployed divided by labor force). Table 1 shows the counties that make up the Lafayette metro area. Average those three county rates and you get 7.0 percent. However, the true unemployment rate for the metro is 7.3 percent (7,260 unemployed divided by 100,113 people in the labor force). The simple average gives each county's rate equal weight, even though the counties' labor forces differ in size.
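The gap comes down to weighting. Below is a minimal Python sketch, assuming you have each county's count of unemployed people and its labor force. Pooling the counts first gives the true regional rate, while averaging the county rates treats every county as equal. The function names are illustrative, and the three county figures are hypothetical, chosen only to make the gap visible. They are not the Lafayette data. Only the final line uses the metro totals quoted above.

```python
def pooled_rate(unemployed, labor_force):
    """Regional unemployment rate (percent): total unemployed / total labor force."""
    return 100 * sum(unemployed) / sum(labor_force)


def average_of_rates(unemployed, labor_force):
    """The mistake: compute each county's rate, then take a simple mean.

    Every county counts equally, no matter how large its labor force is.
    """
    rates = [100 * u / lf for u, lf in zip(unemployed, labor_force)]
    return sum(rates) / len(rates)


# Hypothetical counties: two small ones (10% and 5%) and one large one (6%).
unemployed = [500, 400, 6_000]
labor_force = [5_000, 8_000, 100_000]

print(round(average_of_rates(unemployed, labor_force), 1))  # 7.0
print(round(pooled_rate(unemployed, labor_force), 1))       # 6.1

# Metro totals quoted in the article: 7,260 unemployed, 100,113 in the labor force.
print(round(pooled_rate([7_260], [100_113]), 1))            # 7.3
```

In the hypothetical case the pooled rate sits close to the large county's 6 percent, because that county supplies most of the labor force. The simple average is pulled up by the small county with a 10 percent rate.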
Table 1: Lafayette Metro Counties Unemployment Rate, November 2012
| Measure | Unemployment Rate |
| --- | --- |
| What You Get If You Average County Rates | 7.0% |
| True Metro Unemployment Rate | 7.3% |
Source: IBRC, using Indiana Department of Workforce Development data
Per Capita Personal Income
Per capita personal income (PCPI) is another data set that you should not aggregate into regions by averaging, because PCPI is also a calculation (income divided by population). Averaging county PCPIs gives each county equal weight even though their populations differ. Table 2 uses the 10-county Indianapolis-Carmel metro area as an example: the simple average of the county PCPIs differs from the true metro PCPI by more than $500.
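If you only have county PCPIs and populations, you can still recover the true regional figure. Weighting each county's PCPI by its population gives the same result as total income divided by total population. The sketch below assumes two hypothetical counties. The numbers are illustrative, not the Indianapolis-Carmel data, and the function names are made up for this example.

```python
def simple_average(pcpi):
    """The mistake: every county gets equal weight."""
    return sum(pcpi) / len(pcpi)


def regional_pcpi(pcpi, population):
    """Population-weighted PCPI.

    PCPI times population recovers each county's total income, so this
    equals total regional income divided by total regional population.
    """
    total_income = sum(p * pop for p, pop in zip(pcpi, population))
    return total_income / sum(population)


# Hypothetical region: one large county and one small county.
pcpi = [45_000, 30_000]
population = [900_000, 100_000]

print(simple_average(pcpi))             # 37500.0
print(regional_pcpi(pcpi, population))  # 43500.0
```

If county totals for personal income are already on hand, it is simpler to add them up and divide by total population directly. The weighting step is only needed when all you have are the per capita figures.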
Table 2: Indianapolis-Carmel, IN Metro Area PCPI, 2011
|What You Get If You Average County PCPIs
|True Metro PCPI
Source: IBRC, using Bureau of Economic Analysis data
How Using STATS Indiana Helps
STATS Indiana has built-in functionality to provide statistics for metropolitan, micropolitan and combined statistical areas. It also covers a wide variety of predefined regions, such as the Department of Workforce Development’s economic growth regions and the Indiana Economic Development Corporation regions. For any Indiana region that isn’t already predefined, you can use the custom region builder.
A Caveat Concerning Medians
Data reported as a median (e.g., median household income) are another area where averaging causes problems. With medians, however, things get trickier, because the raw data used to calculate the median are often unavailable. Recall that a median is the middle value. In the case of median household income, half of the households in the area have incomes above the median and half have incomes below it. To calculate a regional median, we would need income data for every individual household in the region, and we don’t have it. Often, if a regional median is absolutely necessary, the best that can be done is to average the county medians. In those cases, be sure to call the result an average median value rather than a median value.
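A small example shows why an average of medians is not a regional median. The household incomes below are hypothetical, in thousands of dollars. The sketch uses Python's standard `statistics.median` on each county and on the pooled list.

```python
from statistics import median

# Hypothetical household incomes (thousands of dollars) in two counties.
county_a = [20, 30, 40]
county_b = [50, 60, 70, 80, 90]

# What you can compute from published medians alone.
avg_of_medians = (median(county_a) + median(county_b)) / 2

# What you could compute only with every household's income.
true_median = median(county_a + county_b)

print(avg_of_medians)  # 50.0
print(true_median)     # 55.0
```

With only the two county medians (30 and 70) in hand, the regional value of 55 cannot be recovered. That is why the averaged figure should be labeled an average median value.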
Geodemographic Analyst, Indiana Business Research Center, Indiana University Kelley School of Business