Data Don'ts: When You Shouldn't Average Averages

Working with data can be confusing, and one common mistake is averaging averages. It often happens when someone tries to build a regional number from county data. This article looks at a few popular data sets where averaging the county values gives an inaccurate regional figure. Thankfully, STATS Indiana (www.stats.indiana.edu) has built-in functionality to help you avoid that mistake.

Unemployment Rate

The unemployment rate is a good example of a figure you should not average, because the rate is itself a calculation (unemployed divided by labor force). Table 1 shows the three counties that make up the Lafayette metro area. Average those three rates and you get 7.0 percent. However, the true unemployment rate for the metro is 7.3 percent (7,260 unemployed divided by 100,113 people in the labor force). The simple average gives each county equal weight, regardless of the size of its labor force.

Table 1: Lafayette Metro Counties Unemployment Rate, November 2012

Geography                                   Unemployment Rate (%)
Benton                                      6.8
Carroll                                     7.0
Tippecanoe                                  7.3
What You Get If You Average County Rates    7.0
True Metro Unemployment Rate                7.3

Source: IBRC, using Indiana Department of Workforce Development data
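
The arithmetic behind Table 1 can be checked with a short Python sketch. It uses only the county rates from the table and the metro totals quoted in the text; the variable and function names are illustrative and are not part of STATS Indiana.

```python
from statistics import mean

# County unemployment rates from Table 1 (percent, November 2012)
county_rates = {"Benton": 6.8, "Carroll": 7.0, "Tippecanoe": 7.3}

# Metro-wide totals quoted in the article text
metro_unemployed = 7_260
metro_labor_force = 100_113


def rate_from_totals(unemployed, labor_force):
    """Unemployment rate in percent, computed from the underlying counts."""
    return 100 * unemployed / labor_force


# Wrong: treats every county as if it had the same labor force
average_of_rates = mean(county_rates.values())

# Right: divide total unemployed by total labor force
true_rate = rate_from_totals(metro_unemployed, metro_labor_force)

print(f"Average of county rates: {average_of_rates:.1f}%")
print(f"True metro rate:         {true_rate:.1f}%")
```

With county-level counts in hand, the same idea applies: sum the unemployed, sum the labor force, then divide once.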

Per Capita Personal Income

Per capita personal income (PCPI) is another measure you should not aggregate into regions by averaging, because PCPI is also a calculation (income divided by population). Averaging gives each county equal weight even though the counties have different populations. Table 2 uses the 10-county Indianapolis-Carmel metro area to show that the average of the county values differs from the true PCPI by more than $500.

Table 2: Indianapolis-Carmel, IN Metro Area PCPI, 2011

Area                                        PCPI
Boone County                                $52,975
Hamilton County                             $51,824
Hancock County                              $43,714
Marion County                               $38,309
Morgan County                               $37,634
Johnson County                              $36,570
Hendricks County                            $36,188
Brown County                                $35,863
Shelby County                               $35,376
Putnam County                               $31,817
What You Get If You Average County PCPIs    $40,027
True Metro PCPI                             $40,572

Source: IBRC, using Bureau of Economic Analysis data
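
As a rough check on Table 2, the sketch below reproduces the simple average and the gap to the reported metro figure. The function `population_weighted_pcpi` is a hypothetical helper showing how a regional PCPI could be rebuilt if county populations were available; the article does not list those populations, so the function is defined here but not applied to the table.

```python
from statistics import mean

# County PCPI values from Table 2 (dollars, 2011)
county_pcpi = {
    "Boone": 52_975, "Hamilton": 51_824, "Hancock": 43_714,
    "Marion": 38_309, "Morgan": 37_634, "Johnson": 36_570,
    "Hendricks": 36_188, "Brown": 35_863, "Shelby": 35_376,
    "Putnam": 31_817,
}

# True metro PCPI as reported in Table 2
TRUE_METRO_PCPI = 40_572

# Wrong: every county counts equally, whatever its population
simple_average = mean(county_pcpi.values())
gap = TRUE_METRO_PCPI - simple_average


def population_weighted_pcpi(pcpi_by_county, population_by_county):
    """Regional PCPI = total income / total population.

    Hypothetical helper: total income for each county is recovered as
    PCPI times population. Populations must be supplied by the caller.
    """
    total_income = sum(
        pcpi_by_county[c] * population_by_county[c] for c in pcpi_by_county
    )
    total_population = sum(population_by_county[c] for c in pcpi_by_county)
    return total_income / total_population


print(f"Average of county PCPIs: ${simple_average:,.0f}")
print(f"Reported metro PCPI:     ${TRUE_METRO_PCPI:,.0f}")
print(f"Difference:              ${gap:,.0f}")
```

Weighting by population is equivalent to dividing total income by total population, which is how a per capita figure is defined.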

How Using STATS Indiana Helps

STATS Indiana has built-in functionality to provide statistics for metropolitan, micropolitan and combined statistical areas, as well as a wide variety of existing regions, such as the Department of Workforce Development’s economic growth regions and Indiana Economic Development Corporation regions. In addition, you can use the custom region builder for any Indiana regions that aren’t already defined. The links below take you to the relevant outputs on STATS Indiana that were used in this article.

A Caveat Concerning Medians

Data that are reported in terms of a median (e.g., median household income) are another area where averaging causes problems. However, with medians things get a little trickier because the raw data used to calculate the median are often unavailable. Recall that a median is the value in the middle. So, in the case of median household income, half of the households in the area have incomes above the median and half have incomes below it. The problem is that we don’t have the income data for every individual household in the region, so we cannot calculate the regional median. If a regional median value is absolutely necessary, often the best that can be done is to average the county medians. In those cases, be sure to call the result the average median value instead of a median value.
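
To see why an average of medians is not the same thing as a regional median, consider the small Python sketch below. The household incomes are invented purely for illustration and do not describe any real county.

```python
from statistics import mean, median

# Hypothetical household incomes, made up for this illustration only
county_a = [20_000, 30_000, 40_000]
county_b = [50_000, 60_000, 70_000, 80_000, 90_000, 100_000, 110_000]

# What you can compute from published county medians alone
average_median = mean([median(county_a), median(county_b)])

# What you could compute only with every household's income in hand
regional_median = median(county_a + county_b)

print(f"Average of county medians: ${average_median:,.0f}")
print(f"True regional median:      ${regional_median:,.0f}")
```

In this made-up case the two figures differ by $10,000, which is why the averaged number should be labeled an average median value.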

Rachel Strange
Geodemographic Analyst, Indiana Business Research Center, Indiana University Kelley School of Business