The Search for Significance: A Crash Course in Statistical Significance Using ACS 2007

If we told you that the American Community Survey (ACS) found that 26 percent of Hoosier women between the ages of 35 and 44 had a bachelor's degree or higher, compared with just 23 percent of men, how would you know whether that is a real difference in educational attainment (that is, a statistically significant finding) or just the result of random sampling error? This article provides a brief tutorial on calculating statistical significance for those who want to use ACS data accurately without becoming statisticians.[1]

Margins of Error

As with any survey, margins of error are critical, particularly as the size of the population in question decreases, because smaller populations typically carry larger margins of error relative to their estimates. A large margin of error makes the survey estimate less reliable, which can undermine your analysis and comparisons. The ACS reports margins of error at the 90 percent confidence level. So, looking at the first row of Table 1, we can say that we're 90 percent confident that the number of Hoosier men between the ages of 25 and 34 is between 422,281 and 427,919. That range (the estimate plus or minus the margin of error) is known as the confidence interval. Loosely speaking, there's only a 10 percent chance that the actual number of men in that age group falls outside that range.
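For readers who work in code, the confidence interval is simply the estimate minus and plus the margin of error. The Python sketch below reproduces the first row of Table 1; the helper name `confidence_interval` is hypothetical, chosen only for illustration.

```python
def confidence_interval(estimate: float, moe: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of estimate +/- margin of error.

    ACS margins of error are published at the 90 percent confidence level,
    so the interval returned here is a 90 percent confidence interval.
    """
    return estimate - moe, estimate + moe


# First row of Table 1: Indiana men aged 25 to 34, 2007 ACS
lower, upper = confidence_interval(425_100, 2_819)
print(f"{lower:,} to {upper:,}")  # 422,281 to 427,919
```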

Table 1: Educational Attainment and Confidence Intervals for Indiana Men and Women, 2007

| Subject | Male: Estimate | Male: Margin of Error | Male: Confidence Interval | Female: Estimate | Female: Margin of Error | Female: Confidence Interval |
| --- | --- | --- | --- | --- | --- | --- |
| Population 25 to 34 years | 425,100 | +/-2,819 | 422,281-427,919 | 412,591 | +/-2,879 | 409,712-415,470 |
| Percent high school graduate or higher | 86.7 | +/-0.8 | 85.9-87.5 | 89.3 | +/-0.7 | 88.6-90.0 |
| Percent bachelor's degree or higher | 23.4 | +/-1.0 | 22.4-24.4 | 28.1 | +/-1.0 | 27.1-29.1 |
| Population 35 to 44 years | 447,489 | +/-2,440 | 445,049-449,929 | 444,091 | +/-2,585 | 441,506-446,676 |
| Percent high school graduate or higher | 87.1 | +/-0.9 | 86.2-88.0 | 90.5 | +/-0.8 | 89.7-91.3 |
| Percent bachelor's degree or higher | 22.8 | +/-0.9 | 21.9-23.7 | 25.7 | +/-0.9 | 24.8-26.6 |
| Population 45 to 64 years | 796,162 | +/-2,157 | 794,005-798,319 | 824,930 | +/-2,596 | 822,334-827,526 |
| Percent high school graduate or higher | 88.1 | +/-0.5 | 87.6-88.6 | 89.0 | +/-0.5 | 88.5-89.5 |
| Percent bachelor's degree or higher | 24.2 | +/-0.6 | 23.6-24.8 | 21.5 | +/-0.7 | 20.8-22.2 |
| Population 65 years and over | 328,860 | +/-1,151 | 327,709-330,011 | 464,296 | +/-1,431 | 462,865-465,727 |
| Percent high school graduate or higher | 75.3 | +/-1.2 | 74.1-76.5 | 73.7 | +/-0.9 | 72.8-74.6 |
| Percent bachelor's degree or higher | 18.9 | +/-0.9 | 18.0-19.8 | 11.1 | +/-0.6 | 10.5-11.7 |
Source: IBRC, using data from the U.S. Census Bureau American Community Survey

One might think that this is all the information we need to determine statistical significance: as long as the confidence intervals of two estimates don't overlap, we're good to go, right? Unfortunately, it is a bit more complex than that. Two confidence intervals can overlap even when the difference between the estimates is statistically significant, and the Census Bureau discourages using confidence intervals alone to judge whether a difference is significant. Instead, we should calculate z-scores, which measure the difference between two estimates in units of its standard error and so put every comparison on the same standard scale.
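To see why overlap alone can mislead, consider the high school attainment rows for 45- to 64-year-olds in Table 1: the men's interval (87.6-88.6) and the women's interval (88.5-89.5) overlap, yet the difference is significant at the 95 percent level. The Python sketch below checks both; the helper names `intervals_overlap` and `z_score` are illustrative rather than part of any Census Bureau tool, and it assumes the estimates come directly from published tables, so each standard error is the margin of error divided by 1.645.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent level


def intervals_overlap(a, moe_a, b, moe_b):
    """True if the 90 percent confidence intervals of A and B share any values."""
    return a - moe_a <= b + moe_b and b - moe_b <= a + moe_a


def z_score(a, moe_a, b, moe_b):
    """Z-score for the difference A - B, using SE = MOE / 1.645."""
    se_a, se_b = moe_a / Z_90, moe_b / Z_90
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Indiana, ages 45 to 64, percent high school graduate or higher (Table 1)
men, men_moe = 88.1, 0.5
women, women_moe = 89.0, 0.5

print(intervals_overlap(men, men_moe, women, women_moe))  # True
print(round(z_score(men, men_moe, women, women_moe), 2))  # -2.09
```

Because |-2.09| exceeds 1.96, the gap is significant at the 95 percent level even though the intervals overlap.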

Three Steps to Determining Significance

The first step in determining statistical significance is to convert the margin of error into a standard error. This calculation varies depending on whether we are using numbers directly from published ACS tables or have done some intermediate calculations of our own, such as calculating a percentage (a derived estimate). Since our data do not contain any derived estimates, all we need to do for this step is divide each margin of error by 1.645, the z-value associated with the 90 percent confidence level.[2]
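As a minimal Python sketch of this step, the hypothetical helper `moe_to_se` below divides a published margin of error by 1.645, switching to 1.65 for ACS data from 2005 or earlier as described in note 2. The `acs_year` parameter is an assumption added for illustration, and the helper does not handle derived estimates.

```python
def moe_to_se(moe: float, acs_year: int = 2007) -> float:
    """Convert a published ACS margin of error to a standard error.

    Divisor is 1.645 for ACS data from 2006 and later, 1.65 for 2005 or
    earlier. Valid only for estimates taken directly from published tables,
    not for derived estimates such as self-calculated percentages.
    """
    divisor = 1.645 if acs_year >= 2006 else 1.65
    return moe / divisor


# Margins of error for Indiana men aged 25 to 34 (Table 1)
print(round(moe_to_se(2_819)))   # 1714
print(round(moe_to_se(0.8), 3))  # 0.486
```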

The second step is to calculate the z-score itself (see Table 2). Letting A represent the male estimates, B the female estimates, and SE(A) and SE(B) the standard errors of those respective estimates, the formula is as follows:

$$Z = \frac{A - B}{\sqrt{SE(A)^2 + SE(B)^2}}$$
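The formula translates directly into code. In the Python sketch below, the helper names `z_score` and `se` are hypothetical; `math.hypot(se_a, se_b)` computes the square root of the sum of the squared standard errors, which is the denominator above. The example applies it to the question that opened this article.

```python
import math


def z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """Z = (A - B) / sqrt(SE(A)^2 + SE(B)^2)."""
    return (a - b) / math.hypot(se_a, se_b)


def se(moe: float) -> float:
    """Standard error from a 90 percent ACS margin of error (2006+ data)."""
    return moe / 1.645


# Percent with a bachelor's degree or higher, ages 35 to 44 (Table 1):
# men (A) 22.8 +/-0.9, women (B) 25.7 +/-0.9
z = z_score(22.8, se(0.9), 25.7, se(0.9))
print(round(z, 2))  # -3.75
```

The same call with the 25-to-34 population figures, `z_score(425_100, se(2_819), 412_591, se(2_879))`, rounds to the 5.11 shown in Table 2.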

Table 2: Comparing Male and Female Educational Attainment Z-Scores for Indiana, 2007

| Subject | Male (A): Estimate | Male (A): Margin of Error | Male (A): Standard Error | Female (B): Estimate | Female (B): Margin of Error | Female (B): Standard Error | Z-Score Comparing Male and Female Populations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Population 25 to 34 years | 425,100 | 2,819 | 1,714 | 412,591 | 2,879 | 1,750 | 5.11* |
| Percent high school graduate or higher | 86.7 | 0.8 | 0.486 | 89.3 | 0.7 | 0.426 | -4.02* |
| Percent bachelor's degree or higher | 23.4 | 1.0 | 0.608 | 28.1 | 1.0 | 0.608 | -5.47* |
| Population 35 to 44 years | 447,489 | 2,440 | 1,483 | 444,091 | 2,585 | 1,571 | 1.57 |
| Percent high school graduate or higher | 87.1 | 0.9 | 0.547 | 90.5 | 0.8 | 0.486 | -4.64* |
| Percent bachelor's degree or higher | 22.8 | 0.9 | 0.547 | 25.7 | 0.9 | 0.547 | -3.75* |
| Population 45 to 64 years | 796,162 | 2,157 | 1,311 | 824,930 | 2,596 | 1,578 | -14.02* |
| Percent high school graduate or higher | 88.1 | 0.5 | 0.304 | 89.0 | 0.5 | 0.304 | -2.09 |
| Percent bachelor's degree or higher | 24.2 | 0.6 | 0.365 | 21.5 | 0.7 | 0.426 | 4.82* |
| Population 65 years and over | 328,860 | 1,151 | 700 | 464,296 | 1,431 | 870 | -121.32* |
| Percent high school graduate or higher | 75.3 | 1.2 | 0.729 | 73.7 | 0.9 | 0.547 | 1.75 |
| Percent bachelor's degree or higher | 18.9 | 0.9 | 0.547 | 11.1 | 0.6 | 0.365 | 11.86* |

Note: Z-scores marked with an asterisk (*) are significant at the 99 percent confidence level. Standard errors are shown rounded; z-scores were calculated from unrounded standard errors.
Source: IBRC, using data from the U.S. Census Bureau American Community Survey

Here's an important note for Excel users: percentage data downloaded from American FactFinder may come formatted as percents (22.8%), which Excel stores in decimal form (0.228). The margins of error, however, are stored as plain numbers (0.9). As one can imagine, mixing those two formats yields meaningless z-scores. Therefore, always convert any percentages to numeric format (22.8) so they are in the same units as the margin of error before calculating the z-score.
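The size of this pitfall is easy to demonstrate. The Python sketch below (the helper `z_score_from_moe` is hypothetical) runs the bachelor's degree comparison for 35- to 44-year-olds twice: once with the estimates left in Excel's decimal form and once after multiplying them by 100 so they match the units of the margin of error.

```python
import math


def z_score_from_moe(a, moe_a, b, moe_b):
    """Z-score for A - B when A and B come straight from published ACS tables."""
    se_a, se_b = moe_a / 1.645, moe_b / 1.645
    return (a - b) / math.hypot(se_a, se_b)


# Bachelor's degree or higher, ages 35 to 44: men 22.8% +/-0.9, women 25.7% +/-0.9
excel_men, excel_women = 0.228, 0.257  # how Excel stores "22.8%" and "25.7%"

wrong = z_score_from_moe(excel_men, 0.9, excel_women, 0.9)  # mixed units
right = z_score_from_moe(excel_men * 100, 0.9, excel_women * 100, 0.9)  # same units

print(round(wrong, 4))  # -0.0375 (wrongly looks "not significant")
print(round(right, 2))  # -3.75
```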

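The third step is to use the z-score to determine whether the difference between men and women is significant or whether random sampling error could explain it. Table 3 provides the z-score thresholds and their corresponding confidence levels: the larger the absolute value of the z-score, the more confident we can be that a real difference between the estimates exists. Looking back at Table 2, nine of the twelve comparisons are significant at the 99 percent level, which means we can be 99 percent confident that those differences are not due to random sampling error. That includes the question we started with: the z-score for bachelor's degree attainment among 35- to 44-year-olds is -3.75, so the gap between women and men in that age group is statistically significant. Of the remaining three comparisons, the difference in high school attainment among 45- to 64-year-olds (z = -2.09) is significant at the 95 percent level, the difference in high school attainment among those 65 and over (z = 1.75) is significant at the 90 percent level, and the difference in the number of men and women aged 35 to 44 (z = 1.57) is not significant at any of these levels.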

Table 3: Z-Scores and Levels of Significance

| If … | Then the difference between A and B is … |
| --- | --- |
| z < -1.645 or z > 1.645 | Significant at the 90 percent confidence level |
| z < -1.96 or z > 1.96 | Significant at the 95 percent confidence level |
| z < -2.576 or z > 2.576 | Significant at the 99 percent confidence level |
Source: U.S. Census Bureau American Community Survey
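The thresholds in Table 3 can be applied mechanically. In the minimal Python sketch below, `significance_level` is a hypothetical helper that reports the highest confidence level a z-score clears; the sample z-scores are taken from Table 2.

```python
def significance_level(z: float) -> str:
    """Map a z-score to the highest Table 3 confidence level it clears."""
    # Check the strictest threshold first so the highest level is reported
    thresholds = [(2.576, "99 percent"), (1.96, "95 percent"), (1.645, "90 percent")]
    for critical_value, level in thresholds:
        if abs(z) > critical_value:
            return f"significant at the {level} confidence level"
    return "not significant at the 90 percent confidence level"


# Z-scores from Table 2
for label, z in [
    ("Population 35 to 44 years", 1.57),
    ("High school graduate or higher, 45 to 64 years", -2.09),
    ("High school graduate or higher, 65 years and over", 1.75),
    ("Bachelor's degree or higher, 35 to 44 years", -3.75),
]:
    print(f"{label}: {significance_level(z)}")
```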

For more information, see the Census Bureau's instructions on statistical testing with ACS data, available at www.census.gov/programs-surveys/acs/guidance.html.

Notes

  1. Data in this article are extracted from Table S1501 in the 2007 American Community Survey dataset, originally accessed via American FactFinder; ACS tables are now available at https://data.census.gov.
  2. The denominator is 1.645 for ACS data from 2006 and later; for ACS data from 2005 or earlier, 1.65 should be used. For the Census Bureau's recommended calculations for derived estimates, visit http://census.gov/programs-surveys/acs/guidance.html.

Rachel Justis, Geodemographic Analyst
Indiana Business Research Center, Kelley School of Business, Indiana University