The Search for Significance: A Crash Course in Statistical Significance Using ACS 2007
If we told you the American Community Survey (ACS) found that 26 percent of Hoosier women between the ages of 35 and 44 had a bachelor's degree or more compared to just 23 percent of men, how would you know whether that is a real difference in educational attainment (that is, a statistically significant finding) or just a result of random sampling error? This article provides a brief tutorial on calculating statistical significance for those who want to accurately use ACS data without becoming statisticians.[1]
Margins of Error
As with any survey, margins of error are critical—particularly as the size of the population in question decreases (because that typically increases the margin of error). A large margin of error makes the survey estimate less reliable, which can negatively affect your analysis and comparisons. The ACS reports the margin of error for the 90 percent confidence level. Therefore, if we look at the first row in Table 1, we can say that we're 90 percent confident that the number of Hoosier men between the ages of 25 and 34 is between 422,281 and 427,919 (that range—which is the estimate plus or minus the margin of error—is known as the confidence interval). In other words, there's only a 10 percent chance that the actual number of men in that age group falls outside of that range.
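As a quick illustration of the confidence interval arithmetic described above, here is a minimal Python sketch. The function name `confidence_interval` is our own choice for illustration and is not part of any Census Bureau tool.

```python
def confidence_interval(estimate, margin_of_error):
    """Return (lower, upper): the estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# First row of Table 1: Indiana men ages 25 to 34.
lower, upper = confidence_interval(425_100, 2_819)
print(f"{lower:,} to {upper:,}")  # 422,281 to 427,919
```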
Table 1: Educational Attainment and Confidence Intervals for Indiana Men and Women, 2007
Subject | Male Estimate | Male Margin of Error | Male Confidence Interval | Female Estimate | Female Margin of Error | Female Confidence Interval |
Population 25 to 34 years | 425,100 | +/-2,819 | 422,281-427,919 | 412,591 | +/-2,879 | 409,712-415,470 |
Percent high school graduate or higher | 86.7 | +/-0.8 | 85.9-87.5 | 89.3 | +/-0.7 | 88.6-90 |
Percent bachelor's degree or higher | 23.4 | +/-1.0 | 22.4-24.4 | 28.1 | +/-1.0 | 27.1-29.1 |
Population 35 to 44 years | 447,489 | +/-2,440 | 445,049-449,929 | 444,091 | +/-2,585 | 441,506-446,676 |
Percent high school graduate or higher | 87.1 | +/-0.9 | 86.2-88 | 90.5 | +/-0.8 | 89.7-91.3 |
Percent bachelor's degree or higher | 22.8 | +/-0.9 | 21.9-23.7 | 25.7 | +/-0.9 | 24.8-26.6 |
Population 45 to 64 years | 796,162 | +/-2,157 | 794,005-798,319 | 824,930 | +/-2,596 | 822,334-827,526 |
Percent high school graduate or higher | 88.1 | +/-0.5 | 87.6-88.6 | 89 | +/-0.5 | 88.5-89.5 |
Percent bachelor's degree or higher | 24.2 | +/-0.6 | 23.6-24.8 | 21.5 | +/-0.7 | 20.8-22.2 |
Population 65 years and over | 328,860 | +/-1,151 | 327,709-330,011 | 464,296 | +/-1,431 | 462,865-465,727 |
Percent high school graduate or higher | 75.3 | +/-1.2 | 74.1-76.5 | 73.7 | +/-0.9 | 72.8-74.6 |
Percent bachelor's degree or higher | 18.9 | +/-0.9 | 18-19.8 | 11.1 | +/-0.6 | 10.5-11.7 |
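One might think that this is all the information we need to determine statistical significance: As long as the confidence intervals of two numbers don't overlap, we're good to go, right? Unfortunately, it is a bit more complex than that, and the Census Bureau discourages the use of confidence intervals alone to determine a value's statistical significance. Two estimates whose intervals overlap can still differ significantly: in Table 1, the high school attainment intervals for men and women ages 45 to 64 overlap (87.6-88.6 versus 88.5-89.5), yet that difference turns out to be significant at the 95 percent level. Instead, we should calculate z-scores, which are standardized figures that allow us to make comparisons.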
Three Steps to Determining Significance
The first step in determining statistical significance is to convert the margin of error into a standard error. This calculation varies depending on whether we are using numbers directly from published ACS tables or have done some intermediate calculations of our own, such as calculating a percentage. Since our data do not contain any derived estimates, all we need to do for this step is divide the margin of error by 1.645.[2]
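The first step can be sketched in a few lines of Python. The names `Z_90` and `standard_error` are ours, chosen for illustration. The sketch assumes published (not derived) ACS estimates from 2006 or later, so the divisor is 1.645.

```python
Z_90 = 1.645  # divisor for the ACS 90 percent margin of error (2006 and later)


def standard_error(margin_of_error, z=Z_90):
    """Convert a published ACS margin of error into a standard error."""
    return margin_of_error / z


# Margin of error for Indiana men ages 25 to 34 (a count).
print(round(standard_error(2_819)))   # 1714
# Margin of error for a percentage estimate.
print(round(standard_error(0.9), 3))  # 0.547
```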
The second step is to calculate the z-score itself (see Table 2). If we let A represent the male estimates, use B for the female estimates and use SE(A) and SE(B) for the standard errors of those respective estimates, the formula is as follows:
$$Z = \frac{A - B}{\sqrt{SE(A)^2 + SE(B)^2}}$$
Table 2: Comparing Male and Female Educational Attainment Z-Scores for Indiana, 2007
Subject | Male (A) Estimate | Male (A) Margin of Error | Male (A) Standard Error | Female (B) Estimate | Female (B) Margin of Error | Female (B) Standard Error | Z-Score Comparing Male and Female Populations* |
Population 25 to 34 years | 425,100 | 2,819 | 1,714 | 412,591 | 2,879 | 1,750 | 5.11 |
Percent high school graduate or higher | 86.7 | 0.8 | 0.486 | 89.3 | 0.7 | 0.426 | -4.02 |
Percent bachelor's degree or higher | 23.4 | 1 | 0.608 | 28.1 | 1 | 0.608 | -5.47 |
Population 35 to 44 years | 447,489 | 2,440 | 1,483 | 444,091 | 2,585 | 1,571 | 1.57 |
Percent high school graduate or higher | 87.1 | 0.9 | 0.547 | 90.5 | 0.8 | 0.486 | -4.64 |
Percent bachelor's degree or higher | 22.8 | 0.9 | 0.547 | 25.7 | 0.9 | 0.547 | -3.75 |
Population 45 to 64 years | 796,162 | 2,157 | 1,311 | 824,930 | 2,596 | 1,578 | -14.02 |
Percent high school graduate or higher | 88.1 | 0.5 | 0.304 | 89 | 0.5 | 0.304 | -2.09 |
Percent bachelor's degree or higher | 24.2 | 0.6 | 0.365 | 21.5 | 0.7 | 0.426 | 4.82 |
Population 65 years and over | 328,860 | 1,151 | 700 | 464,296 | 1,431 | 870 | -121.32 |
Percent high school graduate or higher | 75.3 | 1.2 | 0.729 | 73.7 | 0.9 | 0.547 | 1.75 |
Percent bachelor's degree or higher | 18.9 | 0.9 | 0.547 | 11.1 | 0.6 | 0.365 | 11.86 |
Source: IBRC, using data from the U.S. Census Bureau American Community Survey
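The z-score column in Table 2 can be reproduced with a short Python sketch. It assumes the difference between the two estimates is divided by the square root of the sum of the squared standard errors, which matches the published values in the table. The function name `z_score` and its parameter names are ours.

```python
import math


def z_score(a, moe_a, b, moe_b, z90=1.645):
    """Z-score for the difference between estimates a and b.

    moe_a and moe_b are the published 90 percent margins of error,
    in the same units as the estimates themselves.
    """
    se_a = moe_a / z90
    se_b = moe_b / z90
    return (a - b) / math.sqrt(se_a**2 + se_b**2)


# Population 25 to 34 years: male (A) versus female (B).
print(round(z_score(425_100, 2_819, 412_591, 2_879), 2))  # 5.11
# Percent bachelor's degree or higher, ages 35 to 44.
print(round(z_score(22.8, 0.9, 25.7, 0.9), 2))            # -3.75
```

Small differences in the last digit relative to Table 2 can occur depending on whether the standard errors are rounded before the z-score is computed.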
Here's an important note for Excel users: Percentage data downloaded from American FactFinder are formatted as percents (22.8%), which Excel stores in decimal form (0.228). The margins of error, however, are stored as regular numbers (0.9). Mixing those two formats yields meaningless z-scores that are 100 times too small. Therefore, always make sure to convert any percentages to numeric format (22.8) so they are in the same units as the margin of error before calculating the z-score.
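The same unit mismatch can occur outside Excel. The following Python sketch shows the effect using the bachelor's degree figures for ages 35 to 44. The helper `to_percentage_points` is a hypothetical name introduced only for this illustration.

```python
import math


def to_percentage_points(fraction):
    """Convert a value stored as a fraction (0.228) to percentage points (22.8)."""
    return fraction * 100


moe = 0.9                 # margin of error, in percentage points
se = moe / 1.645          # standard error, same for both groups here
male, female = 0.228, 0.257  # estimates as stored in decimal form

# Wrong: estimates are fractions, standard errors are percentage points.
z_mixed = (male - female) / math.sqrt(se**2 + se**2)

# Right: convert the estimates to percentage points first.
z_fixed = (to_percentage_points(male) - to_percentage_points(female)) / math.sqrt(se**2 + se**2)

print(round(z_mixed, 4))  # -0.0375
print(round(z_fixed, 2))  # -3.75
```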
The third step is to use the z-score to determine if the difference between the genders is significant or if random chance can explain the difference. Table 3 provides the z-score thresholds with their corresponding confidence level. Essentially, the larger the absolute value of the z-score, the more confident we are that a real difference in the estimates exists. Looking back at Table 2, we find that nine of the twelve comparisons are significant at the 99 percent level, which means that we're 99 percent sure that those differences are not due to random chance. The exceptions are the population ages 35 to 44 (z = 1.57, not significant even at the 90 percent level), high school attainment for ages 45 to 64 (significant at the 95 percent level) and high school attainment for ages 65 and over (significant at the 90 percent level).
Table 3: Z-Scores and Levels of Significance
If … | Then the difference between A and B is … |
z < - 1.645 or z > 1.645 | Significant at the 90 percent confidence level |
z < - 1.96 or z > 1.96 | Significant at the 95 percent confidence level |
z < - 2.576 or z > 2.576 | Significant at the 99 percent confidence level |
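The lookup in Table 3 can be expressed as a small Python function. This is a sketch only. The names `THRESHOLDS` and `confidence_level` are ours, and the function simply reports the highest threshold that the absolute z-score exceeds.

```python
# (cutoff, confidence level in percent), checked from strictest to loosest.
THRESHOLDS = [(2.576, 99), (1.96, 95), (1.645, 90)]


def confidence_level(z):
    """Return the highest confidence level at which z is significant, or None."""
    for cutoff, level in THRESHOLDS:
        if abs(z) > cutoff:
            return level
    return None


# A few z-scores taken from Table 2.
for z in (5.11, -2.09, 1.75, 1.57):
    print(z, confidence_level(z))
# 5.11 99
# -2.09 95
# 1.75 90
# 1.57 None
```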
For more information, download the Census Bureau's instructions on statistical testing and ACS available at www.census.gov/programs-surveys/acs/guidance.html.
Notes
1. Data in this article are extracted from Table S1501 in the 2007 American Community Survey dataset, available via American FactFinder at https://data.census.gov.
2. The denominator is 1.645 for ACS data from 2006 and later; for ACS data from 2005 or earlier, 1.65 should be used. For the Census Bureau's recommended calculations for derived estimates, visit http://census.gov/programs-surveys/acs/guidance.html.
Rachel Justis, Geodemographic Analyst
Indiana Business Research Center, Kelley School of Business, Indiana University