Census 2020: Balancing privacy and data reliability

Matt Kinghorn

Senior Demographic Analyst, Indiana Business Research Center, Indiana University Kelley School of Business

The differential privacy (DP) process involves adding a specific amount of statistical noise to most of the data variables published for the 2020 census to protect privacy.

The U.S. Census Bureau is currently undertaking its once-a-decade head count of the nation’s population. While the decennial census is constitutionally mandated for the purposes of reapportioning representation in Congress, data users also rely on the data gathered by the census for many purposes—such as distributing public funds to states and localities, public policy formulation and evaluation, and a broad range of research and analysis activities.

For the Census Bureau, there is always a tension between the goal of providing complete and reliable data and the mandate to protect the privacy of all census respondents. In recent decades, the Census Bureau has implemented a disclosure avoidance technique known as “data swapping” to protect privacy. With data swapping, if a respondent’s characteristics were unique within their geographic area (potentially making it possible to identify them), their information would be exchanged with that of a respondent in a nearby area. In this hypothetical situation, the data ultimately published for these geographic areas would be accurate for measures such as total population count, but some information on the characteristics of the populations would be altered to protect privacy.

However, there has been a growing concern that the Census Bureau’s disclosure avoidance techniques no longer provide sufficient privacy protections (although there are no known cases of census respondent reidentifications) because of advancements in computing power, more sophisticated data processing techniques and a greater availability of personal information from other sources. As a result, the Census Bureau made the decision to adopt a new disclosure avoidance technique for the 2020 census called “differential privacy.”

The differential privacy (DP) process involves adding a controlled amount of statistical noise to most of the data variables published for the 2020 census, with the amount of noise governed by a so-called “privacy-loss budget.” The particulars of this process are beyond the scope of this brief summary, but those interested in learning more can view the Census Bureau’s web resources on this topic.¹
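To make the idea of a privacy-loss budget concrete, here is a minimal Python sketch of the Laplace mechanism, a textbook way of adding differentially private noise to a count. It is not the Census Bureau's production algorithm. The function names, the example count of 12 and the epsilon values are hypothetical and chosen only for illustration. The parameter epsilon plays the role of the privacy-loss budget: a smaller epsilon means stronger privacy and more noise.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) value.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count with Laplace noise of scale sensitivity / epsilon.

    Assumption: adding or removing one person changes the count by at
    most 1, so the sensitivity defaults to 1.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, rng, trials=10_000):
    """Average absolute error of the noisy count over many simulated releases."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so runs are repeatable
    true_count = 12            # hypothetical population of a small census block
    for epsilon in (0.1, 1.0, 10.0):
        error = mean_abs_error(true_count, epsilon, rng)
        print(f"epsilon={epsilon:>4}: mean absolute error is about {error:.2f}")
```

In this sketch the typical error is roughly 1 / epsilon, which is why a given amount of noise matters far more for a block of 12 people than for a state of several million.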

For most data users, the bottom line with this change will be that the greater privacy protections offered by DP could compromise data accuracy in some cases, such as for smaller geographic areas or for smaller sub-populations (e.g., age groups or race groups). It is important to note that three data variables will not be altered by the DP procedures and will be reported as enumerated. These three “invariant” measures are: total population for states (which provides accurate data for congressional reapportionment), total housing units at the census block level (the smallest unit of census geography) and group quarters at the block level. All other variables will be subject to the DP procedures.

Initial census demonstration data showed alarming results

To help data users gauge the impacts that DP could have on the census results, the Census Bureau released its “2010 Demonstration Data Products,” which included the actual data published from the 2010 census along with a companion data set of the 2010 results after applying the new DP procedures. After having a chance to review these numbers, the response from many in the data user community bordered on panic. Put simply, the data showed such significant biases and distortions that they would have been unsuitable for use in many of the applications that have traditionally relied on census data.

Census Bureau recognizes problems and is working on solutions

In a recent blog post, Census Bureau officials acknowledged that the feedback they received from data users showed that the DP procedure, as currently implemented, introduced an unacceptable level of error.² The bureau reports that its subsequent analysis indicates that much of this error was an unintentional byproduct of its post-processing procedures (as opposed to intentional “noise” introduced to the data through the DP procedure), and that it is currently working through some solutions.
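One way post-processing can add error beyond the intended noise is sketched below in Python. The source does not say which post-processing steps were responsible, so this example rests on an assumption: a simple step that rounds each noisy count to a whole number and replaces negative results with zero, since published counts cannot be negative. The Laplace noise, the epsilon of 0.5 and the function names are likewise hypothetical and are not the bureau's actual procedure.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) value as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def average_published(true_count, epsilon, rng, trials=20_000):
    """Compare the average raw noisy count with the average post-processed count.

    Post-processing here (an illustrative assumption) rounds to the nearest
    integer and clamps negative values to zero.
    """
    raw_total = 0.0
    processed_total = 0.0
    for _ in range(trials):
        noisy = true_count + laplace_noise(1.0 / epsilon, rng)
        raw_total += noisy
        processed_total += max(0, round(noisy))
    return raw_total / trials, processed_total / trials


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so runs are repeatable
    for true_count in (0, 2, 1000):
        raw, processed = average_published(true_count, epsilon=0.5, rng=rng)
        print(f"true={true_count:>4}  raw average={raw:8.2f}  "
              f"post-processed average={processed:8.2f}")
```

The raw noisy counts average out to the true value, but the post-processed counts for very small groups are pushed upward because the negative draws are discarded while the positive ones are kept. Large counts are essentially unaffected, which is consistent with distortions showing up mainly for small areas and small sub-populations.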

Ideally, the Census Bureau would publish a second set of demonstration data once it had corrected these post-processing errors, but given time and resource constraints, a second release is unlikely. Instead, the Census Bureau is developing a set of summary data quality measures that it will use to evaluate progress as it fine-tunes the DP process.³ Once the measures are finalized, the bureau will apply them to the previously released demonstration data products to serve as a benchmark, then publish updated summary statistics that will hopefully show the improvements it is making internally.

At the Indiana Business Research Center, we will follow this process closely and provide updates to the Indiana data user community along the way. We will also be sure to give extra scrutiny to the actual 2020 census data that are ultimately published to see if there are any areas in which the data seem unreasonable.

The Census Bureau faces a difficult dilemma: provide the fine-grained, reliable data that we have all come to expect while also guaranteeing privacy to all respondents. With differential privacy, we might have to sacrifice a bit of the former to ensure the latter. Let’s just hope the bureau is able to strike a balance that satisfies both aims.

Notes

1. “Disclosure Avoidance and the 2020 Census,” U.S. Census Bureau, last modified March 27, 2020, www.census.gov/about/policies/privacy/statistical_safeguards/disclosure-avoidance-2020-census.html.
2. John M. Abowd and Victoria A. Velkoff, “Modernizing Disclosure Avoidance: What We’ve Learned, Where We Are Now,” Research Matters (blog), March 13, 2020, www.census.gov/newsroom/blogs/research-matters/2020/03/modernizing_disclosu.html.
3. “2020 Disclosure Avoidance System Updates,” U.S. Census Bureau, last modified July 1, 2021, www.census.gov/data/academy/webinars/2021/disclosure-avoidance-series/2020-disclosure-avoidance-system-update.html.
