Potential privacy lapse found in 2010 census data


Associated Press

WASHINGTON

An internal team at the Census Bureau found that basic personal information collected from more than 100 million Americans during the 2010 head count could be reconstructed from obscured data, though with many errors, a top agency official disclosed Saturday.

The age, gender, location, race and ethnicity of 138 million people were potentially vulnerable. So far, however, only the bureau's internal hacking teams have found such details to be at possible risk, and no outside groups are known to have obtained data intended to remain private for 72 years, chief scientist John Abowd told a scientific conference.

The Census Bureau is now scrapping its old data shielding technique for a state-of-the-art method that Abowd claimed is far better than Google’s or Apple’s.

Some former agency chiefs fear the potential privacy problem will compound worries that people will refuse to answer, or will answer falsely, on the once-a-decade survey because of the Trump administration’s attempt to add a much-debated citizenship question.

The Supreme Court on Friday announced that it would rule on that proposed question, which has been criticized as political and as not properly tested in the field. The census count is hugely important: it determines how seats in the House of Representatives are allocated and helps guide the distribution of billions of dollars in federal money.

The 8 billion statistics in census data are supposed to be jumbled so that what is released publicly for research cannot be used to identify individuals for more than seven decades. In 2010, the Census Bureau did this by swapping information for similar households from one city to another, according to Duke University statistics professor Jerome Reiter.

In the internal tests, Abowd said, officials were able to match 45 percent of the people who answered the 2010 census with information from public and commercial data sets such as Facebook. But errors in this technique meant that data for only 52 million people would be completely correct, or slightly more than one-sixth of the U.S. population.

The official privacy/accuracy setting for 2020 has not yet been decided. Abowd said policy officials, not engineers or scientists, will make that call.

The Census Bureau tested the new system in a 2018 survey using an ultra-strict privacy setting. Although the two are not directly comparable, that setting is hundreds if not thousands of times more protective of privacy than the settings now applied to data from searches using Google Chrome or from Apple’s iPhone, Duke’s Reiter said.

Former Census Bureau Director Kenneth Prewitt suggested that the public might not understand the extra efforts underway for the 2020 count but would be spooked by the disclosure of the privacy vulnerability, making people more reluctant to take part in the next census.