Alessandro Acquisti and Ralph Gross published the report, titled "Predicting Social Security Numbers from Public Data", in the Proceedings of the National Academy of Sciences of the United States of America. In it, they report correctly predicting the complete Social Security numbers (SSNs) for 8.5% of the records in the Social Security Administration's (SSA) list of SSNs of deceased people, using fewer than 1,000 attempts per record. They also correctly predicted the first five digits for 44% of those records.
"Extrapolating to the US living population, this would imply the potential identification of millions of SSNs for individuals whose birth data were available," the researchers said.
The researchers' algorithm analyzes publicly available records in the SSA's Death Master File (DMF), a list of the SSNs of people who have died. Because area numbers have been matched to states and serial numbers assigned sequentially, the algorithm can detect statistical patterns linking deceased individuals' SSNs to where and when they were issued. The researchers say that in many cases they can then predict a living person's SSN by interpolating between those patterns using the person's state and date of birth.
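The article does not describe the algorithm's internals, so the following Python sketch is only a toy illustration of the general idea, not Acquisti and Gross's method. It assumes an SSN splits into a three-digit area number, a two-digit group number, and a four-digit serial number, and it runs on a handful of fictional records. The record type and the functions `predict_prefix`, `estimate_serial`, and `candidate_ssns` (with their `window_days` and `max_tries` parameters) are hypothetical. The sketch guesses the first five digits from deceased people in the same state born around the same date, then interpolates a serial number and enumerates nearby guesses up to a budget of 1,000 attempts, echoing the budget reported in the study.

```python
from collections import Counter
from datetime import date
from typing import List, NamedTuple, Optional


class DeceasedRecord(NamedTuple):
    """Hypothetical, simplified stand-in for a Death Master File entry."""
    state: str        # state associated with the record (assumed field)
    birth_date: date  # date of birth
    ssn: str          # nine digits, no dashes: 3 area + 2 group + 4 serial


def predict_prefix(records: List[DeceasedRecord], state: str, birth_date: date,
                   window_days: int = 30) -> Optional[str]:
    """Guess the first five digits (area + group) as the most common prefix
    among deceased people from the same state born within `window_days`."""
    nearby = [r.ssn[:5] for r in records
              if r.state == state and abs((r.birth_date - birth_date).days) <= window_days]
    if not nearby:
        return None
    return Counter(nearby).most_common(1)[0][0]


def estimate_serial(records: List[DeceasedRecord], state: str, birth_date: date,
                    prefix: str) -> int:
    """Linearly interpolate the last four digits between the nearest records
    born before and after the target date, assuming serials rise over time."""
    same = sorted((r for r in records if r.state == state and r.ssn[:5] == prefix),
                  key=lambda r: r.birth_date)
    before = [r for r in same if r.birth_date <= birth_date]
    after = [r for r in same if r.birth_date > birth_date]
    if before and after:
        a, b = before[-1], after[0]
        frac = (birth_date - a.birth_date).days / (b.birth_date - a.birth_date).days
        serial_a, serial_b = int(a.ssn[5:]), int(b.ssn[5:])
        return round(serial_a + frac * (serial_b - serial_a))
    # Only one side available: fall back to the closest record's serial.
    return int((before[-1] if before else after[0]).ssn[5:])


def candidate_ssns(records: List[DeceasedRecord], state: str, birth_date: date,
                   window_days: int = 30, max_tries: int = 1000) -> List[str]:
    """Return up to `max_tries` full guesses, ordered by distance from the
    estimated serial (serial 0000 is never issued, so it is skipped)."""
    prefix = predict_prefix(records, state, birth_date, window_days)
    if prefix is None:
        return []
    est = estimate_serial(records, state, birth_date, prefix)
    guesses: List[str] = []
    offset = 0
    while len(guesses) < max_tries and offset <= 9999:
        # Try est, then est+1, est-1, est+2, est-2, ... within 0001..9999.
        for serial in ((est,) if offset == 0 else (est + offset, est - offset)):
            if 1 <= serial <= 9999 and len(guesses) < max_tries:
                guesses.append(f"{prefix}{serial:04d}")
        offset += 1
    return guesses


# Fictional records for illustration only.
records = [
    DeceasedRecord("PA", date(1990, 3, 1), "123450100"),
    DeceasedRecord("PA", date(1990, 3, 11), "123450300"),
    DeceasedRecord("PA", date(1990, 3, 21), "123450500"),
    DeceasedRecord("OH", date(1990, 3, 5), "555010050"),
]
guesses = candidate_ssns(records, "PA", date(1990, 3, 6))
print(guesses[:3])  # ['123450200', '123450201', '123450199']
```

Ordering the guesses by distance from the interpolated serial depends entirely on the sequential-assignment assumption; under a fully randomized scheme, of the kind the report recommends, this sort of interpolation would have little to exploit.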
"These findings confirm that patterns extrapolated from deceased individuals’ SSNs in fact can be used to predict the SSNs of living individuals based entirely on public data," said the report.
The study has significant implications for identity theft and personal security in the United States, where Social Security numbers have traditionally been used to identify and verify individuals. The researchers describe several ways criminals could test guessed SSNs for a given person, such as using botnets to submit guesses to online services from many different IP addresses.
"In the short term, one of the least costly counter measures would have the SSA fully randomize the assignment scheme, abandoning the matching of area numbers to states, and the sequential assignment of serial numbers," the report said, while acknowledging that this would protect only the recipients of new SSNs; existing numbers would remain predictable. "To address those concerns, various recent legislative initiatives have been focusing on removing SSNs from public exposure or redacting their first 5 digits." The authors add, however, that data already exposed cannot reliably be taken back.