AI Can Find You Even in Anonymized Datasets

Weekly social encounters create distinct signatures that help individuals stand out.

Wearing a mask might keep you incognito in a crowd. But AI can still pick you out of anonymized mobile phone databases based on patterns in your social activity.

The way you interact with the people around you may be enough to make you stand out from the crowd, at least to artificial intelligence.

When fed information about a target person’s mobile phone interactions, and also the interactions of their contacts, AI can correctly identify the target out of more than 40,000 anonymous mobile phone service subscribers more than half of the time, according to researchers who published their findings in Nature Communications on January 25.

The results imply that people interact in ways that may be exploited to identify individuals in apparently anonymized databases.

According to Jaideep Srivastava, a computer scientist at the University of Minnesota in Minneapolis who was not involved in the work, it’s no surprise that individuals prefer to stay within established social circles and that these frequent contacts develop a stable pattern over time.

“But the fact that you can use that pattern to identify the individual, that part is surprising,” he says.

Under the European Union’s General Data Protection Regulation and the California Consumer Privacy Act, companies that gather information about people’s everyday activities can share or sell that data without users’ consent, with one caveat: the information must be anonymized. Some businesses may believe they can meet this requirement by replacing users’ names with pseudonyms, says Yves-Alexandre de Montjoye, a computational privacy expert at Imperial College London. The new findings indicate that this is not the case.

The Anonymity Experiment

De Montjoye and his colleagues suspected that people’s social behavior could be used to pick individuals out of datasets of anonymous users’ interactions. To put the idea to the test, the researchers trained an artificial neural network — a kind of AI that loosely mimics the neural circuitry of a real brain — to spot patterns in users’ weekly social interactions.

For one test, the team fed the neural network data from an undisclosed mobile phone provider that tracked the interactions of 43,606 customers over 14 weeks. Each record included the date, time, duration and type of contact (call or text), the pseudonyms of the people involved, and who initiated the exchange.
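To make the shape of those records concrete, here is a minimal Python sketch of a single interaction entry. The field names and types are illustrative assumptions, not the provider’s actual schema.

```python
# Illustrative sketch only: field names are assumed, not the provider's schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Interaction:
    timestamp: datetime   # date and time of the call or text
    duration_s: int       # call duration in seconds (0 for texts)
    kind: str             # "call" or "text"
    initiator: str        # pseudonym of whoever started the contact
    recipient: str        # pseudonym of the other party

# One example record between two pseudonymous users.
record = Interaction(datetime(2021, 3, 1, 14, 30), 95, "call", "user_0421", "user_1187")
```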

Each user’s interaction data was organized into web-shaped structures: nodes representing the person and their contacts, connected by edges carrying the interaction details. The AI was shown a known person’s interaction web and then set loose on the anonymized data to find the web that most closely resembled it.
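As a rough illustration of that matching idea — not the study’s actual method — the sketch below reduces each user’s interaction web to a pseudonym-independent profile and links a known target to the most similar anonymous profile using cosine similarity. The feature choices are assumptions, the records are the hypothetical Interaction objects sketched above, and the paper’s trained neural network would stand in for this hand-built heuristic.

```python
# Greatly simplified stand-in for the matching step: the study used a trained
# neural network over full interaction webs, whereas this sketch uses a
# hand-picked feature vector and cosine similarity. Features are assumptions.
from collections import Counter
from math import sqrt

def profile(interactions, user):
    """Summarize one user's interaction web as a fixed-length feature vector."""
    per_contact = Counter()
    calls = texts = initiated = 0
    for rec in interactions:            # rec: Interaction records as sketched above
        other = rec.recipient if rec.initiator == user else rec.initiator
        per_contact[other] += 1
        calls += rec.kind == "call"
        texts += rec.kind == "text"
        initiated += rec.initiator == user
    top = sorted(per_contact.values(), reverse=True)[:5]   # five heaviest ties
    top += [0] * (5 - len(top))                            # pad to fixed length
    total = max(len(interactions), 1)
    return [calls / total, texts / total, initiated / total] + top

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def best_match(target_vec, anonymous_profiles):
    """Return the pseudonym whose anonymous profile most resembles the target."""
    return max(anonymous_profiles, key=lambda p: cosine(target_vec, anonymous_profiles[p]))
```

In this toy setup, anonymous_profiles would map each pseudonym in the anonymized dataset to its feature vector. The study’s model also drew on the webs of the target’s contacts, which is what pushed identification rates above 50 percent.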

When given interaction webs containing only a target’s own phone interactions, recorded one week after the last entries in the anonymous dataset, the neural network linked just 14.7 percent of people to their anonymized selves. But when given information on the target’s interactions as well as those of their contacts, it correctly identified 52.4 percent of individuals.

When the researchers fed the AI interaction data from the target and their contacts collected 20 weeks after the anonymous dataset, it still identified individuals 24.3 percent of the time, suggesting that social behavior remains identifiable over long stretches of time.

To investigate whether the AI could pick up on social behavior in other contexts, the researchers tested it on a dataset of four weeks of close-proximity data collected by Copenhagen researchers from the cell phones of 587 anonymous university students. That interaction data comprised students’ pseudonyms, encounter times, and the strength of the received signal, which indicated proximity to other students — parameters that COVID-19 contact tracing apps often collect.

Given a target’s and their contacts’ interaction data, the AI correctly identified students in this sample 26.4 percent of the time.

The results, the researchers note, are unlikely to apply to Google’s and Apple’s contact tracing framework, which safeguards users’ privacy by encrypting all Bluetooth information and prohibiting the collection of location data.

De Montjoye says he hopes the findings will help policymakers develop stronger measures for protecting users’ identities. Data privacy rules, he notes, allow anonymized data to be shared in order to enable meaningful research. “However, what’s essential for this to work is to make sure anonymization actually protects the privacy of individuals.”

Tonia Nissen
Based out of Detroit, Tonia Nissen has been writing for Optic Flux since 2017 and is presently our Managing Editor. An experienced freelance health writer, Tonia obtained an English BA from the University of Detroit, then spent over 7 years working in various markets as a television reporter, producer and news videographer. Tonia is particularly interested in scientific innovation, climate technology, and the marine environment.