Exercise app shows why anonymous data can still be dangerous
It wasn't a hack. It wasn't a leak. It wasn't even a mistake, really. But it showed how risky even anonymous data can be.
Strava, which makes a fitness-tracking app and website, publicly shared a map of the world covered in squiggly lines. Those lines traced running routes uploaded by the app's users.
Researchers were able to find or confirm the exact locations of military bases and suspected CIA black sites from the exercise trail data uploaded to Strava. This was despite the data being "safe" and anonymized, meaning it had been stripped of identifying information like names and dates. The story blew up after a 20-year-old university student studying security revealed the problem on January 27th.
Strava released their global heatmap. 13 trillion GPS points from their users (turning off data sharing is an option). https://t.co/hA6jcxfBQI … It looks very pretty, but not amazing for Op-Sec. US Bases are clearly identifiable and mappable pic.twitter.com/rBgGnOzasq
—@Nrg8000
University student Nathan Ruser's tweet revealed the problem
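To see why stripping names and dates does not make location data safe, consider a minimal sketch of how a heatmap is built. This is an illustration under stated assumptions, not Strava's actual method: the function names, the grid resolution, and the coordinates are all invented for the example. The input contains nothing but latitude and longitude pairs, yet a spot where people repeatedly exercise still stands out.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to a grid cell index (cell_deg is an assumed resolution)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def build_heatmap(points, cell_deg=0.01):
    """Count points per cell. The input has no names, dates, or user IDs."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)

def hotspots(heatmap, min_points):
    """Return the cells whose point count meets the threshold, sorted."""
    return sorted(cell for cell, n in heatmap.items() if n >= min_points)

# Invented data: 40 points from repeated laps around one spot...
laps = [(10.005, 20.005)] * 40
# ...plus a handful of isolated points scattered elsewhere.
scattered = [(10.005 + 0.1 * i, 20.005 + 0.1 * i) for i in range(1, 6)]

heatmap = build_heatmap(laps + scattered)
print(hotspots(heatmap, min_points=10))
```

The only cell that clears the threshold is the one where the laps were run. No individual is identified, but the place is, which is the kind of exposure the heatmap created.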
The problem goes well beyond Strava. It also reveals some deep flaws in the way tech companies approach data privacy.
Arvind Narayanan is a computer scientist at Princeton University. His work focuses on digital privacy and security. "There are dozens of companies that have this kind of fine-grained location data about millions of people," Narayanan said. "Strava can be seen as a symptom of a bigger issue, which is the number of companies which have the kind of sensitive data that we're uncomfortable seeing publicly."
In the Strava case, the data was openly published, but user data can also be revealed through hacking. For Narayanan, this threat is both a security issue and an oversight issue.
"There's not a lot of public oversight around these super sensitive databases about billions of people, that are held in company servers without a lot of accountability for where those servers are, how they're protected, and how the data collection and use is disclosed to users, and to regulators."
But the problem goes deeper. The Strava issue reflects a broader, misguided approach to data privacy by tech companies. "It's not so much that each individual user's behaviour affects only them, but in fact everybody's behaviour collectively has an impact on everyone else's privacy," Narayanan explained. "Arguing that 'your data is anonymized so you're not going to come to any harm' kind of breaks down here, once we start thinking of privacy as a collective issue."
It may be that the Strava story is a watershed moment in the way we think about data, but that depends on the lessons we take from it. "The right lesson to draw would be that we need to have a more nuanced appreciation of what privacy means," Narayanan said. "It simply cannot be boiled down to anonymity, or putting a bunch of check boxes ... for users to figure out ... [P]rivacy needs to be really integrated as a core part of the design process."