Law in the Internet Society

Data Mining and Information Sharing

-- By JanethLopez - 16 Oct 2012

Introduction

Most internet users have already come to acknowledge and accept a diminished expectation of privacy online. The use of many online products (email, games, maps, music and video streaming) comes at a price. What we watch, what we buy, whom we email and chat with, and where we go are all tracked through a computer’s browser or a mobile phone. Many people seem willing to sacrifice their right to privacy in exchange for "free" online services. The problem is not that users are unaware that information sharing and data mining occur; it is that they assume the purchasers of the information are advertisers, who are not looking to personally identify each user but are instead aggregating information from millions of users to learn how to sell products efficiently. The harm seems minimal when the information collected is used for targeted advertising. But when data is shared with and abused by credit card companies, insurance companies, and government agencies, not for advertising but for surveillance and risk management, the privacy concerns raised are too great to ignore.

Information Sharing for Targeted Advertising

Companies gather information on the online movements of internet users in order to target ads that they believe these users are statistically more likely to click. The data collected provides an overall view of the user's digital self. For example, Google’s Privacy Policy and Terms of Service allow Google to share information taken from all of its services, aggregating information on a user as Google tracks her every move online. Google reads information from Gmail, videos watched on YouTube, appointments on Google Calendar, GPS information from Google Maps, Google Search terms, purchases made with Google Wallet, mobile purchases of music and games via Google Play, and so on, and ties the data together to get an overall view of a user’s interests in order to provide targeted advertising.

Attention is a scarce commodity, and internet users understand that advertisers need to know what their interests are, what they buy, and how much they are willing to spend in order to target the right products. Google assures customers that no humans read emails to provide the ads (Gmail Ads) and that it never sells or shares information that personally identifies them for marketing purposes. Of course, these policies could change, but for now, providing this information is a price people (myself included) are willing to pay for the convenience of integrated services, like Google Maps working with Google Calendar to indicate how much traffic there is to a user's next appointment. It’s spying, yes, but who cares if a bot somewhere reads my search terms and a “relevant” ad appears on the side of my screen? The practice seems harmless. Advertisers are not interested in finding out who I am, just what they can sell me.

The greater problem is the amount of information that is being stored forever and what, in addition to advertising, could be done with that information in the future. Web searches, GPS location, calendar appointments, emails, and purchases, when mixed with the abundance of information available on social media sites, including private conversations, family photos, employment information, and details on significant events (births, deaths, marriages), give a far too detailed view of an individual. The danger, which most users ignore, is that this extensive data set could easily be shared with those who are interested in personally identifying users.

Information Sharing with Banks, Insurance Companies, and Government Agencies

An aspect of this course that has made me uneasy is the discussion of how data mining can affect online users, not because of the data shared with advertisers, but because inferences drawn from my online activity can be sold to insurance firms and banks or shared with government agencies, which will later make assumptions or predictions about my lifestyle choices. This kind of information sharing has a far more direct effect on my life than targeted advertising.

Even putting aside the clear privacy issue of having personal information sold without consent, there are far too many possibilities for abuse and false positives. If a user consistently researches a specific medical condition for a family member or purchases wine with her credit card for a group’s weekly wine and cheese gathering, could this activity reach health or life insurance companies and affect her premiums? What if online searches for a bankruptcy lawyer reach a bank or credit card company? An alternative concern is that, as a greater online presence has come to be the norm and not the exception, a refusal to provide information online could be regarded as suspicious. Right now, insurance companies and banks are interested in inserting code into Facebook and Twitter to find clues about policyholders or to find promising leads (Economist). What happens if a credit card company or health insurance company decides that a particular individual is actually a higher risk simply because she does not have the kind of online presence that would provide relevant information on her activities?

Many online companies, including Facebook, Microsoft, and Google, cooperate with government agencies' requests for user data. Google seems to be the most transparent, releasing information on the number of government requests it receives and admitting to complying either fully or partially with over 93% of these requests in the United States ([[http://www.google.com/transparencyreport/userdatarequests/][User Data Requests - Second Half 2011]]). The number of requests, not only for user information but also for the removal of content, has increased steadily in the past few years, raising concerns about how Google (and presumably other online companies monitoring online movements) responds to requests that violate users' rights to privacy and free expression (Google Transparency).

Conclusion

In order to make a conscious decision about how much to disclose and to whom, users need to know what kind of information is being collected and to whom it is being sold. At the very least, if a bank or insurance company is making assumptions about a user based on online activity or offline purchases, that user should be made aware of what information is being used, be provided with a copy of the data collected on her, and be given a chance to challenge the assumptions, much in the same way we are able to look into our credit reports when denied credit.



r2 - 16 Oct 2012 - 14:38:39 - JanethLopez
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.