Choose Your Friends Wisely: Tracking & Profiling on the Web

Note: This article is also available in Portuguese, translated by Anders Bateva.

A lot of data about you and your Internet behavior gets collected when you simply surf the Internet ‘unprotected’. We are currently living in a time when data profiling and getting to know your customers is getting more and more important. In this article I will explore the consequences of data sharing, browser tracking and profiling on the Internet, why it isn’t a good idea to share too much data about yourself, and some of the things we can do as a community.

Data Collection: What Is It?

There are companies out there, like Acxiom (link to Wikipedia) for example, who live on nothing else but to sell your information to other companies who may find use for it. These companies get their data from you. Your browser, or the social networks you’re a part of. Your movements across the Internet are tracked and recorded as well. One of the most ubiquitous form of tracking on the Internet, next to ad networks, is the tracking done by social networks. These networks have convenient ‘share’ or ‘like’ buttons which can be found on millions of websites across the Internet. Simply by visiting these websites with an unprotected web browser, data gets sent to these social network sites. Data about your browser brand/make/version, the OS you use, the country you’re from, sometimes even down to the actual locality, but also your IP address and the URL of the site you visited. So they know your actual surfing behavior, since these buttons are found on many sites.  Nearly a quarter of the top 10,000 websites have Facebook integration, for instance.  And this is data from last year, I’m sure the number is higher today. Another way of profiling is done via ad networks. Because it is inconvenient to manage your own advertising when you are just looking to make some money out of your website, this often gets outsourced to companies who specialize in advertising. And these companies will then serve you ads from their servers when you visit a site that is using it. Because this is all a single point where this data gets collected and indexed, you can imagine that these companies know quite a lot about peoples’ surfing behavior. And this collecting of data, the profiling and tracking of people across the Internet gets done without your knowledge or consent. Now, of course they claim that this is done to better target their ads, so you get served ads aimed specifically at your current interests and your geographic location or linguistic background. And this is true, the more they know about you, the better they can target ads. But this information is worth a lot of money to marketers, who are always on the lookout for ways to target and market their products to just the right audiences, because this will increase the likelihood people will click on their ads and buy their stuff. And this information gets collected centrally, at only a few companies who specialize in this. Most of us make use of content delivery networks hosted in the United States, implement social media integration et cetera and are thereby facilitating easy data collection by these companies. This centralization means that there are only a few companies out there that own a majority of the market share in this business. You can imagine that the amount of data they collect about a single person is quite substantial indeed. And of course, intelligence agencies like the NSA have access too, as seen by the revelations done by Edward Snowden in recent months. Many people don’t know the sheer extent of the data collection done, and the potential consequences that it can have if it’s misinterpreted.

Consequences of Overzealous Data Collection

The main problem with data collection is that data is often misinterpreted, interpreted without context, and there can be serious consequences if this happens to you. The companies using your data infer certain things about you and your behavior based on this data alone. They profile you. However, their assessment is often wrong. The more data you share, the more problematic this can be eventually. A recent example of a serious consequence is that having certain friends on Facebook can actually change your credit score. These companies base this credit score correction on your friends on Facebook. So if you have a lot of friends with questionable credit histories, you may be denied a mortgage or a credit card. Even when you always make sure you never miss a payment. Search engines knowing your search history have access to something very private indeed: you are revealing what you think at that very moment. What things you are likely interested in. This is exactly the reason why this information is so valuable in the hands of advertising companies, so they can adjust their campaigns to make it more likely that they’ll persuade you to click one of their ads. Search engine history also shares your mental state at that very moment, which, together with information on the groceries you buy at the supermarket for instance, can be very valuable information to your health insurance company. It is not inconceivable that insurance companies will be adjusting your premiums based on the food you eat, whether you have a gym membership, whether you smoke or drink alcohol, or whether your search engine history shows that you have an increased risk of depression. Do we really want that? This can potentially lead to some very bad consequences indeed, not just financially. You can also imagine health insurance companies rejecting you for insurance because of your unhealthy lifestyle, car rental companies rejecting you because of the recent fines you received, et cetera. These conclusions get drawn without our knowledge or consent; usually we don’t even know where these companies get the data on which they base their decisions from, and there’s not much we can do about it. The only way to prevent this is by starting to become more aware of what your data is worth to someone else, why it is in their interest to have access to this data, and whether you really want to give them access. And, on the other hand, by starting to think what we as programmers and hackers can do ourselves, by starting to build systems with privacy in mind from the start.

Privacy By Design

What we need to better protect our privacy on the Internet, next to browser add-ons like Ghostery and NoScript, is a change in mentality. We need systems that are built from the ground up with privacy in mind: privacy by design. Think about how much data you really need in order to complete the task at hand. When you’re building forms for your users to fill in, don’t require them to fill in data that isn’t absolutely necessary to complete the current task. So don’t ask your customers for a phone number when an e-mail address will do. Don’t ask them to put in their mail address when you don’t need it to send packages etc. Don’t ask them for their real name either when this isn’t necessary (and usually it isn’t). The reason why we want to limit available data is because this data can come back to bite you later on, as I’ve explained above. This will also protect your business more against cybercriminals looking for personal data to steal, as they cannot steal what isn’t there. Identity theft will also be harder when you’re very selective with who you share your data. If we teach people how to protect their data on the Internet, how to be ‘street smart’ on the Internet so to speak, we will increase their overall security on the Internet, and this is something that is very much necessary nowadays.