I recently gave an interview to RT's Going Underground programme about Facebook's tracking of its users (and non-users) across the internet via the 'Like' and 'Share' buttons found on millions of websites, and what people can do to stay safe.
About two weeks ago, KU Leuven and the Vrije Universiteit Brussel in Belgium published a report, commissioned by the Belgian Privacy Commission, on Facebook's tracking behaviour on the internet: more specifically, how Facebook tracks its users (and non-users!) through the 'Like' and 'Share' buttons found on millions of websites across the internet.
The results of the investigation are depressing. The commission found that Facebook disregards European and Belgian privacy law in various ways; in fact, it identified 10 distinct legal issues. Facebook frequently dismisses its own severe privacy violations as "bugs" that are still on the list to be fixed, ignoring the fact that these "bugs" are a major part of its business model. This lets various privacy commissioners believe the violations are the result of unintended functionality, while in fact profiling people is the entire basis of Facebook's business model.
Which law applies?
Facebook also does not recognise that Belgian law applies in this case, claiming that because it has an office in Ireland, it is bound only by Irish privacy law. This is simply not the case. The general rule is that if you target your site at a specific market, say Germany, as evidenced by a German translation of your site, the site being accessible through a .de top-level domain, and various other indicators (such as the payment options offered, if your site sells products or services, or the marketing materials used), then you are bound by German law as well. This is done to protect German customers, in this example.
The same principle applies to Facebook. They are active world-wide, and so should be prepared to adjust their services to comply with the laws and regulations of all these countries. This is a difficult task, as laws are often incompatible, but it is necessary to safeguard consumers' rights. If Facebook built its Like and Share buttons such that they don't phone home on page load and don't place cookies without the user's consent, it would have far fewer legal problems. The easiest way to comply, if you run such an international site, is to take the strictest applicable legislation and implement your site so that it satisfies that.
In fact, the real reason Facebook is in Ireland is mostly tax-related: it allows them to avoid taxes by means of the Double Irish and Dutch Sandwich financial constructions.
Another problem is that users cannot prevent Facebook from using the information they post on the social network for purposes other than the core social-networking functionality. The information people post, along with information Facebook aggregates and collects from other sources, is used by Facebook for different purposes without the express and informed consent of the people concerned.
The problem with the ‘Like’ button
Special attention was given to the 'Like' and 'Share' buttons found on many sites across the internet. The investigators found that these social sharing plugins, as Facebook calls them, place a uniquely identifying cookie on users' computers, which allows Facebook to correlate a large part of their browsing history. Another finding is that Facebook places this uniquely identifying
datr cookie on the European Interactive Digital Advertising Alliance opt-out site, where Facebook is listed as one of the participants. It also places an
oo cookie (which presumably stands for "opt-out") once you opt out of the advertising tracking. Of course, once you remove this cookie from your browser, Facebook is free to track you again. Note also that Facebook does not place these cookies on the US or Canadian opt-out sites.
As I wrote back in July 2013, the problem with the 'Like' button is that it phones home to Facebook without the user having to interact with the button at all. The mere act of it loading on the page means that Facebook receives various pieces of information from the user's browser, such as the page currently visited and a uniquely identifying browser cookie called the
datr cookie, and this information allows them to correlate all the pages you visit with your profile that they keep on you. As the Belgian investigators confirmed, this happens even when you don’t have an account with Facebook, when it is deactivated or when you are not logged into Facebook. As you surf the internet, a large part of your browsing history gets shared with Facebook, due to the fact that these buttons are found everywhere, on millions of websites across the world.
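To make concrete what "phoning home on page load" means, here is a minimal sketch of the kind of information such an embedded button request carries. The endpoint, cookie value, and field names are illustrative assumptions, not Facebook's actual internals; only the general shape (page URL, unique cookie, browser fingerprint) follows from the findings above.

```python
# Illustrative sketch only: the endpoint and values below are made up.
# It models the data a third-party "Like" button request exposes the
# moment the page loads, before the user touches anything.

def like_button_request(page_url, datr_cookie, user_agent):
    """Return the URL, parameters and headers a browser would attach
    when the embedded plugin loads from the social network's servers."""
    return {
        "url": "https://social.example/plugins/like.php",  # hypothetical endpoint
        "params": {"href": page_url},            # the page you are visiting
        "headers": {
            "Referer": page_url,                 # leaks your browsing context
            "Cookie": f"datr={datr_cookie}",     # unique browser identifier
            "User-Agent": user_agent,            # browser/OS fingerprint
        },
    }

req = like_button_request(
    "https://news.example/article-42",
    "AbCdEfGh123456",
    "Mozilla/5.0 (X11; Linux x86_64)",
)
print(req["headers"]["Cookie"])  # the tracker can recognise this browser again
```

Because the cookie is the same on every site that embeds the button, joining these requests by cookie value reconstructs a browsing history; no click is required.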
The Filter Bubble
A major problem with personalisation technology, such as that used by Facebook, Google, and others, is that it limits the information users are exposed to. The algorithm learns what you like, and subsequently serves you only information you're bound to like. The problem is that a lot of information isn't likeable: information that isn't nice, but is still important to know. By heavily filtering the input stream, these companies influence how we think about the world and what information we're exposed to. Eli Pariser discusses this effect in his book The Filter Bubble: What the Internet Is Hiding From You. During the Egyptian revolution he did a Google search for 'Egypt' and got news articles about the revolution, while a friend only got information about holidays to Egypt: tour operators, flights, hotels. A vastly different result for the exact same search term, due to the heavy personalisation going on at Google, where algorithms infer which results you're most likely to be interested in by analysing your previously entered search terms.
The same happens at Facebook, which controls what you see in your news feed based on what you like. The problem is that after this happens a few times, you will only see information that you like, and none of the information that is important but not likeable. This massively erodes the eventual value Facebook will have, since eventually all Facebook will be is an endless stream of posts, images, and videos that you like and agree with. It becomes an automatic positive-feedback machine: press a button, and you'll get a cookie.
What value does Facebook then have as a social network, if you never encounter radical ideas, or ideas you initially disagree with but that may alter your thinking once you engage with them? Without ever encountering extraordinary ideas, we never improve. And what a poor world that would be!
I gave my talk about privacy by design last Saturday at eth0 2014 winter edition, a small hacker get-together organised this year in Lievelde, The Netherlands. eth0 organises conferences that aim to bring people with different computer-related interests together; they organise two events per year, one of them during winter. I previously gave a very similar talk at the OHM2013 hacker conference, held in August 2013.
Here’s the footage of my talk:
I talked about privacy by design and about my work on Annie Machon's site and, more recently, the Sam Adams Associates for Integrity in Intelligence site. The talk consists of two parts: in the first I explained what we're up against, and in the second I discussed the two sites in a more specific case study.
I talked about the revelations concerning the NSA, GCHQ, and other intelligence agencies, including the December revelations that Jacob Appelbaum explained so eloquently at 30C3 in Hamburg. Then I moved on to the threats faced by website visitors: how profiles are built up and sold, and browser fingerprinting. The second part consists of the case studies of Annie Machon's website and the Sam Adams Associates' website.
I mentioned the Sam Adams Associates for Integrity in Intelligence, for whom I had the honour of making a website, so they could have a more public space to share things relating to the Sam Adams Award with the world, and to provide a nice overview of previous laureates and their stories.
One thing both sites have in common is hosting on a Swiss domain, which provides a safer haven where content can be hosted without fear of being taken down by the U.S. authorities. The U.S. claims jurisdiction over the average .com, .net, and .org domains, and there have been cases where such domains were taken down because they hosted content the U.S. government did not agree with. Case in point: Richard O'Dwyer, a U.K. citizen, was threatened with extradition to the United States for being the man behind TVShack, a website that provided links to copyrighted content. MegaUpload, the file-locker company started by Kim Dotcom, was given the same treatment: visitors to its domain were served an image from the FBI announcing that the domain had been seized.
Recently I came across an article about Facebook: specifically, that Facebook wants to know why you self-censor; in other words, why you didn't click Publish on that status update you just wrote. It turns out Facebook sends data about everything you type into the post textarea (the one with the "What's on your mind?" placeholder) to its servers. According to two Facebook scientists quoted in the article, Sauvik Das (PhD student at Carnegie Mellon and summer software engineering intern) and Adam Kramer (a data scientist), they only send back information indicating whether you self-censored, not the actual text you typed. They wrote an article entitled Self-Censorship on Facebook (PDF, copy here) in which they explain the technicalities.
It turns out this claim, that they send back only metadata and not the actual text you type, is not entirely true. I wanted to confirm whether they really don't send what you type to Facebook before you hit Publish, so I fired up Facebook and logged in. I opened my web inspector and started monitoring requests to and from my browser. When I typed a few letters, I noticed that the site makes a GET request to the URL
/ajax/typeahead/search.php with parameters
value=[your search string]&__user=[your Facebook user id] (there are more parameters, but these are the most important for the purposes of this article). The
search.php script probably parses what you typed in order to find contacts that it can then show to you as autocomplete options (for tagging purposes).
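For illustration, the request described above can be reconstructed as follows. This is a sketch based only on what the web inspector showed (the value and __user parameters), with made-up values, and it omits the additional parameters mentioned.

```python
# Sketch reconstructing the observed typeahead request. The endpoint and
# parameter names come from the web inspector; the values are made up.
from urllib.parse import urlencode

def typeahead_url(typed_text, user_id):
    """Build the GET URL the page requests as you type (simplified:
    the real request carries more parameters)."""
    query = urlencode({"value": typed_text, "__user": user_id})
    return f"https://www.facebook.com/ajax/typeahead/search.php?{query}"

url = typeahead_url("hey al", "100000123456789")
print(url)
```

The point to notice is that the text you typed and your user ID travel together in a single request, before you have published anything.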
Now, the authors of the article actually gathered their data in a slightly different way. They monitored the post textarea and the comment box, and if more than 5 characters were typed, they counted the entry as self-censored if the post or comment wasn't published within the next 10 minutes. So their methodology needed no actual textual content. But as my quick experiment above shows, your comments and posts do get sent to Facebook before you click Publish, and even before 5 characters are typed. This is done for a different purpose (finding matches among your contacts for tagging, etc.), but the data is clearly received by Facebook, and what they subsequently do with it besides providing autocomplete functionality is anyone's guess. Given that the user ID is sent together with the typed text to the search.php script, they may well associate the text with your profile, but there's no way to prove that definitively.
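The researchers' detection rule is simple enough to sketch. This is my own reconstruction from the description in the paper (more than 5 characters typed, not published within the next 10 minutes), not their actual code; the names and the exact interpretation of the thresholds are mine.

```python
# My reconstruction of the paper's self-censorship heuristic: a draft
# counts as self-censored if more than 5 characters were typed and the
# post or comment was not published within the following 10 minutes.

CHAR_THRESHOLD = 5
WINDOW_SECONDS = 10 * 60

def is_self_censored(chars_typed, typed_at, published_at=None):
    """Classify one draft; timestamps are in seconds."""
    if chars_typed <= CHAR_THRESHOLD:
        return False                 # too short to be counted at all
    if published_at is None:
        return True                  # abandoned draft: censored
    return (published_at - typed_at) > WINDOW_SECONDS

print(is_self_censored(42, typed_at=0))                    # abandoned draft
print(is_self_censored(42, typed_at=0, published_at=120))  # published quickly
```

Nothing in this rule needs the text itself, which is exactly the authors' point; the typeahead requests shown above are what carry the actual content.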
When I read through the article, one particular sentence in the introduction stood out to me as bone-chilling:
“(…) Last-minute self-censorship is of particular interest to SNSs [social networking sites] as this filtering can be both helpful and hurtful. Users and their audience could fail to achieve potential social value from not sharing certain content, and the SNS [social networking site] loses value from the lack of content generation. (…)”
"loses value from the lack of content generation." Let that sink in. When you decide not to post something on Facebook, or rewrite it first, Facebook considers that a bad thing, something that removes value from Facebook. The goal of Facebook is to sell detailed profiling information on all of us, even those of us wise enough not to have a Facebook account (through tagging and e-mail friend-finder functionality).
Big Data and Big Brother
And it isn't just Facebook; it's basically every social network and ad provider. There's an entire industry of big-data brokers, companies most of us have never heard of, like Acxiom, and many others like it, that thrive on selling profiles and associated services. Advertising works best when it is specific and plays into users' desires and interests. This is also why, for this to be successful, companies like Facebook need as much information on people as possible, to better target their clients' ads. And the best way is to provide a free service, like a social network, enticing people to share their lives through it; then you can offer really specific targeting to your clients. This is what these companies thrive on.
The bigger problem is that we have no influence on how our data gets used. People who claim they have nothing to hide and do nothing wrong forget that they don't decide what constitutes criminal behaviour; the state makes that decision for them. And what happens when you are suddenly faced with a brutal regime that abuses all the information and data it has on you? Surely we want to prevent this.
This isn't just a problem in the technology industry and business; it's a problem with governments as well. The NSA and GCHQ, in cooperation with other intelligence agencies around the world, are collecting data on all of us, without giving us, the people, any possibility of appeal or correction of erroneous data. We have no influence on how this data gets used, who will see it, how it might be interpreted by others, et cetera. The NSA is currently experiencing the same uneasiness as the rest of us: they have no clue how much or what information Edward Snowden took with him, or how it might be interpreted by others. It's curious that they now complain about the same problem the rest of us have been experiencing for years; a problem the NSA partly created by overclassifying information that didn't need to be kept secret. Of course some information needs to be kept secret, but the vast majority of information that now gets rubber-stamped TOP SECRET poses no threat to national security if made public; more likely, it is information that might embarrass top officials.
We need to start implementing proper oversight of the secret surveillance states we are currently subjected to in a myriad of countries around the world, and take back the powers that were granted to them and subsequently abused, if we want to continue to live in a free world. For I don't want to live in a Big Brother state; do you?
(Note: a version of this article was also published on Consortium News.) In the last six months or so, Edward Snowden, former NSA contractor, came forward with revelations about the NSA, disclosing quite a few of the agency's surveillance programs and revealing the agency's blatant disrespect for civil rights: it spies on everything and everyone, all over the world, in a Pokémon-style "Gotta catch 'em all!" fashion. These actions are having a real effect on the United States economy, and especially on its tech industry in Silicon Valley. Why would I store my data on servers in the United States, where it is easily accessible by the NSA among others, if I can just as easily store it in Europe or some other, more secure place?
A Positive Investment Climate
To understand the US hegemony when it comes to IT companies and services, it helps to look at the history of the investment climate. Why did these companies pop up in the United States? Why wasn't Google invented in, say, Germany or Finland? The reason many of these cloud storage services and internet companies emerged in Silicon Valley rather than in Europe is the investment climate in the United States, which makes it much easier to start an internet company there. Large institutional investors and venture capitalists are less likely to invest in a start-up in Europe. Bankruptcy laws are also much more relaxed in the US than in Europe: whereas in the US you can be back on your feet within a year or so of going bankrupt, in Europe this is generally a much longer process. According to the Economist, it takes a minimum of 2 years in Spain, 6 years in Germany, and a whopping 9 years in France. In my own country, The Netherlands, it takes 3 years to be debt-free again after a bankruptcy; but if you go bankrupt in Paris, good luck, you've just ruined your future. This makes it far riskier to try new things and set up shop in Europe, because the consequences if things go bad are so much worse.

Unfortunately, this has left us Europeans without a European 'Silicon Valley'. We don't have many viable, easy-to-use alternatives, and these desperately need to be developed. We depend too much on American companies right now, and I think it would be good if we diversified, so that we get a healthy market with plenty of good alternatives, instead of what we have now: a US monopoly on web-mail (Gmail/Hotmail etc.), social networks (Facebook, Twitter, LinkedIn, Foursquare, etc.), internet search (Google), cloud storage (Dropbox, Microsoft, Amazon), and other things.
Cloud storage providers in Silicon Valley are already seeing big drops in revenue because of the Snowden disclosures. Why would we store our data across the pond? This is the central question, and it is having real economic consequences for the United States.
US Cloud Service Providers Face Economic Consequences
Cloud providers based in the US experienced significant drops in profit when the NSA revelations were made public. People outside the United States suddenly began to question whether their sensitive data was safe on American soil. All these companies are subject to the PATRIOT Act, which requires them to hand over any information and data they have on their customers, and prohibits them from telling their customers about it. So the conclusion can quite definitively be: no, your data cannot be trusted to stay secure if you send it over to the United States using 'convenient' cloud services like Dropbox or Amazon, among others.
This is the critical criterion. It doesn't matter that a company tells you it uses the most high-end military-grade encryption, it doesn't matter that it thought of an interesting technical solution to try to circumvent surveillance, and it doesn't matter that it writes glowing blog posts solemnly promising not to hand over your data; all that matters is that it is a US company, required to obey US law and to hand over your data. Few companies will resist the pressure and forfeit their entire business model to protect your privacy. This is also what strikes me as funny when I read about major US tech companies, like Google, Apple, and Microsoft, finding out that their server-to-server connections were being intercepted by the NSA. These intra-server connections were not encrypted; traffic was sent in the clear, probably over some private fibre-optic cable. Of course this could be intercepted, given the NSA's technical competence. So now these companies are trying hard to sell their overseas customers the story that their intra-server communications are fully encrypted. This is a feeble attempt to keep some customers from switching to alternatives (of which there are not many, unfortunately), as these companies are still US companies, with offices and infrastructure in the US, and the need to obey the laws there. It is irrelevant that they now encrypt their intra-server communications, because the US government can simply request the data via other, more official means.

But these companies aren't just promoting irrelevant measures; they actively act against our interests. After Edward Snowden's revelations, Facebook made data hand-offs to US authorities easier (fully automated, without judicial oversight). Facebook is also partnering with police to make protests harder to organise. And still we insist on using its social network. These are instruments of control and surveillance.
We're not their customers; we're the product being sold. We have a distinct lack of viable alternatives that aren't based in the US, and it's important to remember that social networks have a social aspect: it isn't enough for you to switch to a competitor, you have to convince your friends to switch as well. This is what keeps social networks afloat for so long, because that is indeed very hard to do.
March to Irrelevance
In October 2013, Congress raised the debt ceiling again, which will buy some time until January 2014; then they will face the exact same problem. The United States is structurally spending more money than it has available, and the current US national debt ($17 trillion) can never be repaid. They are pretty much already in default. But since the financial system is based on trust and hearsay, smoke and mirrors, it takes a while for people to face reality, wake up and smell the coffee. At that point the United States will be an irrelevant relic of the past. Here in Europe, we need to protect our own citizens' interests and start developing viable alternatives to the US hegemony, because that hegemony will be over one day.
A lot of data about you and your Internet behaviour is collected when you simply surf the Internet 'unprotected'. We live in a time when data profiling and getting to know your customers is becoming more and more important. In this article I will explore the consequences of data sharing, browser tracking and profiling on the Internet; why it isn't a good idea to share too much data about yourself; and some of the things we can do as a community.
Data Collection: What Is It?
There are companies out there, like Acxiom for example, whose entire business is selling your information to other companies that may find a use for it. These companies get their data from you: from your browser, or from the social networks you're part of. Your movements across the Internet are tracked and recorded as well. One of the most ubiquitous forms of tracking on the Internet, next to ad networks, is the tracking done by social networks. These networks have convenient 'share' or 'like' buttons which can be found on millions of websites across the Internet. Simply by visiting these websites with an unprotected web browser, data gets sent to the social networks: data about your browser brand/make/version, the OS you use, the country you're from (sometimes even down to the actual locality), but also your IP address and the URL of the site you visited. So they know your actual surfing behaviour, since these buttons are found on so many sites. Nearly a quarter of the top 10,000 websites have Facebook integration, for instance; and since that is data from last year, I'm sure the number is higher today.

Another form of profiling is done via ad networks. Because it is inconvenient to manage your own advertising when you are just looking to make some money from your website, this often gets outsourced to companies that specialize in advertising, and these companies then serve the ads from their own servers when you visit a site that uses them. Because this is a single point where data gets collected and indexed, you can imagine that these companies know quite a lot about people's surfing behaviour. And this collecting of data, this profiling and tracking of people across the Internet, is done without your knowledge or consent. Of course, they claim this is done to better target their ads, so you get served ads aimed specifically at your current interests and your geographic location or linguistic background.
And this is true: the more they know about you, the better they can target ads. This information is worth a lot of money to marketers, who are always looking for ways to market their products to just the right audiences, because that increases the likelihood that people will click on their ads and buy their stuff. And this information gets collected centrally, at only a few companies that specialize in it. Most of us use content delivery networks hosted in the United States, implement social media integration, et cetera, and thereby facilitate easy data collection by these companies. This centralization means that a few companies own the majority of the market share in this business; you can imagine that the amount of data they collect about a single person is quite substantial indeed. And of course, intelligence agencies like the NSA have access too, as the revelations by Edward Snowden in recent months have shown. Many people don't know the sheer extent of the data collection, or the potential consequences it can have if the data is misinterpreted.
Consequences of Overzealous Data Collection
The main problem with data collection is that data is often misinterpreted, or interpreted without context, and there can be serious consequences if this happens to you. The companies using your data infer certain things about you and your behaviour based on this data alone: they profile you. However, their assessment is often wrong, and the more data you share, the more problematic this can eventually become. A recent example of a serious consequence is that having certain friends on Facebook can actually change your credit score. Companies base this credit-score correction on your Facebook friends, so if you have a lot of friends with questionable credit histories, you may be denied a mortgage or a credit card, even if you always make sure you never miss a payment.

Search engines that know your search history have access to something very private indeed: you are revealing what you think at that very moment, and what things you are likely interested in. This is exactly why this information is so valuable to advertising companies, which adjust their campaigns to make it more likely that they'll persuade you to click one of their ads. Search-engine history also reveals your mental state at that moment, which, together with information on the groceries you buy at the supermarket for instance, can be very valuable to your health insurance company. It is not inconceivable that insurance companies will adjust your premiums based on the food you eat, whether you have a gym membership, whether you smoke or drink alcohol, or whether your search history shows an increased risk of depression. Do we really want that? This can lead to some very bad consequences indeed, and not just financially: you can also imagine health insurance companies rejecting you because of your unhealthy lifestyle, car rental companies rejecting you because of recent fines you received, et cetera.
These conclusions get drawn without our knowledge or consent; usually we don't even know where these companies get the data on which they base their decisions, and there's not much we can do about it. The only way to prevent this is to become more aware of what your data is worth to someone else, why it is in their interest to have access to it, and whether you really want to give them that access; and, on the other hand, to start thinking about what we as programmers and hackers can do ourselves, by building systems with privacy in mind from the start.
Privacy By Design
What we need to better protect our privacy on the Internet, next to browser add-ons like Ghostery and NoScript, is a change in mentality. We need systems that are built from the ground up with privacy in mind: privacy by design. Think about how much data you really need to complete the task at hand. When you build forms for your users to fill in, don't require data that isn't absolutely necessary for the current task. Don't ask your customers for a phone number when an e-mail address will do. Don't ask them for a postal address when you don't need to send packages. Don't ask for their real name when that isn't necessary (and usually it isn't). The reason to limit the data you hold is that this data can come back to bite you later on, as I've explained above. It also protects your business against cybercriminals looking for personal data to steal: they cannot steal what isn't there. Identity theft also becomes harder when you are very selective about whom you share your data with. If we teach people how to protect their data on the Internet, how to be 'street smart' online so to speak, we will increase their overall security, and that is something that is very much needed nowadays.
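As a rough sketch of what data minimisation looks like in code, one can declare, per task, which fields are genuinely required and drop everything else before storing anything. The task names and field sets here are illustrative assumptions, not a prescribed schema.

```python
# Sketch of data minimisation: declare per-task field requirements and
# discard any submitted field the task does not actually need.

REQUIRED_FIELDS = {
    "newsletter_signup": {"email"},                    # no name or phone needed
    "physical_shipping": {"email", "postal_address"},  # address only when shipping
}

def minimal_form(task, submitted_fields):
    """Keep only the fields the task needs; report the rest as excess."""
    needed = REQUIRED_FIELDS[task]
    kept = {k: v for k, v in submitted_fields.items() if k in needed}
    extras = set(submitted_fields) - needed
    return kept, extras

kept, extras = minimal_form(
    "newsletter_signup",
    {"email": "a@example.org", "phone": "555-0100", "real_name": "Alice"},
)
print(sorted(extras))  # → ['phone', 'real_name']: fields we never needed
```

Data that is never stored cannot be breached, subpoenaed, or sold, which is the whole point of the mentality change described above.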