I recently gave an interview to RT’s Going Underground programme, regarding Facebook tracking its users and non-users throughout the internet, based on the Share and Like buttons found on millions of websites, and what people can do to stay safe.
About two weeks ago KU Leuven University and Vrije Universiteit Brussel in Belgium published a report commissioned by the Belgian Privacy Commission about the tracking behaviour of Facebook on the internet, more specifically how they track their users (and non-users!) through the ‘Like’ buttons and Share buttons that are found on millions of websites across the internet.
The results of the investigation are depressing. It was found that Facebook disregards European and Belgian privacy law in various ways; in fact, 10 legal issues were identified by the commission. Facebook frequently dismisses its own severe privacy violations as “bugs” that are still waiting to be fixed, ignoring the fact that these “bugs” are a major part of Facebook’s business model. This lets various privacy commissioners believe that the violations are the result of unintended functionality, while in fact the entire business model of Facebook is based on profiling people.
Which law applies?
Facebook also does not recognise that Belgian law applies in this case, and claims that because it has an office in Ireland, it is only bound by Irish privacy law. This is simply not the case. In fact, the general rule seems to be that if you focus your site on a specific market (let’s say, for example, Germany), as evidenced by having a German translation of your site, your site being accessible through a .de top-level domain, and various other indicators as well (one option could be the type of payment options provided, if your site offers ways to pay for products or services, or perhaps marketing materials), then you are bound by German law as well. This is done to protect German customers, in this example.
The same principle applies to Facebook. They are active world-wide, and so should be prepared to adjust their services so that they comply with the various laws and regulations of all these countries. This is a difficult task, as laws are often incompatible, but it’s necessary to safeguard consumers’ rights. In the case of Facebook, if they built their Like and Share buttons in such a way that they don’t phone home on page load and don’t place cookies without the user’s consent, they would have far fewer legal problems. The easiest way to comply, if you run such an international site, is to take the strictest applicable legislation and implement your site so that it complies with that.
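One way to build such a button is the “two-click” approach: the page loads no third-party code at all, and only constructs a plain share link at the moment the user actively clicks. A minimal sketch in TypeScript; the function name is mine, and the two endpoints are the networks’ public share URLs:

```typescript
// Build a share URL only at click time. Until the user clicks, no request
// is ever made to the social network, so it cannot set or read cookies.
function buildShareUrl(network: "facebook" | "twitter", pageUrl: string): string {
  const encoded = encodeURIComponent(pageUrl);
  switch (network) {
    case "facebook":
      return `https://www.facebook.com/sharer/sharer.php?u=${encoded}`;
    case "twitter":
      return `https://twitter.com/intent/tweet?url=${encoded}`;
  }
}

// In the page, a plain first-party button is wired up on user interaction:
//   document.querySelector("#share-fb")!.addEventListener("click", () => {
//     window.open(buildShareUrl("facebook", location.href), "_blank");
//   });
```

The key design choice is that nothing is fetched from facebook.com on page load; the share endpoint is only contacted after an explicit user action, which is exactly the consent moment the law asks for.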
In fact, the real reason Facebook is in Ireland is mostly tax-related: it allows them to avoid taxes by means of the Double Irish and Dutch Sandwich financial constructions.
Another problem is that users are not able to prevent Facebook from using the information they post on the social network site for purposes other than the pure social network site functionality. The information people post, and other information that Facebook aggregates and collects from other sources, are used by Facebook for different purposes without the express and knowing consent of the people concerned.
The problem with the ‘Like’ button
Special attention was given to the ‘Like’ and ‘Share’ buttons found on many sites across the internet. It was found that these social sharing plugins, as Facebook calls them, place a uniquely identifying cookie on users’ computers, which allows Facebook to then correlate a large part of their browsing history. Another finding is that Facebook places this uniquely identifying datr cookie on the European Interactive Digital Advertising Alliance opt-out site, where Facebook is listed as one of the participants. It also places an oo cookie (which presumably stands for “opt-out”) once you opt out of the advertising tracking. Of course, when you remove this cookie from your browser, Facebook is free to track you again. Also note that it does not place these cookies on the US or Canadian opt-out sites.
As I wrote earlier, in July 2013, the problem with the ‘Like’ button is that it phones home to Facebook without the user having to interact with the button itself. The very act of it loading on the page means that Facebook receives various information from users’ browsers, such as the current page visited and a uniquely identifying browser cookie called the datr cookie, and this information allows them to correlate all the pages you visit with the profile they keep on you. As the Belgian investigators confirmed, this happens even when you don’t have an account with Facebook, when your account is deactivated, or when you are not logged in. As you surf the internet, a large part of your browsing history gets shared with Facebook, because these buttons are found everywhere, on millions of websites across the world.
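The mechanism is worth spelling out: because the button is loaded from facebook.com, the browser automatically attaches the visited page’s address (the Referer header) and any previously set facebook.com cookies, such as datr, to the widget request. A simplified model of what the receiving server can log from a single widget load; the names and types are mine, not Facebook’s actual code:

```typescript
// Simplified model of the request a browser makes when a page embeds a
// Like button. Illustrative only; not Facebook's actual implementation.
interface EmbedRequest {
  referer: string;                  // the page the user is visiting
  cookies: Record<string, string>;  // cookies previously set for facebook.com
}

interface VisitRecord {
  browserId: string;    // the uniquely identifying datr cookie value
  pageVisited: string;  // taken straight from the Referer header
}

// What can be logged from one widget load, without any click at all.
function recordVisit(req: EmbedRequest): VisitRecord | null {
  const datr = req.cookies["datr"];
  if (!datr) return null; // first contact: the response would set datr instead
  return { browserId: datr, pageVisited: req.referer };
}
```

Collect these records across the millions of pages embedding the button, group them by browserId, and you have a browsing history per browser; no login required.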
The Filter Bubble
A major problem of personalisation technology, such as that used by Facebook, Google, and others, is that it limits the information users are exposed to. The algorithm learns what you like, and subsequently only serves you information that you’re bound to like. The problem is that there’s a lot of information that isn’t likeable: information that isn’t nice, but is still important to know. By heavily filtering the input stream, these companies influence how we think about the world, what information we’re exposed to, etc. Eli Pariser describes this effect in his book The Filter Bubble: What the Internet Is Hiding From You, where he did a Google search for ‘Egypt’ during the Egyptian revolution and got information about the revolution, news articles, etc., while his friend only got information about holidays to Egypt, tour operators, flights, hotels, etc. This is a vastly different result for the exact same search term, due to the heavy personalisation going on at Google, where algorithms refine what results you’re most likely to be interested in by analysing your previously entered search terms.
The same happens at Facebook, which controls what you see in your news feed on the Facebook site based on what you like. The problem is that after a few iterations of this, soon you’re only going to see information that you like, and no information that’s important but not likeable. This massively erodes the eventual value of Facebook, since eventually all Facebook will be is an endless stream of posts, images and videos that you like and agree with. It becomes an automatic positive feedback machine. Press a button, and you’ll get a cookie.
What value does Facebook then have as a social network, when you never come into contact with radical ideas, or ideas that you initially disagree with but that may alter your thinking once you encounter them? By never encountering extraordinary ideas, we never improve. And what a poor world that would be!
Good news on privacy protection for once: after an 11 March 2015 ruling of the Court of The Hague in the Netherlands in the case of the Privacy First Foundation c.s. versus The Netherlands, the court decided to strike down the Dutch data retention law. The law required telecommunication providers and ISPs to store communication and location data from everyone in the Netherlands for a year. The court based its decision on the reasoning that a major privacy infringement of this magnitude needs proper safeguards, and the safeguards that were put in place were deemed insufficient. There is too much room for abuse of power in the current law, which was the reason for the Court of The Hague to strike it down, effective immediately.
The question remains what will happen now. The law has been struck down, so it seems logical to scrap it entirely. Whether that will happen, and whether the decision will stand should the Ministry of Security and Justice appeal, time will tell.
A few weeks ago I was at the Logan Symposium 2014, which was held at the Barbican Centre in London from 5 to 7 December 2014. During this event, I gave a talk entitled “Security Dilemmas in Publishing Leaks” (slides, PDF). The event was organised by the Centre for Investigative Journalism in London.
The audience was a switched-on crowd of journalists and hacktivists, bringing together key figures in the fight against invasive surveillance and secrecy, and it was great to be there and to be able to provide some insights and context from a technological perspective.
Let’s talk a little bit about the rapid proliferation of the so-called Internet of Things (IoT). The Internet of Things is a catch-all term for all sorts of embedded devices that are hooked up to the internet in order to make them “smarter”: able to react to certain circumstances, automate things, et cetera. This can include many devices, such as thermostats and autonomous cars. There’s a wide variety of possibilities, and some of them, like smart thermostats, are already on the market, with autonomous cars following close behind.
According to the manufacturers peddling this technology, the purpose of hooking these devices up to the internet is to let them react better and provide services that were previously impossible. An example would be a thermostat that recognises when you are home and subsequently raises the temperature of the house. It is also possible to link various IoT devices together, for instance using your autonomous car to recognise when it is (close to) home and having the thermostat automatically increase the temperature.
There are myriad problems with this technology in its current form. Some of the most basic ones, in my view, are privacy and security considerations. In the case of cars, Ford knows exactly where you are at all times, and knows when you are breaking the speed limit, by using the highly accurate GPS built into modern Ford cars. This technology is already active, and if you drive one of these cars, this information (your whereabouts at all times, and certain metrics about the car, like current speed and mileage) is stored and sent to Ford’s servers. Many people don’t realise this, but it was confirmed by Ford’s Global VP of Marketing and Sales, Jim Farley, at the CES trade show in Las Vegas at the beginning of this year. Farley later retracted his statements after the public outrage, claiming that he had left the wrong impression and that Ford does not track the locations of its cars without the owners’ consent.
Google’s $3.2 billion acquisition
Nest Labs, Inc. used to be a separate company making thermostats and smoke detectors, until Google bought it for a whopping $3.2 billion. The Nest thermostat is a programmable thermostat with a little artificial intelligence inside that enables it to learn what temperatures you like, turning the temperature up when you’re at home and down when you’re away. It can be controlled over WiFi from anywhere in the world through a web interface; users can log in to their accounts to change the temperature and schedules, and to see energy usage.
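The learning behaviour described above can be captured in a toy model: remember the temperatures the user has manually chosen, and apply that learned preference only when the home is occupied. This is purely illustrative, an assumption-laden sketch, and not Nest’s actual algorithm:

```typescript
// Toy model of a learning thermostat. Not Nest's real algorithm: it simply
// averages past manual settings and falls back to a setback temperature
// when nobody is home.
class LearningThermostat {
  private samples: number[] = [];

  // Each manual adjustment is treated as a training example.
  recordManualSetting(tempC: number): void {
    this.samples.push(tempC);
  }

  // Learned preference: the mean of the observed manual settings.
  private preferredTemp(): number {
    if (this.samples.length === 0) return 18; // default before any learning
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }

  targetTemp(occupied: boolean): number {
    const awayTemp = 15; // energy-saving setback when the house is empty
    return occupied ? this.preferredTemp() : awayTemp;
  }
}
```

Note that even this trivial model needs an occupancy signal, which is exactly the privacy-sensitive data (who is home, and when) discussed in the rest of this article.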
Why did Google pay such an extraordinarily large amount for a thermostat company? I think the Internet of Things will be Google’s next battleground for gathering more data. Home automation and cars are markets that Google has recently stepped into. Technologies like Nest and Google’s driverless car generate massive amounts of data about users’ whereabouts and things like sleep/wake cycles, patterns of travel, and energy usage. And this is just for the two technologies I have chosen to focus on in this article. There are lots of different IoT devices out there that will eventually all be connected somehow: via the internet.
One is left to wonder what happens with all this data. Where is it stored, who has access to it, and most important of all: why is it collected in the first place? In most cases this collection of data isn’t even necessary. In the case of Ford, we have to rely on Farley’s say-so that they are the only ones with access to this data. And of course Google and every other company out there offers the same defence. I don’t believe that for one second.
The data is being collected to support a business model we see often in the tech industry, in which profiles and sensitive data about the users of a service are valuable, and are either used to better target ads or sold on directly to other companies. There seems to be a conception that the modern internet user is used to not paying for services online, and this has caused many companies to implement the default ads-based, data-and-profiling-based business model. However, other business models, like the Humble Bundle in the gaming industry, or online crowd-funding campaigns on Kickstarter or Indiegogo, have shown that internet users are perfectly willing to spend a little money or give a small donation for a service or device they care about. The problem with the default ads-based business model is that it leaves users’ data vulnerable to exposure to third parties and others who have no business knowing it, and also causes companies to collect too much information about their users by default. It’s as if there is some recipe out there called “How to start a Silicon Valley start-up” that has profiling and tracking of users, and basically not caring about users’ privacy, as its central tenet. It doesn’t have to be this way.
Currently, a lot of this technology is developed and brought to market without any consideration whatsoever for the privacy of the customer or the security and integrity of the data. Central questions that in my opinion should be answered during the initial design process of any technology impacting privacy are left unanswered. First, should we collect this data at all, and if so, which data? How easy is it to access this data? It is quite conceivable that unauthorised people could gain access to it as well. What if it falls into the wrong hands? A smart thermostat like the Google Nest knows when you’re home and knows all about your sleep/wake cycle; this is information that could be of interest to burglars, for instance. What if someone accesses your car’s firmware and changes it? What happens when driverless cars mix with the regular, human-controlled cars on the road? This could lead to accidents.
And what to think of all those “convenient” dashboards and other web-based interfaces that are enabled and exposed to the world on all those “smart” IoT devices? I suspect that there will be a lot of security vulnerabilities to be found in that software. It’s all closed-source and not exposed to external code review. The budgets for the software development probably aren’t large enough to accommodate looking at the security and privacy implications of the software and implementing proper safeguards to protect users’ data. This is a recipe for disaster. Only when using free and open source software can proper code-review be implemented and code inspected for back-doors and other unwanted behaviour. And it generally leads to better quality software, since more people are able to see the code and have the incentives to fix bugs, etc. in an open and welcoming community.
Do we really want to live in a world where we can’t have privacy any more, where your whereabouts are at all times stored and analysed by god-knows-who, and all technology is hooked up together without privacy and security considerations? Look, I like technology. But I like technology to be open, so that smart people can look at its insides and determine whether it really does what it says on the tin, with no nasty side effects, and so that the community of users can expand upon it. It is about respecting the users’ freedom and rights; that’s what counts. Not enslaving them to closed-source technology controlled by commercial parties.
A few days ago I read an article (NRC, Dutch, published 11 September, interestingly) about how TNO (the Netherlands Organisation for Applied Scientific Research, the largest research institute in the Netherlands) developed technology (PDF) for smart cameras for use at Amsterdam Schiphol Airport. These cameras were installed at Schiphol by Qubit Visual Intelligence, a company from The Hague. The cameras are designed to recognise certain “suspicious behaviour,” such as running, waving your arms, or sweating.
Curiously enough, these are all things commonly seen in the stressful environment that an international airport is for many people. People need to get to their gate on time, which may require running (especially if you arrived at Schiphol by train, which in the Netherlands is notoriously unreliable); they may be afraid of flying and trying to get their nerves under control; and airports are also places where friends and family meet again after long periods abroad, which (if you want to hug each other) requires arm waving.
I suspect that this technology will therefore generate a lot of false positives. It’s the wrong technology in the wrong place. I fully understand the need for airport security, and we all want a safe environment for both passengers and crew; flights need to operate under safe conditions. What I don’t understand is the mentality that every single risk in life needs to be minimised away by government agencies and combated with technology. More technology does not equal safer airports.
A lot of the measures taken at airports constitute security theatre. This means that the measures are mostly ineffective against real threats, and serve mostly for show. The problem with automatic profiling, which is what this programme tries to do as well, is that it doesn’t work. Security expert Bruce Schneier has also written extensively about this, and I encourage you to read his 2010 essay Profiling Makes Us Less Safe about the specific case of air travel security.
The first problem is that terrorists don’t fit a specific profile; these systems can be circumvented once people figure out how; and the over-reliance on technology instead of common sense can actually cause more insecurity. In “Little Brother”, Cory Doctorow wrote about how Marcus Yallow put gravel in his shoes to fool the gait-recognising cameras at his high school so he and his friends could sneak out to play a game outside. Similar things will be done to try to fool these “smart” cameras, but the consequences can be much greater. We are actually more secure when we select people at random instead of relying on a specific threat profile or behavioural profile to decide who gets screened and who passes through security without secondary screening. The whole point of random screening is that it’s random: a potential terrorist cannot know in advance what criteria will make the system pick him out. If a system does use specific criteria, and the security of the system depends on those criteria being secret, then someone would only have to observe the system for long enough to find out what the criteria are.
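The difference can be shown with a toy model: a rule based on fixed, observable criteria can be learned and evaded, while a random rule offers an adversary nothing to adapt to. The criteria below are entirely hypothetical, chosen only to mirror the “suspicious behaviour” list above:

```typescript
// Toy model: fixed-criteria screening versus random screening.
// The criteria are hypothetical, for illustration only.
interface Traveller {
  running: boolean;
  sweating: boolean;
}

// A fixed, observable rule: flag anyone who runs or sweats.
const flaggedByProfile = (t: Traveller): boolean => t.running || t.sweating;

// An adversary who has watched the system long enough simply presents
// the one behaviour pattern that is never flagged:
const adapted: Traveller = { running: false, sweating: false };

// Random screening: every traveller is selected with probability p,
// regardless of behaviour, so there is nothing to learn or evade.
const flaggedAtRandom = (p: number): boolean => Math.random() < p;
```

Under the fixed rule, the adapted traveller passes every time; under random screening, the evasion probability is the same for the adversary as for everyone else.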
Technology may fail, which is something people don’t always realise. Another TNO report, entitled “Afwijkend Gedrag” (PDF; Abnormal Behaviour), states in the (admittedly tiny) section that deals with privacy concerns that collecting data about people’s abnormal behaviour is ethically justified because society as a whole can be made safer with this data and the associated technology. It also states (and this is an argument I’ve read elsewhere as well) that “society has chosen that safety and security trumps privacy.”
Now, let’s say for the sake of argument that this might be true in a general sense (although it can be debated whether this is always the case; personally I don’t think so, as sometimes the costs are simply too high, and we need to keep a free and democratic society after all). The problem here is that the way technology and security systems are implemented is usually not something we as a society get to vote on before the (no doubt highly lucrative) contracts get signed. In this case, Qubit probably saw a way to make a quick buck by talking the Schiphol leadership and/or the government (as the Dutch state holds 69.77% of the Schiphol shares) into buying its technology. It’s not something on which the people had a conscious debate and subsequently made a well-informed decision.
Major Privacy Issues
We have established that these systems are ineffective, can be circumvented (like any system), and won’t improve overall security. But much more importantly, there are major privacy issues with this technology. What Schiphol (and Qubit) are doing here is analysing and storing data on millions of passengers, the overwhelming majority of whom are completely innocent. This is like shooting a mosquito with a bazooka.
What happens with this data? We don’t know, and we have to take Qubit and Schiphol at their word that data about non-suspect members of the public gets deleted. However, in light of recent events, where it seems convenient to collect and store as much data about people as possible, I highly doubt any deletion will actually happen.
And the sad thing is: in the Netherlands the Ministry of Security and Justice is now talking about implementing the above-mentioned behavioural analysis system at another (secret) location in the Netherlands. Are we all human guinea pigs ready to be tested and played around with?
What is (ab)normal?
There are also problems with the definitions. This is something I see again and again with privacy-infringing projects like this. What constitutes “abnormal behaviour”? Who gets to decide that, and who controls what counts as abnormal behaviour and what doesn’t? Maybe, in the not-too-distant future, the meaning of the word “abnormal” begins to shift, and begins to mean “not like us,” for some definition of “us.” George Orwell described this effect in his book Nineteen Eighty-Four, where ubiquitous telescreens watch and analyse your every move, and one can never be sure which thoughts are criminal and which aren’t.
In 2009, when the European research project INDECT was funded by the European Union, critical questions were put to the European Commission by the European Parliament. More precisely, this was asked:
Question from EP: How does the Commission define the term abnormal behaviour used in the programme?
Answer from EC: As to the precise questions, the Commission would like to clarify that the term behaviour or abnormal behaviour is not defined by the Commission. It is up to applying consortia to do so when submitting a proposal, where each of the different projects aims at improving the operational efficiency of law enforcement services, by providing novel technical assistance.
(Source: Europarl (Written questions by Alexander Alvaro (ALDE) to the Commission))
In other words: according to the European Commission it depends on the individual projects, which all happen to be vague about their exact definitions. And when definitions like this are not pinned down (and anchored in law, so that the powerful governments and corporations that oversee these systems can be held to account!), they can be changed over time when a new leadership comes to power, either within the corporation controlling the technology or within government. This is a danger that is often overlooked. There is no guarantee that we will always live in a democratic and free society, and the best defence against abuse of power is to make sure that those in power have as little data about you as possible.
Keeping these definitions vague is a major tactic in scaring people into submission, and it carries the inherent danger of legislative feature creep: a measure once implemented for one specific purpose soon gets used for another if the opportunity presents itself. Once people observe that others are getting arrested for seemingly innocent things, many of them (sub)consciously adjust their own behaviour. It works similarly with free speech: once certain opinions and utterances are deemed against the law, and are acted upon by law enforcement, many people start thinking twice about what they say and write. They start to self-censor, and this erodes people’s freedom to the point where we slowly shift into a technocratic Orwellian nightmare. And when we wake up, it will already be too late to turn the tide.
I gave my talk about privacy by design last Saturday at eth0 2014 winter edition, a small hacker get-together held this year in Lievelde, The Netherlands. eth0 organises conferences that aim to bring people with different computer-related interests together; they organise two events per year, one of them in winter. I previously gave a very similar talk at the OHM2013 hacker conference, held in August 2013.
Here’s the footage of my talk:
I talked about privacy by design, and what I did in relation to Annie Machon‘s site and, recently, the Sam Adams Associates for Integrity in Intelligence site. The talk consists of two parts: in the first part I explained what we’re up against, and in the second part I covered the two sites in a more specific case study.
I talked about the revelations about the NSA, GCHQ and other intelligence agencies, including the December revelations explained so eloquently by Jacob Appelbaum at 30C3 in Hamburg. Then I moved on to the threats to website visitors: how profiles are built up and sold, and browser fingerprinting. The second part consists of the case studies of both Annie Machon’s website and the Sam Adams Associates’ website.
I’ve mentioned the Sam Adams Associates for Integrity in Intelligence, whose website I had the honour of making, so they could have a more public space where they could share things relating to the Sam Adams Award with the world, and also provide a nice overview of previous laureates and their stories.
One of the things both sites have in common is hosting on a Swiss domain, which provides a safer haven where content may be hosted without fear of being taken down by the U.S. authorities. The U.S. claims jurisdiction over the average .com, .net and .org domains, and there have been cases where such domains were taken down because they hosted content the U.S. government did not agree with. Case in point: Richard O’Dwyer, a U.K. citizen, was threatened with extradition to the United States for being the man behind TVShack, a website that provided links to copyrighted content. MegaUpload, the file locker company started by Kim Dotcom, was given the same treatment: if you visited their domain, you were served an image from the FBI telling you the domain had been seized.
I just stumbled upon this funny video made by the ACLU (American Civil Liberties Union). It fits perfectly, and it’s telling that when invasions of privacy get really personal (Santa photographing your face, recording your conversations and rifling through your smartphone), people really don’t like it and some respond strongly, but when the exact same thing is done by some big, anonymous government agency it doesn’t get such a strong response, which is unfortunate. Anyway, without further ado:
Recently I came across an article about Facebook; more specifically, about the fact that Facebook wants to know why you self-censor, in other words, why you didn’t click Publish on that status update you just wrote, but decided not to publish it instead. It turns out Facebook sends everything you type in the post textarea (the one with the “What’s on your mind?” placeholder) to its servers. Two Facebook scientists are quoted in the article: Sauvik Das, PhD student at Carnegie Mellon and summer software engineer intern, and Adam Kramer, a data scientist. According to them, only information indicating whether you self-censored is sent back to Facebook’s servers, not the actual text you typed. They wrote an article entitled Self-Censorship on Facebook (PDF, copy here) in which they explain the technicalities.
It turns out this claim, that they only send metadata back and not the actual text you type, is not entirely true. I wanted to confirm whether they really don’t send what you type to Facebook before you hit Publish, so I fired up Facebook and logged in. I opened my web inspector and started monitoring requests to and from my browser. When I typed a few letters, I noticed that the site makes a GET request to the URL /ajax/typeahead/search.php with the parameters value=[your search string]&__user=[your Facebook user id] (there are more parameters, but these are the most important for the purposes of this article). The search.php script probably parses what you typed in order to find contacts that it can then show to you as autocomplete options (for tagging purposes).
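Based on the parameters observed in the web inspector, the request can be reconstructed roughly as follows. This is a sketch from my own observations, not Facebook’s documented API; the function name is mine, and only the two parameters discussed above are included:

```typescript
// Rough reconstruction of the typeahead request observed in the web
// inspector. Parameter names match what was seen; other parameters
// the real request carries are omitted here.
function typeaheadUrl(typed: string, userId: string): string {
  const params = new URLSearchParams({ value: typed, __user: userId });
  return `/ajax/typeahead/search.php?${params.toString()}`;
}
```

The point is visible in the URL itself: every request carries both the text you have typed so far and your user ID, so each keystroke ships identifiable typed content to Facebook’s servers.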
Now, the authors of the article actually gathered their data in a slightly different way. They monitored the post textarea and the comment box, and if more than five characters were typed, they counted it as self-censorship if you didn’t publish that post or comment within the next ten minutes. So in their methodology, no actual textual content was needed. But as my quick research above shows, your comments and posts actually do get sent to Facebook before you click Publish, and even before five characters are typed. This is done for a different purpose (finding matches in your contacts for tagging, etc.), but clearly this data is received by Facebook. What they subsequently do with it, besides providing autocomplete functionality, is anyone’s guess. The fact that your user ID is sent together with the typed text to the search.php script may suggest that they associate the text with your profile, but there’s no way to definitively prove that.
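The researchers’ heuristic, as described in their paper, can be sketched as follows. This is my reconstruction of the described method, not their actual code, and the function name is mine:

```typescript
// Reconstruction of the heuristic described in the paper: an entry counts
// as self-censored if more than five characters were typed and nothing
// was published within the next ten minutes.
const TEN_MINUTES_MS = 10 * 60 * 1000;

function selfCensored(
  charsTyped: number,
  typedAt: number,              // timestamp (ms) when typing crossed the threshold
  publishedAt: number | null    // timestamp (ms) of publishing, or null if never
): boolean {
  if (charsTyped <= 5) return false;       // below the 5-character threshold
  if (publishedAt === null) return true;   // typed, then never published
  return publishedAt - typedAt > TEN_MINUTES_MS;
}
```

Note that evaluating this heuristic only needs character counts and timestamps, which supports the authors’ claim that their study required no textual content; the typeahead traffic shown above is a separate channel.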
When I read through the article, one particular sentence in the introduction stood out to me as bone-chilling:
“(…) Last-minute self-censorship is of particular interest to SNSs [social networking sites] as this filtering can be both helpful and hurtful. Users and their audience could fail to achieve potential social value from not sharing certain content, and the SNS [social networking site] loses value from the lack of content generation. (…)”
“loses value from the lack of content generation.” Let that sink in. When you refrain from posting something on Facebook, or rewrite it, Facebook considers that a bad thing, something that removes value from Facebook. The goal of Facebook is to sell detailed profiling information on all of us, even those of us wise enough not to have a Facebook account (gathered through tagging and e-mail friend-finder functionality).
Big Data and Big Brother
And it isn’t just Facebook; it’s basically every social network and ad provider. There’s an entire industry of big-data brokers, with companies most of us have never heard of, like Acxiom, and many others like it, that thrive on selling profiles and associated services. Advertising works best when it is specific and plays into users’ desires and interests. This is also the reason why, for this to be successful, companies like Facebook need as much information on people as possible, to better target their clients’ ads. And the best way to get it is to provide a free service, like a social network, enticing people to share their lives through the service; then you can offer really specific targeting to your clients. This is what these companies thrive on.
The bigger problem is that we have no influence over how our data gets used. People who claim they have nothing to hide and do nothing wrong forget that they don’t decide what constitutes criminal behaviour; the state makes that decision for them. And what will happen when you are suddenly faced with a brutal regime that abuses all the information and data it has on you? Surely we want to prevent this.
This isn’t just a problem in the technology industry and business, but with governments as well. The NSA and GCHQ, in cooperation with other intelligence agencies around the world, are collecting data on all of us, without giving us, the people, any possibility of appeal or correction of erroneous data. We have no influence over how this data gets used, who will see it, how it might be interpreted by others, et cetera. The NSA is currently experiencing the same uneasiness as the rest of us, as they have no clue how much or what information Edward Snowden took with him, and how it might be interpreted by others. It’s curious that they now complain about the same problem the rest of us have been experiencing for years; a problem the NSA partly created itself by overclassifying information that didn’t need to be kept secret. Of course there is information that needs to be kept secret, but the vast majority of information that now gets rubber-stamped TOP SECRET poses no threat to national security if known to the public; more likely it is information that might embarrass top officials.
We need to start implementing proper oversight of the secret surveillance states we are currently subjected to in a myriad of countries around the world, and take back the powers that were granted to, and subsequently abused by, them, if we want to continue to live in a free world. For I don’t want to live in a Big Brother state; do you?