Data Ethics Club: The Rise of Private Spies (28th July 21)

What’s this?

This is a summary of Wednesday 28th July’s Data Ethics Club discussion, where we spoke and wrote about the New Republic article The Rise of Private Spies by Charlie Savage. The summary was written by Huw Day, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara and Natalie Thurlby helped with the final edit.

Who is Bellingcat? They are Bellingcat.

The idea of Bellingcat formed initially from amateur internet sleuths documenting war crimes in the Middle East from the comfort of their own homes. From there, it gradually evolved as these sleuths developed expertise in open source data analysis techniques and began diversifying their staff to build a global network of volunteers and employees.

The name of the organisation comes from a fable in which a group of house mice are being terrorised by a cat. They can outrun the cat, but can never hear the cat coming. To get around this, they plan to put a bell on the cat. The question is, who will put the Bell on the Cat? The elected mouse is the Bellingcat.

The analogy extends somewhat dramatically to the real world, with the general population represented by the mice. Those who seek to do harm are the cat (from Mexican drug lords and those who commit crimes against humanity, to those using chemical weapons and waging conflicts worldwide). Bellingcat, along with the organisations they cooperate with (which include journalistic collaborators, law enforcement/intelligence agencies and human rights groups), are the Bellingcat.

Over the past seven years, Eliot Higgins (Bellingcat’s founder) and his team have used open-source information to uncover war crimes in Syria, unmask neo-Nazis who stormed Charlottesville, Virginia, and name the alleged Russian poisoners of former spy Sergei Skripal and his daughter, Yulia, in the U.K. Bellingcat’s work in Syria is being used by the European Center for Constitutional and Human Rights (ECCHR) to build war crime cases, and Higgins sits on the technology advisory board to the International Criminal Court to help them understand how to apply open-source investigation to their work.

With Bellingcat’s fingers in many pies, there’s natural confusion and conflation of commercial and private work, as well as of espionage and citizen journalism. Bellingcat identify their work as primarily investigative (as Higgins noted in a tweet response). But they have also described themselves as “An Intelligence Agency for the People”. As the article notes, this is a slogan notorious for its association with WikiLeaks, a whistleblower organisation that has drawn criticism for inadequately curating its content and violating the personal privacy of individuals.

Journalists have the power to hold people in power to account, as with the Panama Papers or, more recently, ex-health secretaries violating social distancing. With the rise of open source investigative journalism, we’re seeing that data breaches can have positive or negative impacts. The data involved can be accidentally exposed, open (which users may or may not have intended to make public) or obtained through so-called “black hat” methods: social engineering, hacking - the techniques associated with the modern day spy. In each case, it’s important for us to consider whether use of such data is legal and whether it is ethical.

Now, with the rise of social media, where events are being recorded and published by members of the public, your Twitter feed becomes a faster (albeit potentially less reliable) source of the latest news. On January 6th, during the US Capitol Building attack, one person was fatally shot while storming the Capitol, and social media users were trying to write narratives in real time, framing those illegally entering the Capitol building as legitimate protestors, illegitimate insurgents/rioters or even Democrat posers performing a false flag operation.

The investigative exploits of organisations like Bellingcat make for fascinating reads and potentially stories of hope for the little guy - the brave little mice putting the bells on the cats seeking to do harm, to warn the rest of us. The fact that there’s nothing special about an organisation like Bellingcat is something that inspires both hope and fear. Ordinary people can use open source data to conduct their own investigations. This has huge potential both for good and for bad and is currently largely unregulated. It’s one thing when such techniques are being used to track Russian hit squads who are suspected of poisoning political opposition, but what if those same techniques were used to track you when you’re applying for a loan, by a stalker to track a victim or applied en masse by a government to keep track of its citizens?

When is it ethical to use publicly available data?

The amount of information we willingly share on social media seems somewhat surreal: our hobbies, our date of birth, perhaps even where we live. This leaves a tricky question: if we post this all in the public domain, should we have any control over who is allowed to use it?

Inference from social media has different legal implications in different contexts, so it might not be possible to come up with a generalisable answer that fits all possible circumstances. Would the distinction be between public vs. hacked or leaked data? This also brings up doxxing - searching for and publishing private or identifying information about a particular individual on the internet, typically with malicious intent. Is good/bad intention a key indicator? How reliable is this as a legal framework?

This brings us back to the debate about informed consent over what happens to your social media activity. What’s fair game? And in what circumstances? Does whether something is fair game change depending on the circumstances? Often social media accounts default to public (i.e. very limited) privacy settings. If someone hasn’t restricted access then they are “fair game” in a sense, but people don’t expressly consent to it because they just don’t know to. How much does an individual have to take responsibility for other people’s snooping?

Is the ethical issue with giving the tools or misusing the tools? Any restriction will inevitably affect those with ill intent as well as those with good intent. Indeed, what constitutes “ill-intent”? Some in marketing might argue that building ad-profiles so that they can give you more relevant advertisements constitutes good-intent, but that does not mean the rest of us are not somewhat uneasy that every one of our clicks is compiled into an interpretation of who we are.

Some things have been created out of malice and have in some ways turned out for the better, and vice versa. We discussed previously how certain statistical techniques were initially used to track deaths - arguably a good thing? Those same techniques were developed centuries later for use in predictive policing - arguably a bad thing. Who is responsible for making sure data is used correctly? The collectors or the analysts? How do we enforce this?

There exists a tension between the openness of the internet and the protection of the information flow. Maybe we should just trust that open organisations like Bellingcat will use this wealth of information for the greater good and potentially accept the fallout of malicious actors using the same data, hoping for a net positive effect? To what extent do we need this fearmongering? Maybe most people will use data in a positive (or at least harmless) way?

One of the issues is that “facts” and numbers can still be misleading, e.g. it’s true that double-vaccinated people have gotten covid. What this lone statement omits to mention is that they are 3 times less likely to get it than an unvaccinated person.

Unfortunately some of the people who are really good at data literacy are conspiracy theorists who seek to use the facts in front of them to tell their preferred narrative. It takes far longer for us to verify a story than for us to retweet it. Bellingcat for their part include extremely detailed evidence chains and seem to be quite open about what is fact and what is simply speculation. Whilst this is commendable, what’s to stop another organisation telling their own stories using open data in a way that fuels their preferred narrative?

How reliable are the kinds of social media data that Bellingcat are using?

As we’ve discussed before, labelling the reliability of data is important. Being honest about gaps in your data (explicit or otherwise) is a vital part of transparency. A collection of proxy measures combined can form a coherent looking narrative, especially if the gaps in the data and assumptions are omitted. Verifying all this requires a huge amount of effort, both from those analysing the data initially and from readers after the fact. Your average reader won’t verify the claims. How many of the links in this article have you clicked on so far?

Some might argue that this amount of effort will put off open source analysts in the future, but with big data increasing the volume of information available and social media making it easier than ever to share with people all over the world, we might see more and more people with a bigger reach, leading to a faster spread of information - factual or otherwise.

This leads to a somewhat awkward discussion of what we mean by reliable. This is particularly an issue when discussing the intent of others. Much of Bellingcat’s work has documented events that have happened: here is where airstrikes happened in Yemen, here is where police violence has escalated in Colombia and there is where a mass killing occurred in Ethiopia. Some evidence is more compelling than others, but if you’re arguing about whether an atrocity happened or not, as opposed to the intent behind those who committed such an atrocity, there is an objective truth to be found.

Given Bellingcat’s collaborations with law enforcement, we arrived at the natural question: what legal frameworks do organisations like Bellingcat operate under? Warrants are needed for law enforcement to monitor social media in the UK and then the chain of evidence has to be followed in the court of law. You lose this a bit as you cross borders but the premise remains. No such standards apply to private agencies.

Sometimes as private individuals we have more freedom to do things and we can avoid these restrictions - for better or for worse. For example, someone tracked schools being closed down due to covid on social media; this was openly available data that wasn’t being collected by the public sector. This information could be useful in helping inform decisions about future lockdowns and understanding how the virus is spreading.

Just because the information did not come first hand from a member of law enforcement does not mean it is unreliable - police rely on witness testimony frequently, as well as relying on experts (for example in forensic analysis). Why should open source analysis be any different? If open source analysts are to fall into the role of expert witnesses, how can we hold them to a high standard? A forensic analyst might have a relevant degree in forensic science, a pathologist in medicine. Requiring rigid standards of open source analysts might, however, be against the spirit of Bellingcat, who run workshops for anyone to attend to learn about how they can perform open source analysis.

One other concern about unregulated, open source analysis is that it introduces an asymmetry based on groups’ power to control a narrative with disinformation. This can be somewhat regulated by a strict chain of evidence so that such information cannot be easily tainted for the purposes of legal evidence. But with information a click away, we’re impatient to read easily digestible, readily obtainable news as soon as it is relevant. If the general public are unwilling to read the chain of evidence in news articles before sharing them, are we not at fault for the spread of disinformation, even if we are not the original authors of it?

What are the implications of police departments working with private investigators who use these methods?

With their fingers in the pies of espionage, journalism and human rights activism, a valid question is, who are Bellingcat? Each article published by Bellingcat includes a link to the contributors and there’s a comprehensive list of contributors on their website. Asking a larger structural question of who Bellingcat are and what their agenda is, we can at least see transparency in their finances, a list of their contributors and a published policy plan (same link). There is no requirement for a private company to do this, but it lends itself to an image of transparency. Perhaps the issue then is not with Bellingcat, but with private investigators/organisations that don’t take these steps towards transparency.

At the end of the day, evidence comes from people. Police don’t always have the resources/ability to collect information directly, so if someone else has collected it (even just witness statements), it should still be admissible (as long as it’s verifiable). This can of course lead to issues across borders if the evidence is admissible in one country but not another. If the information is obtained in illegal ways, perhaps the police would (or at least should) have difficulty using that evidence, but this is a bit outside the expertise of the data ethics group.

This led us to an interesting hypothetical: If information obtained illegally is left on a train, can it then be admitted by someone who finds it? For how long after it’s left? The line is not so clear. Law enforcement can be in a difficult position here. They inevitably have to rely on judgement calls to dictate how they interpret the law. Unfortunately, this judgement can often be heavily biased.

What’s the difference between police outsourcing to Bellingcat/to other experts and using expert witnesses? Bellingcat characterise themselves primarily as investigators, but perhaps for legal purposes they are just open source data analyst experts who can be consulted like private professionals (such as pathologists, medical statisticians etc.).

Lots of questions pop up here about quality control: How good are they at doing it? How reliable is it? How do people take it in court? If people take it as gospel in court and they’re not reliable, that’s obviously bad. If people think it’s rubbish and they are reliable, that’s also bad.

How do we verify it? What standards can we hold them to? Trust and expert opinions come up a lot. Maybe you can’t trust unaccredited/unverified private organisations in general - do they just go to the highest bidder?

Perhaps people aren’t inherently untrustworthy, but if they’re creating narratives (even if all the evidence is there), then that could be dangerous. As we’ve noted before, fake news travels faster than facts. People might be telling stories that follow their confirmation bias. We’re all only human after all. But should we stop anyone from telling these stories? It’s a difficult line to walk, but we must walk it. Currently many policy makers are stood still on the issue.


Note: this is not a full list of attendees, only those who felt comfortable sharing their names.

Name, Affiliation, Links to find you

  • Natalie Thurlby, Data Scientist, University of Bristol, NatalieThurlby, @StatalieT

  • Nina Di Cara, PhD Student, University of Bristol, ninadicara, @ninadicara

  • Huw Day, PhDoer, University of Bristol, @disco_huw

  • Zoë Turner, Senior Information Analyst, Nottinghamshire Healthcare NHS Foundation Trust, Lextuga007

  • Ola Michalec, Researcher at Bristol Cyber Security Group (UoB)

  • Vanessa Hanschke, PhD in Interactive AI, University of Bristol

  • Kamilla ‘Milli’ Wells, Citizen Developer

  • Dan Gorringe, Research Engineer Intern, Hadean, DanGorringe