Data Ethics Club: Limits and Possibilities for “Ethical AI” in Open Source: A Study of Deepfakes

What’s this?

This is a summary of Wednesday 29th April’s Data Ethics Club discussion, where we discussed Limits and Possibilities for “Ethical AI” in Open Source: A Study of Deepfakes, a paper by David Widder, Dawn Nafus, Laura Dabbish and James Herbsleb. We were fortunate enough to be joined by the first author, David Gray Widder, a PhD candidate from Carnegie Mellon University, who introduced his work and participated in our discussion. The summary was written by Huw Day, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Jessica Woodgate, Nina Di Cara and Natalie Thurlby helped with the final edit.

First thoughts

Our initial impression is that the paper places a strong emphasis on the idea of freedom to do things but doesn’t consider the responsibility that comes with that freedom. Have these standards changed? Are people nowadays more inclined to follow these sorts of standards?

Is the paper saying you shouldn’t produce open source stuff if there is any chance it can be used for negative purposes? If it can be used for bad things, it will be. We felt that this fact didn’t absolve us of responsibilities, but things are sometimes less black and white than they appear.

Some of our members have found that in open source projects the community as a whole watches for malicious or faulty code, since anyone can check the code. However, that assumes lots of people are working on the project. Larger projects tend to have better oversight, whereas small projects have a lot more risks to consider.

What ethical concerns might arise in corporate sponsored vs. community-based open source projects?

Different sorts of ethical tradeoffs are made in different contexts, and they relate to the purpose of the project and its expected outcomes. It all depends on what we are prioritising. Profit? Is transparency the most important thing with open source projects? The public sector tends to assume that corporate-sponsored projects focus on intellectual property and on having things resold back to the public sector. This could cause reputational damage if mistakes are found.

So, how do open source requirements for transparency clash with product requirements for success? If something is purely open source, transparency is the only basis on which to build trust. If you have a different framework, that can sometimes build trust inherently. The open source licensing model might help ensure transparency and code reuse.

Comparing corporate and community projects, corporate projects tend to be more task orientated and have clearer deliverables. Trade secrets would be more of an issue in industry, where profits rely upon them. A product built to external requirements might not be as clean as one that is a labour of love. We probably also need to tie some deepfake materials to laws that protect individuals, such as defamation laws.

People will always find new ways to abuse technology as new technologies come about, but the consequences of abuse have to tie back to laws that protect people and organisations. The GDPR and Data Protection Act 2018 can help with protecting individuals’ privacy as a start, but are probably not enough to protect against deepfakes. By its nature, enforcement happens after the fact, and it is quite clunky and hard to make use of, particularly in such rapidly developing fields.

If Google released an open source deepfake tool, our sense was that there would be public outcry. Behavioural use licensing means in practice that as groups become more confident they tend to think less about the consequences of subsequent immoral behaviour. People contravene/break licenses all the time. Licenses and contracts might stop some bad use, and that can still be meaningful, but they are not a catch-all solution. Companies still do a lot of unethical things despite licenses dictating what is/is not allowed.

Deepfake development tends to be a male-dominated field. This is in stark contrast to the fact that the majority of people affected by the misuse of deepfakes are women. If it’s open source, you have to rely on personal stakes for emotional accountability, whereas with a corporate structure you have more rigidity and guidelines.

It can be really difficult for people to get evidence for lots of things, e.g. in domestic violence. Deepfakes could be used to create false evidence and actively muddy the waters. Some of the participants in the study did know that the software was being used for a variety of horrible reasons, but that didn’t stop them working on it. It’s often difficult to even begin these sorts of conversations.

Hugging Face is an open source AI startup started by ex-Google employees. They have a range of partnerships with Amazon, which has its own controversies but is a leading cloud infrastructure provider. In this case, is one big tech corporation better than the other, or is it more the case that you don’t like one company but you need cloud storage, so you have to go to the other?

Why do we think conversations about Ethics in AI often focus on design principles, such as Fairness, Accountability, or Transparency, instead of what systems are used for?

There is a potential to lose context around analysis, which is a reason to hold back on sharing the data. This highlights different risks and governance. In contrast, something created by a community can have various purposes, whereas a corporation would have its own purposes in mind. Would corporations go for the big public sector contracts if they were told they had to share their code?

How do we judge this? Case by case probably, as it’s very difficult to guess context.

One measure could be how easy it is to abuse. If the code is not shared, does that slow you down in creating something that could do harm? “Ethical” varies from person to person and is changeable. People interviewed for the paper clashed.

To what extent are we putting a monetary value on harm by making it something that’s legally enforceable? Is paying the fine going to be worth it for doing harm? Is that just paying the fee to do something unethical?

Ethics seems to be more focussed on novelty. AI is new and exciting, but we have so many things still to consider in data science. The beginning of building a tool is a good time to think about this. But tools might not have a fixed, determined purpose, so it’s a constant guessing game of what tech can be used for, as well as a catch-up game as new developments appear.


Name, Role, Affiliation, Where to find you, Emoji to describe your day

  • Natalie Zelenka, Data Scientist, University of Bristol, NatalieZelenka, @NatZelenka

  • Huw Day, PhDoer, University of Bristol, @disco_huw

  • Noshin Mohamed, Quality Assurance in Children’s Service

  • Zoë Turner, Senior Data Scientist, NHS MLCSU