Data Ethics Club discusses: UK Statistics Authority: Identifying gaps, opportunities and priorities in the applied data ethics guidance landscape (14th Apr 21)#

What’s this?

This is a summary of Wednesday 31st March’s Data Ethics Club discussion, where we spoke and wrote about the UK Statistics Authority’s white paper, Identifying gaps, opportunities and priorities in the applied data ethics guidance landscape.

The summary was written by Natalie Thurlby and Nina Di Cara, who tried to synthesise everyone’s contributions to this document and the discussion, and provide some useful feedback to the UK Statistics Authority on our behalf. “We”/“Us” = “someone at Data Ethics Club”.

We were really happy to see this useful and timely landscape review. The data ethics landscape is quite sprawling and fragmented, so it’s really appealing to have a big picture of the different institutions and frameworks.

Here we have summarised the output of our discussion on the review, as a group of statisticians, data scientists, analysts, and software engineers from academia, private industry and the public sector.

General feedback#

One of our first impressions was that whilst there was a welcome focus on ethics support in terms of more detailed and tailored advice and frameworks, there was not much focus on more serious efforts to embed ethical thinking into day-to-day work and decision making. We found that (as the review identified) there are already a lot of resources in the data ethics space, and we would like to see the Centre for Applied Data Ethics grappling with challenging questions that consider broader societal impacts and issues of embedding co-design/participatory design. Some suggestions that were raised in our discussion were:

  • how to provide training which would support statisticians to consider the ethical implications of their work themselves on an ongoing basis.

  • how to avoid ‘outsourcing’ ethical decisions to committees, rather than building this into project design. We see discussion groups that give anyone in data the opportunity to discuss data science and ethics (like ours - Data Ethics Club) as part of the solution to this issue.

  • whether we can broaden the remit of ethics committees or ethics “clinics” to help researchers and analysts to understand the wider societal impacts of their work, as opposed to solely focussing on direct participants and whether the work is legal.

  • how to extend the responsibility of ethical thinking to other data-adjacent roles such as software engineers, so that ethics is everybody’s problem.

  • understanding how data scientists, analysts and statisticians can make best use of participatory, community-informed design in practice (“nothing about us without us”).

  • the role of organisational culture in achieving ethical data science.

These kinds of interventions would hopefully help ethical principles to be ‘baked in’ from the start of an analysis project, rather than (or as well as) being sent off for approval by an ethics committee. We felt that the UK Statistics Authority, as a national body, would be in a position of power to emphasise ‘ethics by design’, and to back calls for ethics training to be a necessary part of working as an applied technologist or data scientist.

Notes on Annex B#

One potential easy win we noticed was that some of the language in this review wasn’t particularly inclusive of data analysts who don’t see their work as “research” or “data science”, such as those who analyse data in healthcare or policing (framed as ‘business intelligence’, for example). We recognise that this is highly relevant to the question you have posed in Annex B, item 7. There is a risk that, in these operational contexts, processes of ethical approval are seen as unnecessary, and for this reason we propose that there isn’t a hard or easily defined boundary between research and operational uses of data. This re-emphasises the importance of ethics by design; there simply isn’t a point at which ethics becomes necessary, it is necessary at all times.

Item 2 asks how we can help data scientists communicate the public good of their work. We’d argue that we are perhaps too good at this already, and that data professionals need to get better at communicating the pitfalls of their work, such as who it works for and why (or why not). This is especially relevant in the private sector when products based on data analysis are being sold for real-world applications.

In item 9 we wondered why GIS data had been specifically mentioned in a different style than all the other items, and what about it would not be covered by the other items in Annex B? For instance, medical data or data on protected characteristics could have the same question posed.

Additional resources and organisations#

We absolutely appreciate that this huge task had to be constrained somehow, but as we are mostly researchers or research-adjacent ourselves, many of the frameworks and other work in this space that we are aware of come from individual researchers rather than institutions, and often these teams are international. This seemed to be the case particularly for work with a more societal focus such as Data Feminism, the Ethics & Algorithms Toolkit, or any number of the resources and papers in our Data Ethics Club reading list.

The focus on transparency also creates clear overlap with the reproducibility movement, but this wasn’t necessarily reflected in the contents of the landscape review. For example, the following key players seemed to be missing from the review:

Similarly, we were also surprised to see that neither the Central Digital & Data Office nor its Data Ethics Framework was mentioned.

On a technical note, it would be nice to have future reports available as a PDF as well, since the website format wasn’t accessible on all devices.

Overall we were really pleased to see this review, and are excited to see how the Centre for Applied Data Ethics develops! We hope our feedback is helpful.


Note this is a list of attendees who felt comfortable sharing their names:

Name, Job title, Affiliation, GitHub, Twitter, Emoji to describe your day

  • Natalie Thurlby, Data Scientist, University of Bristol, NatalieThurlby, @StatalieT, :sun_with_face:

  • Tessa Darbyshire, Scientific Editor, Patterns, Cell Press @TessaDarbyshire

  • Paul Lee, investor, @pclee27

  • Huw Day, Bristol Maths PhDoer

  • Zoë Turner, Senior Information Analyst @Letxuga007

  • Roman Shkunov, Maths/CS student at UoB

  • Nina Di Cara, GW4 Bio PhD in Data Science and AI, University of Bristol (@ninadicara)

  • Kamilla Wells, RPA at Australian federal agency, previously QUT Kamilla Wells