Data Ethics Club reads “Data Feminism”: Summer Bookclub 2024

What’s this?

This is a summary of the discussions from Data Feminism Book Club, where we spoke and wrote about Data Feminism by Catherine D’Ignazio and Lauren F. Klein over the summer of 2024. We hope you enjoy this writeup. We aim to run another Book Club in the summer of 2025! In the meantime, the co-organisers would be really enthusiastic to support anyone interested in running another book club, so please reach out if you want to get involved. The summary was written by Jessica Woodgate, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara, Huw Day, Natalie Thurlby, Amy Joint, and Vanessa Hanschke helped with the final edit.

Data Feminism: Chapter 1 – The Power Chapter

Principle: Examine Power

“Data feminism begins by analysing how power operates in the world.”

Chapter Summary

This chapter explores how to examine power through naming and explaining forces of oppression. Power is defined as the configuration of structural privilege and oppression where some have an unfair advantage because systems are designed by people like them and, because of this, those systems work better for them. To explain the configuration and experience of systems of power, the matrix of domination maps how different domains of society (structural, hegemonic, disciplinary, interpersonal) interact. When examining power dynamics in systems, we should ask “who” questions: who does the work (what is the effect of their lived experience?); who benefits (whose data is missing, and who is oversurveilled?); whose priorities get turned into products. A hyperfocus on science, surveillance, and selling can cause significant oversights with life-altering consequences.

Definitions

power - the current configuration of structural privilege and structural oppression, in which some groups experience unearned advantages and other groups experience systematic disadvantages

matrix of domination (Patricia Hill Collins; p. 25) – a concept that explains how systems of power are configured and experienced.

  • structural domain: organises oppression (law/politics)

  • hegemonic domain: circulates oppressive ideas (culture/media)

  • disciplinary domain: administers and manages oppression (enforcing law/policies)

  • interpersonal domain: individual experiences

minoritized - a social group that is actively devalued and oppressed by a dominant group (even when it is the majority group, e.g. women)

privilege hazard - the phenomenon whereby, when decisions are made by those who make up dominant groups to the exclusion of other identities or perspectives, crucial input is missed.

counterdata - data that are produced by civil society groups or individuals in order to counter missing data or to challenge existing official data.

Discussion Summary

Have you seen the matrix of domination before? What did you think – did you fully understand it?

Many of us hadn’t seen the matrix of domination in this format before but were familiar with its parts. Given our understanding of these domains, it intuitively made sense to see them linked together in this way. For some of us, the matrix has made visible ways in which we are affected by power that we were not previously aware of.

Mapping together the aspects of domination helps us to talk about issues with power which we have always been aware of but haven’t necessarily had the vocabulary to articulate. In particular, we found it useful to visualise the relation between structural and disciplinary domains. Both domains are informed by the hegemonic domain and also trickle down into one another. However, clearly differentiating the domains facilitates decoupling them to infer actionable insights. The interplay of separateness and connectivity illustrates how causes might not be independent, but solutions can be.

The matrix of domination provides a great framework to support thinking and speech around issues with power which can be hard to articulate. Its simple yet informative layout makes the matrix a great communication tool. By giving us the right understanding and words, it helps us to advocate for ourselves and others.

We also discussed possible reasons why it was presented as a matrix. We wondered if a matrix was used so that it occupies more of a 3D space, reflecting intersectionality and the build-up of different domains interacting. Each section needs to occupy its own space but also interact with the other sections. It is tempting to put things into a hierarchy, but it isn’t always possible to order and measure how factors impact people.

However, whilst labelling can be useful, we wondered whether it is effective when people don’t believe in the labels themselves. There are always excuses for oppressive behaviour; sometimes we wonder if efforts to examine power make any difference. Labels and other means of categorisation can also feel quite abstract. To put the matrix of domination into practice, and facilitate unlearning, it would be useful to have more examples of which groups are dominant. Examples can be found by asking who has privilege and power, and why they have it.

One example of imbalanced power can be found in the context of work, evidenced by disparities such as the gender pay gap. Symptoms of domination in the workplace include expectations that people assimilate to men’s behaviour, such as the wave of “girlboss” feminism and assertiveness training. Rather than challenging existing power imbalances, these kinds of behavioural management techniques imply that success can only be obtained by adapting behaviour to ways that are implicitly white and male; you can only succeed at work if you learn how to communicate like the dominant group. For women, this is a double-edged sword. On the one hand, women are expected to avoid displaying masculine traits and to conform to traditional ideas of femininity (or be called childless cat ladies). On the other hand, to be seen as leaders they are expected to present as confident and tough. Women must therefore tailor their vocabulary and behaviour to navigate these complex dynamics.

Imposition of behavioural expectations from the dominant group, and other forms of discrimination, often manifest through microaggressions. A microaggression is a “subtle, often unintentional, form of prejudice… often takes the shape of an offhand comment, an inadvertently painful joke, or a pointed insult.” On the interpersonal level, it can be difficult to articulate and fight against microaggressions. The matrix of domination may help us to address microaggressions by situating them in a wider context and grounding our explanations of why they are wrong.

Alongside challenging entrenched thoughts and behaviours like microaggressions, it is important to apply the matrix of domination and the examination of power to change policy. Achieving equality in the interpersonal and hegemonic domains can take a long time. For example, in the US, not all women got the right to vote in 1920; for many Black women, the Voting Rights Act was not signed into law until 1965. Even when the right to vote has been put into legislation, to be able to vote you must still have the right documentation, get transport to polling stations, and take time off work. The time and resources required constitute barriers for people with low incomes: “financial security is strongly correlated with nearly every measure of political engagement”. The hegemonic domain, involving the circulation of oppressive ideas in culture and media, provides a suitable frame to understand how race comes to be seen as something that only people who aren’t white have.

To fight for a rebalancing of power relations, people are often required to take on a large amount of unpaid labour before it is (if ever) recognised as a paid role. This puts a huge burden on people. For example, only a limited amount of time and money is spent on diversity, equity, and inclusion (DEI), and the labour it involves often isn’t seen as a priority.

The complex effects of privilege make us wonder: should Privilege Hazard be a Data Hazards label? Along similar lines, should using ‘public’ information in a private way (e.g. to predict whether people are pregnant) be a Data Hazards label?

Where do you work – how does data play into your role? Who does it benefit – and does it align with the science, surveillance, selling groups?

We have worked with data in the contexts of biodiversity monitoring and medicine. In a biodiversity monitoring project, photographic data from the public was used. We can see how crowdsourcing data is useful to academics; however, it is easy to be naïve about how exploitative crowdsourcing can be. To avoid exploitative practices, it is important to analyse the role of the three S’s (science, surveillance, and selling), and how they play into broader power dynamics.

In medicine, we have worked to collect data from animals with the aim of advancing medical treatment pathways. Adopting a perspective that includes the three S’s facilitates reflection on who benefitted and who was neglected or harmed. From a science and surveillance perspective, we wondered about the implications for genders that weren’t included in our study, and for people with different hormone profiles to those we considered. From a selling perspective, we wondered how the data we gathered could be used to make drug development more or less affordable for certain groups. With finite resources, cost-benefit analyses must be undertaken in drug development. These analyses include asking questions such as whether it is financially viable to create a cure for a rare disease that affects very few people.

In the development of medical devices, we asked ourselves how the data we are working with connects to the three S’s. In the early days of development, data aligns more with science than with selling. Medical trials, for instance, seem to involve mostly science, with a bit of surveillance. Later on, however, the data becomes part of selling. This shift to selling presents challenges if the people from whom the data is collected cannot give consent for medical reasons. There thus arises a tension between the effort to align the data to benefit the people who use the device, and the caution required in handling sensitive data. Data isn’t neutral, and when data is reframed for different audiences, bias can creep in. When selling medical devices, there are many stakeholders whose needs must be balanced.

To mitigate negative effects of the three S’s, it is important to ask questions from the ground up, even if you aren’t directly involved with selling. Reflecting on our practices and possible implications of our research is critical to performing good science. Independent research studies are one method we are using to hold projects we’re involved in accountable. Data Hazards is also a useful tool to help people to evaluate their research. It is important to participate in research that you believe is socially beneficial and will positively impact people.

However, sometimes it is unclear whether research data really benefits anyone. It can often seem that data is only relevant to the researchers themselves and their collaborators. This abstractness can make it difficult to mobilise people to improve their data practices. The problem is exacerbated by the difficulty of defining ‘early career researchers’, as there is no defined group that can be collectively galvanised around issues.

When you are located on the science side of research, you don’t always notice the ‘selling’ side. However, it is still there, and it does impact who benefits from the research. Applying for project funding is a form of selling. New health technology and knowledge should be made available, rather than held as ‘currency’ and not shared.

However, some aspects of working with data didn’t fit neatly into the three S’s, such as the problem of missing datasets. Identifying and locating certain groups or categories can be difficult; for example, people who provide care may not even know that they are carers. In the NHS, the level of influence and change varies according to the project and its objectives. Tough questions and issues are expected to be raised in data science teams and industry.

The chapter outlines issues with both under-representation and over-representation in data. What are your thoughts on the tensions between these?

When we first saw the list of missing datasets, we thought “great, let’s go and collect it!” An example of why it’s important to consider missing data is survivorship bias, often illustrated with a Second World War example: the mistake of deciding where to reinforce planes by looking at the ones that had been shot at and come home, rather than at the planes that had been shot down.
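The following small Python simulation makes the plane example concrete. It is a sketch under invented assumptions and uses no historical data. The four plane sections, the number of hits per plane and the per-hit chances of being shot down (`P_DOWNED_PER_HIT`) are all hypothetical. Engine hits are assumed to be the most deadly, so they are under-represented on the planes that come home. Reading only the survivors' damage therefore points away from the part that most needs armour.

```python
import random
from collections import Counter

# Hypothetical model, invented for illustration (not historical data):
# every hit lands uniformly at random on one of four sections, but an engine
# hit is far more likely to bring the plane down than a hit anywhere else.
SECTIONS = ["engine", "fuselage", "wings", "tail"]
P_DOWNED_PER_HIT = {"engine": 0.6, "fuselage": 0.1, "wings": 0.1, "tail": 0.1}


def fly_sortie(rng, hits_per_plane=3):
    """Return the sections hit and whether the plane made it home."""
    hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
    survived = all(rng.random() >= P_DOWNED_PER_HIT[s] for s in hits)
    return hits, survived


def share_of_hits(counter):
    """Fraction of all recorded hits that landed on each section."""
    total = sum(counter.values())
    return {s: counter[s] / total for s in SECTIONS}


def compare_hit_patterns(n_planes=10_000, seed=0):
    """Hit shares across every plane vs. only the planes that returned."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        hits, survived = fly_sortie(rng)
        all_hits.update(hits)
        if survived:
            returned_hits.update(hits)
    return share_of_hits(all_hits), share_of_hits(returned_hits)


everyone, survivors = compare_hit_patterns()
for section in SECTIONS:
    print(f"{section:<9} all planes: {everyone[section]:.2f}   "
          f"returned planes: {survivors[section]:.2f}")
```

Under these assumptions, about a quarter of all hits land on the engine, but only roughly one in eight hits on returning planes do. The missing data (the planes that were shot down) is exactly where the answer lies.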

However, we quickly stalled when we read the next part of the chapter explaining how people in power can use data of marginalised groups to oppress people. Issues with the misuse of data are exemplified in the case of research in the US being used to support abortion laws and oppress pregnant people.

Looking at both sides of the coin, we need the data to make changes, but we also need strong ethics to use it responsibly. We must ask who has control over the data, who can access it, who benefits, and who predictive models are for. Rather than just asking for more data, we should also be looking for better data. Representation and diversity are important basics: people who are affected by research should also participate in conducting it.

Data Feminism: Chapter 2 – Collect, Analyse, Imagine, Teach

Principle: Challenge Power

“Data feminism commits to challenging unequal power structures and working towards justice”

Chapter Summary

This chapter argues that, alongside examining power, we must challenge power. Challenging power involves mobilising data science to push back against existing and unequal power structures. Four starting points for this pushback are suggested: collect (compile counterdata in the face of missing data); analyse (demonstrate inequitable outcomes, audit opaque algorithms, hold institutions accountable); imagine (in order to get to the root of injustice, imagine the end point not as “fairness” but as co-liberation); teach (engage and empower newcomers to shift demographics and cultivate the next generation).

  • Collect: when collecting counterdata, we should ask ‘who’ questions, such as who needs to be shown the harms of power differentials and what kind of proof they require to believe the oppression is real. We should also ask on whom the burden of proof is placed. Those in positions of power accept anecdotal evidence from those like themselves but lean more heavily towards requiring statistics from minoritised groups.

  • Analyse: to ensure that past inequalities aren’t propagated in current technologies, the chapter suggests moving from data ethics, which examines ethical issues brought about by reliance on data-driven systems, to data justice, which considers issues in a wider historical context.

  • Imagine: data justice should be pursued with the aim of co-liberation, striving for mutual benefit for members of both dominant and minoritised groups. Co-liberation requires transparency of methods and reflexivity, understood as the ability to reflect on and take responsibility for one’s own position within the multiple, intersecting dimensions of the matrix of domination.

  • Teach: firstly, instead of asking questions that perpetuate a narrative of deficiency, such as “why are there so few women computer scientists?”, we should ask questions like “how can we structurally transform STEM education together?” Secondly, it is important to “stay with the trouble” by persisting in your work, especially when it becomes uncomfortable, unclear, or outright upsetting. Thirdly, we should shift our framing from concepts that secure power, like fairness (equality from the standpoint of today, disregarding historical injustices) and accountability, to those that challenge power, like equity (taking into account present power differentials and distributing accordingly) and co-liberation. Challenging power is supported by learning from and designing with the communities that we seek to support.

Definitions

Racial capitalism – (Cedric Robinson) the idea that capitalism has developed through, and reproduces, racial hierarchy; one example is using data-driven methods to tailor policies to customers based on their demographics

Auditing algorithms - using data science methods (reverse engineering, collecting missing data) to examine bias in algorithms.

Deficit narratives - reducing a group or culture to its “problems” rather than portraying it with strength and agency.

Data justice (vs ethics) - acknowledging structural power differentials and working towards dismantling them (rather than simply locating problems without onward action).

New racism - belief that racism is due to a few bad actors
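The ‘analyse’ starting point includes auditing opaque algorithms. One of the simplest outcome audits compares how often each group receives a positive decision. The Python sketch below assumes that only (group, decision) pairs can be observed from a hypothetical system, and the records are invented for illustration. The 0.8 threshold is the ‘four-fifths’ rule of thumb from US employment guidance. It is used here as one possible heuristic, not as a definitive test of bias.

```python
from collections import defaultdict


def selection_rates(records):
    """Share of positive decisions per group, from (group, decision) pairs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}


def impact_ratios(rates):
    """Each group's selection rate relative to the best-treated group."""
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items()}


# Hypothetical audit log: (group, 1 = approved / 0 = rejected).
decisions = ([("A", 1)] * 60 + [("A", 0)] * 40
             + [("B", 1)] * 35 + [("B", 0)] * 65)

rates = selection_rates(decisions)
ratios = impact_ratios(rates)
# Four-fifths rule of thumb: flag groups selected at < 80% of the top rate.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
print(rates, ratios, flagged)
```

Here group B is approved at 35% against group A's 60%, a ratio of about 0.58, so B is flagged. A flag like this demonstrates an inequitable outcome but not its cause. In the data justice framing above, it is a starting point for asking why rather than an end point.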

Discussion Summary

Have you heard of ‘counterdata’ before – how might it apply in your areas of lived experience or expertise?

A clear definition of counterdata in the chapter would have been useful; we understood ‘counterdata’ as subversive data, or data that asks or answers different questions to data that already exists. Using data in subversive ways is not a new concept: Florence Nightingale collected counterdata on soldiers’ deaths from preventable disease in the Crimean War; similarly, John Snow collected counterdata to identify the source of cholera outbreaks. We have seen counterdata come up as a concept in the books Invisible Women and Weapons of Math Destruction.

There are several areas in which we thought counterdata would be applicable; for instance, there seems to be very little data around the menopause. In a lot of cases where there appears to be a lack of data, the data is already there, but the right questions haven’t been asked. We wondered if, for cases like the menopause, gaps can be addressed by interrogating data that is already present.

In collecting counterdata, there are risks that arise when data that you are collecting for one cause is harmfully repurposed. Data about homelessness, for example, could be used to connect people with aid, or it could be used to displace them. The collection of counterdata, especially if it is sensitive in nature, raises privacy concerns in case of hacks, such as the 23andMe data breach.

When counterdata has been collected, there is a risk that it simply might not work if it doesn’t match up to people’s experiences. Empowerment is hard to quantify, making it difficult to assess how effective counterdata is. If the response to counterdata is slow, we might not be around to see any changes.

To remedy issues that arise with counterdata, it is crucial to consider the wider social aspect. After all, what we are dealing with is a social problem that is encoded in data. A big lesson from climate change activism has been that whilst numbers are useful and important, to make an effective contribution you really need the qualitative aspect. To remain aware of how the subject of oppression is portrayed, we need to advocate for the stories that go alongside the data.

Despite its importance, there are difficulties that arise with the use of qualitative data. Qualitative data is not always considered as reliable as quantitative data, partly because it takes more resources to collect the same amount of it. As data analysts, we fear that if we put too much emphasis on qualitative aspects to tell stories without enough data behind them, we won’t be listened to. Sometimes it is unclear whether quantitative or qualitative data will be more impactful. For example, regarding NHS waiting lists, we wondered if it is better to focus on the numbers or the individual stories to generate maximum impact. Usually, we need both data and the story to get a point across. Storytelling is powerful when we support it with data.

When employing stories for counterdata, it is important to be aware of the issues that can arise in the use of language, and how language can propagate existing power dynamics. For example, the gender binary has been perpetuated in children’s literature. Language of domination/violence/gambling also infiltrates medical language. The book Cassandra Speaks addresses relevant issues of language, gender, and power.

To mitigate language encoding bias, qualitative questions should be framed in ways that avoid leading metaphors and analogies, such as the war-based metaphors in HIV medicine and research. It also matters who the storyteller is: it should be the people who are actually affected.

Imagine – what would it look like if we were ‘Data Justice Club’ rather than ‘Data Ethics Club’ (see Table 2.1)?

Using “justice” instead of “ethics” implies action and gives the suggestion that the club is activist. Data justice club would thus be more outward facing and involved. This would require some changes to how the current club functions: we would have to do more and talk less. In some ways, data ethics club is a more suitable name for how we operate at the moment, as we discuss things rather than take action on things. Not all of us want to be part of an outward facing collective, as it is useful to have a place which is just for thought and discussion. There could be space for both a data ethics club and a data justice club.

If we were to lean in more towards data justice, we had a few ideas for what this would look like. Sharing our knowledge is one route; the role of academics is not just to collect data and do the research itself, but also to provide tools and education. Ethics isn’t as accessible as it should be, and there are lots of organisations which need, but don’t have, data ethicist roles. The benefit of learning and education is that they promote empowerment of participants, as exemplified in the Local Lotto project. Social change happens collectively, and it is possible to see this work in that light. Data justice club could thus offer education and infrastructure to community groups who have the motivation to do data collection. This could take the form of providing a ‘consult service’ to other data research groups within the university (like a data ethics version of Ask JGI). Setting aside a limited amount of time for other researchers to ask us questions could help them address ethical aspects they might not otherwise have considered.

In addition to these community-focussed alternatives, there are ways in which we could implement data justice on a personal level, by taking the information we learn in DEC and applying it to our daily lives. In doing so, it would be useful to work towards understanding a broader array of perspectives (e.g. different political viewpoints on issues). Having a Data Ethics wins channel in our Slack group would be a good way to celebrate this. On a more involved level, data justice could look like participating in things like investigative journalism.

What would need to change for projects like Local Lotto to be used more in teaching data skills?

With respect to changing the way undergraduate courses teach data skills, we thought that instead of data ethics being taught as a single lecture it could be infused throughout. Beyond the university, more engagement with groups outside academia who are fighting for social change, such as community and non-profit groups, would also improve mobilisation. An example of a relevant outreach project is DataFace, which is working to improve the data literacy of children in schools.

Problems with time and funding would need to be addressed to enable wider implementation of projects like Local Lotto. For many, involvement in these projects is extra to their current workload, and people do not have enough time to balance both. If the workload can be too much for people who are passionate about the cause, we thought it unlikely that data scientists without the same enthusiasm would be willing to dedicate their free time to these kinds of projects.

Data Feminism Chapter 3 – On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints

Principle: Elevate Emotion and Embodiment

“Data feminism teaches us to value multiple forms of knowledge, including the knowledge that comes from people as living, feeling bodies in the world.”

Chapter Summary

Traditionally, data science has placed value on ‘objectivity’ and ‘neutrality’, designing data visualisations to be plain so as to avoid eliciting emotion. Underpinning the principles of ‘objective’ data visualisation is statistical graphics expert Edward Tufte’s ‘data-ink ratio’, which holds that designers should aim to use ink to display data alone. Separating emotion from knowledge is assumed to avoid persuasion, giving readers the space to interpret results for themselves. However, this chapter argues that the lack of persuasion is an illusion. No matter how the data is presented, there will always be decisions that angle the data from a particular viewpoint; usually, the viewpoint of the dominant, default group.
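Tufte’s data-ink ratio can be written as a simple proportion. His definition, from The Visual Display of Quantitative Information, compares the ink that displays non-redundant data with all the ink in the graphic:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
```

Minimalist design pushes this ratio towards 1. The chapter's point is that even a graphic with a ratio of 1 still embodies choices about what to show and what to leave out.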

When visualising data it is impossible to avoid interpretation: unless you are presenting the raw data, some things will necessarily be highlighted and others necessarily obscured. Conventions of data visualisation reinforce the perception that visualisations are factual. This contributes to their being seen as objective, scientific, and neutral, and makes it more likely that people will believe them. Donna Haraway argues that all forms of knowledge are situated, produced by specific people in specific cultural, historical, and geographic circumstances. Disclosing your position is an important strategy to be transparent about the limits of your knowledge claims.

The logic that grounds the false binary between emotion and reason is gendered because of the stereotype that women are more emotional than men. Emotion can be leveraged alongside visual minimalism to engage different parts of the brain, allowing for a range of learning types and communicating to a wider group of people. Harnessing emotion can help people to experience uncertainty, something that is famously difficult to communicate in data visualisation.

When thinking about how to present data, there is no hard and fast rule. Each of us has a unique skillset and intersection of subject positions which can offer valuable perspectives that frame our work. Decisions should be informed by context, working towards a more holistic and inclusive ideal.

Definitions

Framing effect – the choices that designers make when visualising data (what to highlight and what to obscure) impact how people interpret the graphics and what they take away from them

Provenance rhetoric (Hullman and Diakopoulos) – signalling the transparency and trustworthiness of the presentation source to end users by citing the source of one’s data, increasing the likelihood that viewers will believe what they see

Feminist objectivity (Donna Haraway) – the idea that all forms of knowledge are situated, that is, produced by specific people in specific circumstances. We can use this to bring together multiple partial perspectives

God trick/view from nowhere (Donna Haraway) – the perceived ability to speak from an impossible/imaginary standpoint that appears to be neutral.

Discussion Summary

What does it mean to you to ‘Elevate Emotion and Embodiment’ - in data visualisation and/or data generally?

Elevating emotion and embodiment can help us to get around the myth of rationality and impartiality. There are some things that we experience and know to be true; however, the scientific process demands empirical evidence before accepting that they are real. Demanding empirical evidence can detract from other forms of knowledge. When we simplify behaviour research to animal models, for example, we lose context; we think that there must be more useful and humane methods for research.

Science as we know it today was shaped by the Enlightenment movement in Europe, which advanced rationalism and empiricism and discredited the importance of emotion and embodiment. Ideas from the Enlightenment propagated around the world through European colonies, consequently suppressing indigenous knowledge and ways of living. Revaluing emotion and embodiment creates space for different types of knowledge systems. Indigenous populations have been incorporating data visualisation into record keeping for centuries, and have ingenious methods of data collection such as the Marshall Islands stick charts which represent ocean swell patterns. Data sovereignty movements are working towards indigenous communities regaining control over their information whilst pushing back against data colonialism and its harms. In Canada there are toolkits to support indigenous governments in managing and owning their data, appreciating different types of dynamic knowledge systems including qualitative, oral, empirical, and scientific knowledge.

Incorporating more types of knowledge broadens the experience that we can have of data. Whilst the final output may be static, the process of iteratively designing a data visualisation is actually quite embodied. We can expand embodiment to the final output itself by harnessing multiple senses to communicate data. Using illustrative visuals can also be very effective in conveying a message, and more colour doesn’t make a visualisation less factual. There is a lot of value in making data visualisation emotive, as it can help to represent topics beyond the binary. Data visualisations during the pandemic, for example, were very emotive.

Minimising ink use, making visualisation less expressive and more sterile, is reductionist and ignores potential connotations. Being minimal in the way data is displayed does not mean it is impartial; minimalism is influenced by entrenched biases. The idea that minimising extra ink is best practice is linked to the stereotype of men as ‘less emotional’ and ‘more factual’ than women. How people respond to certain colours is also influenced by their culture; for example, Western audiences tend to read red as bad and green as good.

Minimalist design decisions, such as the choice of colour palette, may often be subliminal yet have a real impact on accessibility. Making a visualisation accessible for one group often makes it inaccessible for another. For example, when plotting points for app usage in schools in Bristol, we used a heat map (red, yellow, green) to indicate levels of use. However, a red-yellow-green spectrum isn’t colourblind-friendly at all. People can check their designs using colour blindness simulation tools.
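One alternative to a red-yellow-green heat map is a sequential palette that encodes the usage level in lightness, so that reading it does not depend on telling red from green. The Python sketch below is a hypothetical illustration and not the palette used in the Bristol project. It assumes usage has been normalised to a 0 to 1 scale, and the hue, saturation and lightness bounds are arbitrary choices. It also computes relative luminance with the standard sRGB formula to confirm that lightness falls steadily as usage rises. Ready-made perceptually uniform colour maps, such as viridis in matplotlib, are another option.

```python
import colorsys


def sequential_blue(level, hue=0.6, saturation=0.6, light_min=0.25, light_max=0.9):
    """Map a usage level in [0, 1] to a hex colour on a single-hue ramp.

    Low usage -> pale, high usage -> dark, so the value is carried by
    lightness rather than by a red/green hue contrast.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0 and 1")
    lightness = light_max - level * (light_max - light_min)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)  # note H, L, S order
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def relative_luminance(hex_colour):
    """Relative luminance of an sRGB hex colour (standard sRGB linearisation)."""
    def linear(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)


levels = [0.0, 0.25, 0.5, 0.75, 1.0]
palette = [sequential_blue(x) for x in levels]
print(palette)
print([round(relative_luminance(c), 3) for c in palette])
```

Lightness alone is not a full accessibility check, so it is still worth testing the result in a colour blindness simulator.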

We also wondered who it is that gets to do embodied/visceral art: what resources are required; how easy it is to access those resources; how equal access is across different areas of society.

What did you think of the examples of how positionality is expressed in data visualisations (intended or not) using different graph styles, colour or annotations?

Default data visualisations imply a white and western interpretation of science, insinuating that because it has this positionality it should be trusted. Whilst Florence Nightingale’s greatest contribution to combating disease and death resulted from the graphs she made to back her public health campaigns, she also drew attention to how graphs persuade, whether or not they depict reality.

What did you think of the reactions of NYT readers to the election gauge? How should we represent uncertainty?

People generally hate uncertainty, which could be because uncertainty is often equated with messiness. It is important to remember that a lot of people haven’t touched statistics or maths since they were 16, making it difficult to understand statistical uncertainty. Even professors can misunderstand confidence intervals. We wondered if talking about confidence conveys uncertainty, or if talking about significance conveys certainty. People have a lot of cognitive biases which affect the way they interpret data. For example, truncating the y-axis so that it doesn’t start at 0 can heavily influence the way that a graph is read.
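The effect of a truncated y-axis is easy to quantify for a bar chart, because drawn bar heights are measured from wherever the axis starts rather than from zero. The small Python helper below is a sketch using hypothetical values (two bars of 50 and 52). It compares the ratio the data supports with the ratio the reader's eye sees.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars for values a and b.

    Bars are drawn up from `baseline` (where the y-axis starts), so with a
    baseline of 0 this is simply b / a.
    """
    if min(a, b) <= baseline:
        raise ValueError("the axis must start below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values: two bars of 50 and 52.
true_ratio = drawn_ratio(50, 52)               # axis starts at 0
truncated = drawn_ratio(50, 52, baseline=48)   # axis starts at 48
print(f"ratio in the data: {true_ratio:.2f}, ratio as drawn: {truncated:.2f}")
# prints "ratio in the data: 1.04, ratio as drawn: 2.00"
```

Starting the axis at 48 makes a 4% difference look like a doubling. This is why a non-zero baseline on a bar chart deserves to be flagged clearly to readers.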

It is a very common real-world problem that audiences don’t understand a data visualisation until it’s explained to them. We find that some audiences expect the ‘boring’ and ostensibly ‘neutral’ visualisations that this chapter criticises. However, if we don’t provide those kinds of visualisations, the audience thinks that we’re either not doing our job properly or we have an agenda and are actively manipulating them. It is difficult to balance these expectations with the recommendations of this chapter.

Figuring out how to balance expectations can perhaps be helped by our main takeaway from this chapter: multiple views of data are relevant, and emotion is a valid one of these (not just that ‘emotion is important’). Christiane Amanpour famously argued that the role of a journalist is to be truthful, not neutral.

Thinking about how to present data is supported by Nicole Dalzell’s three questions for pedagogical decisions that affect how LGBTQ+ students learn in the classroom:

  • What students are not supported by the design decision?

  • How can we adapt to reflect student needs not supported by the initial design?

  • Is applying this design decision sustainable for both me and the students?