Data Feminism: Chapter 1 – The Power Chapter

What’s this?

This is a summary of the discussions from Data Feminism Book Club, where we spoke and wrote about Data Feminism by Catherine D’Ignazio and Lauren F. Klein over the summer of 2024. We hope you enjoy this writeup. We aim to run another Book Club in the summer of 2025! In the meantime, the co-organisers would be really enthusiastic to support anyone interested in running another book club! Please reach out if you want to get involved. The summary was written by Jessica Woodgate, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara, Huw Day, Natalie Thurlby, Amy Joint, and Vanessa Hanschke helped with the final edit.

Principle: Examine Power

“Data feminism begins by analysing how power operates in the world.”

Chapter Summary

This chapter explores how to examine power through naming and explaining forces of oppression. Power is defined as the configuration of structural privilege and oppression where some have an unfair advantage because systems are designed by people like them and, because of this, those systems work better for them. To explain the configuration and experience of systems of power, the matrix of domination maps how different domains of society (structural, hegemonic, disciplinary, interpersonal) interact. When examining power dynamics in systems, we should ask “who” questions: who does the work (what is the effect of their lived experience?); who benefits (whose data is missing, and who is oversurveilled?); whose priorities get turned into products. A hyperfocus on science, surveillance, and selling can cause significant oversights with life-altering consequences.

Definitions

power - the current configuration of structural privilege and structural oppression, in which some groups experience unearned advantages and other groups experience systematic disadvantages

matrix of domination [P. Hill Collins] - (p25) a concept that explains how systems of power are configured and experienced.

  • structural domain: organises oppression (law/politics)

  • hegemonic domain: circulates oppressive ideas (culture/media)

  • disciplinary domain: administers and manages oppression (enforcing law/policies)

  • interpersonal domain: individual experiences

minoritized - a social group that is actively devalued and oppressed by a dominant group (even when they are the majority group, e.g. women)

privilege hazard - when decisions are made by members of dominant groups to the exclusion of other identities or perspectives, crucial input is missed.

counterdata - data that are produced by civil society groups or individuals in order to counter missing data or to challenge existing official data.

Discussion Summary

Have you seen the matrix of domination before? What did you think – did you fully understand it?

Many of us hadn’t seen the matrix of domination in this format before but were familiar with its parts. Considering our understandings of these domains, it intuitively made sense to see them linked together in this way. For some of us, the matrix has made visible ways we are affected by power that we were not previously aware of.

Mapping together the aspects of domination helps us to talk about issues with power which we have always been aware of but haven’t necessarily had the vocabulary to articulate. In particular, we found it useful to visualise the relation between structural and disciplinary domains. Both domains are informed by the hegemonic domain and also trickle down into one another. However, clearly differentiating the domains facilitates decoupling them to infer actionable insights. The interplay of separateness and connectivity illustrates how causes might not be independent, but solutions can be.

The matrix of domination provides a great framework to support thinking and speech around issues with power which can be hard to articulate. The simple yet informative layout sets the matrix up to be a great communication tool. Because it gives us the right understanding and words, we can use it to advocate for ourselves and others.

We also discussed the possible reasons why it was presented as a matrix. We wondered if the use of a matrix was so that it occupies more of a 3D space, reflecting intersectionality and build-up of different domains interacting. Each section needs to occupy its own space but also interact with the other sections. It is tempting to put things into a hierarchy, but it isn’t always possible to order and measure how factors impact people.

However, whilst labelling can be useful, we wondered if it is effective when people don’t believe in the labels themselves. There are always excuses for oppressive behaviour; sometimes we wonder if efforts to examine power make any difference. Labels and other means of categorisation can also feel quite abstract. To put the matrix of domination into practice, and facilitate unlearning, it would be useful to have some more examples about which groups are being dominant. Examples can be found by asking who has privilege and power, and why they have it.

One example of imbalanced power can be found in the context of work, evidenced by disparities such as the gender pay gap. Symptoms of domination in the workplace include expectations on people to assimilate to men’s behaviour, such as the wave of “girlboss” feminism and assertiveness training. Rather than challenging existing power imbalances, these kinds of behavioural management techniques imply that success can only be obtained by adapting behaviour to ways that are implicitly white and male; you can only succeed at work if you learn how to communicate like the dominant group. For women, this is a double-edged sword. On the one hand, women are expected not to display masculine traits and to conform to traditional ideas of femininity (or be called childless cat ladies). On the other hand, to be seen as a leader they are expected to present as confident and tough. Women must therefore tailor their vocabulary and behaviour to navigate these complex dynamics.

Imposition of behavioural expectations from the dominant group, and other forms of discrimination, often manifest through microaggressions. A microaggression is a “subtle, often unintentional, form of prejudice… often takes the shape of an offhand comment, an inadvertently painful joke, or a pointed insult.” On the interpersonal level, it can be difficult to articulate and fight against microaggressions. The matrix of domination may help us to address microaggressions, by situating them in a wider context and grounding our explanations for why they are wrong.

Alongside challenging entrenched thoughts and behaviours like microaggressions, it is important to apply the matrix of domination and examination of power to change policy. Equality in the personal and hegemonic domains can take a long time. For example, not all women in the US got the right to vote in 1920; for Black women, the Voting Rights Act was not signed into law until 1965. Even when the right to vote has been put into legislation, to be able to vote you must still have the right documentation, get transport to voting stations, and take time off work. The time and resources required constitute barriers for people with low incomes: “financial security is strongly correlated with nearly every measure of political engagement”. The hegemonic domain, involving the circulation of oppressive ideas in culture and media, provides a suitable frame to understand how race becomes something that only people who aren’t white have.

To fight for rebalancing of power relations, people are often required to take on a large amount of unpaid labour before it is (if it ever is) argued for as a paid role. This puts a huge burden on people. For example, there is only a certain amount of time and money spent on diversity, equity, and inclusion (DEI), and the labour it involves often isn’t seen as a priority.

The complex effects of privilege make us wonder: should Privilege Hazard be a Data Hazards label? Along a similar line, should using ‘public’ information in a private way (e.g. to infer predictions about people’s pregnancy) be a Data Hazards label?

Where do you work – in your role how does data play in? Who does it benefit – and does it align with the science, surveillance, selling groups?

We have worked with data in the contexts of biodiversity monitoring and medicine. In a biodiversity monitoring project, photographic data from the public was used. We can see how crowdsourcing data is useful to academics; however, it is easy to be naïve about how exploitative crowdsourcing can be. To avoid exploitative practices, it is important to analyse the role of the three S’s, and how they play into broader power dynamics.

In medicine, we have worked to collect data from animals with the aim to advance medical treatment pathways. Adopting a perspective which includes the three S’s facilitates reflection on who benefitted and who was neglected or harmed. From a science and surveillance perspective, we wondered about the implications for genders that weren’t included in our study, and people with different hormone profiles to those we considered. From a selling perspective, we wondered how the data we gathered could be used to make drug development more or less affordable for certain groups. With a finite number of resources, cost-benefit analyses must be undertaken in drug development. These analyses include asking questions such as whether it is financially viable to create a cure for rare diseases that affect very few people.

In the development of medical devices, we asked ourselves how the data we are working with connects to the three S’s. In the early days of development, data aligns more with science than with selling. Medical trials, for instance, seem to involve mostly science, with a bit of surveillance. Later on, however, the data will be used for selling. The translation of data to selling presents challenges if the group from whom the data is collected cannot give consent for medical reasons. There thus arises a tension between the effort to align the data to benefit the people who use the device, and the caution required when handling sensitive data. Data isn’t neutral, and when data is rewritten for different people, it may become biased. When selling medical devices, there are many stakeholders whose needs must be balanced.

To mitigate negative effects of the three S’s, it is important to ask questions from the ground up, even if you aren’t directly involved with selling. Reflecting on our practices and possible implications of our research is critical to performing good science. Independent research studies are one method we are using to hold projects we’re involved in accountable. Data Hazards is also a useful tool to help people to evaluate their research. It is important to participate in research that you believe is socially beneficial and will positively impact people.

However, sometimes it is unclear if research data really benefits anyone. It can often seem that data is only relevant to the researcher themselves and their collaborators. The abstractness of this can make it difficult to mobilise people to improve their data practices. This is exacerbated by the difficulty with classifying ‘early career researchers’, as there is no defined group that can be collectively galvanised around issues.

When you are located on the science side of research, you don’t always notice the ‘selling’ side. However, it is still there, and it does impact who benefits from the research. Applying for project funding is a form of selling. New health technology and knowledge should be made available, rather than holding research as ‘currency’ and resisting sharing it.

There were, however, some aspects of working with data that don’t fit neatly into the three S’s, such as the problem of missing datasets. Identifying with and locating certain groups or categories can be difficult; for example, health carers may not even know that they are carers. In the NHS, the level of influence and change varies according to the project and its objectives. Tough questions and issues are expected to be raised in data science teams and industry.

The chapter outlines issues with under-representation in data, and over-representation. What are your thoughts on the tensions between these?

When we first saw the list of missing datasets, we thought “great, let’s go and collect it!”. An example of why it’s important to consider missing data is survivorship bias, a term coined after the Second World War to describe the mistake of deciding where to reinforce planes by looking at the ones which had been shot at and come home, rather than the planes which had been shot down.

However, we quickly stalled when we read the next part of the chapter explaining how people in power can use data of marginalised groups to oppress people. Issues with the misuse of data are exemplified in the case of research in the US being used to support abortion laws and oppress pregnant people.

Looking at both sides of the coin, we need the data to make changes, but we also need strong ethics to use it responsibly. We must ask who has control over the data, who can access the data, who benefits, and who predictive models are for. Rather than just asking for more data, we should also be looking for better data. Representation and diversity are important basics: people who are affected by research should also participate in conducting the research.

Data Feminism: Chapter 2 – Collect, Analyse, Imagine, Teach

Principle: Challenge Power

“Data feminism commits to challenging unequal power structures and working towards justice”

Chapter Summary

This chapter argues that, alongside examining power, we must challenge power. Challenging power involves mobilising data science to push back against existing and unequal power structures. Four starting points to foster push back are suggested: collect (compile counterdata in the face of missing data); analyse (demonstrate inequitable outcomes, audit opaque algorithms, hold institutions accountable); imagine (in order to get to the root of injustice, imagine the end point not as “fairness” but co-liberation); teach (engage and empower newcomers to shift demographics and cultivate the next generation).

  • Collect: when collecting counterdata, we should ask ‘who’ questions, such as who needs to be shown the harms of power differentials and what kind of proof they require to believe the oppression is real. We should also ask on whom the burden of proof is placed. Those in positions of power accept anecdotal evidence from those like themselves but lean more heavily towards requiring statistics from minoritised groups.

  • Analyse: to ensure that past inequalities aren’t propagated in current technologies, the chapter suggests moving from data ethics, which examines ethical issues brought about by reliance on data-driven systems, to data justice, which considers issues in a wider historical context.

  • Imagine: data justice should be pursued with the aim of co-liberation, striving for mutual benefit for members of both dominant and minoritised groups. Co-liberation requires transparency of methods and reflexivity, understood as the ability to reflect on and take responsibility for one’s own position within the multiple, intersecting dimensions of the matrix of domination.

  • Teach: firstly, instead of asking questions that perpetuate a narrative of deficiency, such as “why are there so few women computer scientists?”, we should ask questions like “how can we structurally transform STEM education together?” Secondly, it is important to “stay with the trouble” by persisting in your work, especially when it becomes uncomfortable, unclear, or outright upsetting. Thirdly, we should shift frame from concepts that secure power, like fairness (equality from the standpoint of today, disregarding historical injustices) and accountability, to those that challenge power, like equity (taking into account present power differentials and distributing accordingly) and co-liberation. Challenging power is supported by learning from and designing with the communities that we seek to support.

Definitions

Racial capitalism – (Cedric Robinson) using data-driven methods to tailor policies to customers based on their demographics

Auditing algorithms - using data science methods (reverse engineering, collecting missing data) to examine bias in algorithms.

Deficit narratives - reducing a group or culture to its “problems” rather than portraying it with strength and agency.

Data justice (vs ethics) - acknowledging structural power differentials and working towards dismantling them (rather than simply locating problems without onward action).

New racism - belief that racism is due to a few bad actors

Discussion Summary

Have you heard of ‘counterdata’ before – how might it apply in your areas of lived experience or expertise?

A clear definition of counterdata in the chapter would have been useful; we understood ‘counterdata’ as subversive data, or data that asks or answers different questions to data that already exists. Using data in subversive ways is not a new concept – Florence Nightingale collected counterdata on cholera in the Crimean War; similarly, John Snow collected counterdata to identify the source of cholera outbreaks. We have seen counterdata come up as a concept in the books Invisible Women and Weapons of Math Destruction.

There are several areas in which we thought counterdata would be applicable; for instance, there seems to be very little data around the menopause. In a lot of cases where there appears to be a lack of data, the data is already there, but the right questions haven’t been asked. We wondered if, for cases like the menopause, gaps can be addressed by interrogating data that is already present.

In collecting counterdata, there are risks that arise when data that you are collecting for one cause is harmfully repurposed. Data collected about homelessness, for example, could be used to find people aid, or it could be used to displace them. The collection of counterdata, especially if it is sensitive in nature, raises potential for privacy concerns in case of hacks, such as the 23andMe data breach.

When counterdata has been collected, there is a risk that it simply might not work if it doesn’t match up to people’s experiences. Empowerment is hard to quantify, making it difficult to assess how effective counterdata is. If the response to counterdata is slow, we might not be around to see any changes.

To remedy issues that arise with counterdata, it is crucial to consider the wider social aspect. After all, what we are dealing with is a social problem that is encoded in data. A big lesson from climate change activism has been that whilst numbers are useful and important, to make an effective contribution you really need the qualitative aspect. To remain aware of how the subject of oppression is portrayed, we need to advocate for the stories that go alongside the data.

Despite its importance, there are difficulties that arise with the use of qualitative data. Qualitative data is not always considered as reliable as quantitative data, as it requires more resources to get the same amount of data. As a data analyst, there is a fear that if you put too much emphasis on qualitative aspects to tell stories without enough of the data, you won’t be listened to. Sometimes it is unclear whether quantitative or qualitative data will be more impactful. For example, regarding NHS waiting lists, we wondered if it is better to focus on the numbers or the individual stories to generate maximum impact. Usually, we need both data and the story to get a point across. Storytelling is powerful when we support it with data.

When employing stories for counterdata, it is important to be aware of the issues that can arise in the use of language, and how language can propagate existing power dynamics. For example, the gender binary has been perpetuated in children’s literature. Language of domination/violence/gambling also infiltrates medical language. The book Cassandra Speaks addresses relevant issues of language, gender, and power.

To mitigate language encoding bias, qualitative questions should be framed in ways that avoid leading metaphors and analogies, such as the war-based metaphors in HIV medicine and research. It also matters who the storyteller is: they should be the people who are actually affected.

Imagine – what would it look like if we were ‘Data Justice Club’ rather than ‘Data Ethics Club’ (see Table 2.1)?

Using “justice” instead of “ethics” implies action and gives the suggestion that the club is activist. Data justice club would thus be more outward facing and involved. This would require some changes to how the current club functions: we would have to do more and talk less. In some ways, data ethics club is a more suitable name for how we operate at the moment, as we discuss things rather than take action on things. Not all of us want to be part of an outward facing collective, as it is useful to have a place which is just for thought and discussion. There could be space for both a data ethics club and a data justice club.

If we were to lean in more towards data justice, we had a few ideas for what this would look like. Sharing our knowledge is one route; the role of academics is not just to collect data and do the research itself, but also to provide tools and education. Ethics isn’t as accessible as it should be and there are lots of organisations which need, but don’t have, data ethicist roles. The benefit of learning and education is that they promote empowerment of participants, as exemplified in the Local Lotto project. Social change happens as a collective, and it is possible to see this work in that light. Data justice club could thus offer education and infrastructure to community groups who have the motivation to do data collection. This could take the form of providing a ‘consult service’ to other data research groups within the university (like a data ethics Ask JGI). Having a limited amount of time for other researchers to ask us questions could provide insights into how to address ethical aspects they might not have considered otherwise.

In addition to the community-focussed alternatives, there are ways in which we could implement data justice on a personal level, by taking the information we learn in DEC and applying it to our daily lives. In doing so, it would be useful to work towards understanding a broader array of perspectives (e.g. different political viewpoints on issues). Having a Data Ethics wins channel in our Slack group would be a good way to celebrate this. On a more involved level, data justice could look like participating in things like investigative journalism.

What would need to change for projects like Local Lotto to be used more in teaching data skills?

With respect to changing the way undergrad courses teach data skills, we thought that instead of data ethics being taught as one lecture it could be infused throughout. Outside of university, more engagement with groups outside of academia who are fighting for social change, such as community and non-profit groups, would also improve mobilisation. An example of a relevant outreach project is DataFace, which is working to improve the data literacy of children in schools.

Problems with time and funding would need to be addressed to enable wider implementation of projects like Local Lotto. For many, involvement in these projects is extra to their current workload, and people do not have enough time to balance both. If the workload can be too much for people who are passionate about the cause, we thought it unlikely that data scientists without the same enthusiasm would be willing to dedicate their free time to these kinds of projects.