Data Ethics Club reads “Data Feminism”: Summer Bookclub 2024#

What’s this?#

This is a summary of the discussions from Data Feminism Book Club, where we spoke and wrote about Data Feminism by Catherine D’Ignazio and Lauren F. Klein over the summer of 2024. We hope you enjoy this writeup. We aim to run another Book Club in the summer of 2025! In the meantime, the co-organisers would be really enthusiastic to support anyone interested in running another book club! Please reach out if you want to get involved. The summary was written by Jessica Woodgate, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara, Huw Day, Natalie Thurlby, Amy Joint, and Vanessa Hanschke helped with the final edit.

Data Feminism: Chapter 1 – The Power Chapter#

Principle: Examine Power#

“Data feminism begins by analysing how power operates in the world.”

Chapter Summary#

This chapter explores how to examine power through naming and explaining forces of oppression. Power is defined as the configuration of structural privilege and oppression where some have an unfair advantage because systems are designed by people like them and, because of this, those systems work better for them. To explain the configuration and experience of systems of power, the matrix of domination maps how different domains of society (structural, hegemonic, disciplinary, interpersonal) interact. When examining power dynamics in systems, we should ask “who” questions: who does the work (what is the effect of their lived experience?); who benefits (whose data is missing, and who is oversurveilled?); whose priorities get turned into products. A hyperfocus on science, surveillance, and selling can cause significant oversights with life-altering consequences.

Definitions#

power - the current configuration of structural privilege and structural oppression, in which some groups experience unearned advantages and other groups experience systematic disadvantages

matrix of domination [P Hill Collins] - (p25) a concept that explains how systems of power are configured and experienced.

  • structural domain: organises oppression (law/politics)

  • hegemonic domain: circulates oppressive ideas (culture/media)

  • disciplinary domain: administers and manages oppression (enforcing law/policies)

  • interpersonal domain: individual experiences

minoritized - a social group that is actively devalued and oppressed by a dominant group (even when they are the majority group, e.g. women)

privilege hazard - when decisions are made by those who make up dominant groups, to the exclusion of other identities and perspectives, crucial input is missed.

counterdata - data that are produced by civil society groups or individuals in order to counter missing data or to challenge existing official data.

Discussion Summary#

Have you seen the matrix of domination before? What did you think – did you fully understand it?#

Many of us hadn’t seen the matrix of domination in this format before; however, we were familiar with its parts. Considering our understandings of these domains, it intuitively made sense to see them linked together in this way. For some of us, the matrix made visible ways we are affected by power that we were not previously aware of.

Mapping together the aspects of domination helps us to talk about issues with power which we have always been aware of but haven’t necessarily had the vocabulary to articulate. In particular, we found it useful to visualise the relation between structural and disciplinary domains. Both domains are informed by the hegemonic domain and also trickle down into one another. However, clearly differentiating the domains facilitates decoupling them to infer actionable insights. The interplay of separateness and connectivity illustrates how causes might not be independent, but solutions can be.

The matrix of domination provides a great framework to support thinking and speech around issues with power which can be hard to articulate. The simple yet informative layout sets the matrix up to be a great communication tool. By providing us with the right understanding and words, we can use it to advocate for ourselves and others.

We also discussed the possible reasons why it was presented as a matrix. We wondered if the use of a matrix was so that it occupies more of a 3D space, reflecting intersectionality and the build-up of different domains interacting. Each section needs to occupy its own space but also interact with the other sections. It is tempting to put things into a hierarchy, but it isn’t always possible to order and measure how factors impact people.

However, whilst labelling can be useful, we wondered if it is effective when people don’t believe in the labels themselves. There are always excuses for oppressive behaviour; sometimes we wonder if efforts to examine power make any difference. Labels and other means of categorisation can also feel quite abstract. To put the matrix of domination into practice, and facilitate unlearning, it would be useful to have some more examples of which groups are dominant. Examples can be found by asking who has privilege and power, and why they have it.

One example of imbalanced power can be found in the context of work, evidenced by disparities such as the gender pay gap. Symptoms of domination in the workplace include expectations on people to assimilate to men’s behaviour, such as the wave of “girlboss” feminism and assertiveness training. Rather than challenging existing power imbalances, these kinds of behavioural management techniques imply that success can only be obtained by adapting behaviour to ways that are implicitly white and male; you can only succeed at work if you learn how to communicate like the dominant group. For women, this is a double-edged sword. On the one hand, women are expected not to display masculine traits and to conform to traditional ideas of femininity (or be called childless cat ladies). On the other hand, to be seen as a leader they are expected to present as confident and tough. Women must therefore tailor their vocabulary and behaviour to navigate these complex dynamics.

Imposition of behavioural expectations from the dominant group, and other forms of discrimination, often manifest through microaggressions. A microaggression is a “subtle, often unintentional, form of prejudice… often takes the shape of an offhand comment, an inadvertently painful joke, or a pointed insult.” On the interpersonal level, it can be difficult to articulate and fight against microaggressions. The matrix of domination may help us to address microaggressions by situating them in a wider context and grounding our explanations for why they are wrong.

Alongside challenging entrenched thoughts and behaviours like microaggressions, it is important to apply the matrix of domination and examination of power to change policy. Equality in the interpersonal and hegemonic domains can take a long time. For example, not all women got the right to vote in 1920; for black women, the Voting Rights Act was not signed into law until 1965. Even when the right to vote has been put into legislation, to be able to vote you must still have the right documentation, get transport to voting stations, and take time off work. The time and resources required constitute barriers for people with low incomes: “financial security is strongly correlated with nearly every measure of political engagement”. The hegemonic domain, involving the circulation of oppressive ideas in culture and media, provides a suitable frame to understand how race becomes something that only people who aren’t white have.

To fight for the rebalancing of power relations, people are often required to take on a large amount of unpaid labour before it is recognised (if it ever is) as a paid role. This puts a huge burden on people. For example, there is only a certain amount of time and money spent on diversity, equity, and inclusion (DEI), and the labour it involves often isn’t seen as a priority.

The complex effects of privilege make us wonder: should Privilege Hazard be a Data Hazards label? Along a similar line, should using ‘public’ information in a private way (e.g. to infer predictions about people’s pregnancy) be a Data Hazards label?

Where do you work – in your role how does data play in? Who does it benefit – and does it align with the science, surveillance, selling groups?#

We have worked with data in the contexts of biodiversity monitoring and medicine. In a biodiversity monitoring project, photographic data from the public was used. We can see how crowdsourcing data is useful to academics; however, it is easy to be naïve with respect to how exploitative crowdsourcing can be. To avoid exploitative practices, it is important to analyse the role of the three S’s, and how they play into broader power dynamics.

In medicine, we have worked to collect data from animals with the aim of advancing medical treatment pathways. Adopting a perspective which includes the three S’s facilitates reflection on who benefitted and who was neglected or harmed. From a science and surveillance perspective, we wondered about the implications for genders that weren’t included in our study, and for people with different hormone profiles to those we considered. From a selling perspective, we wondered how the data we gathered could be used to make drug development more or less affordable for certain groups. With finite resources, cost-benefit analyses must be undertaken in drug development. These analyses include asking questions like whether it is financially viable to create a cure for rare diseases that affect very few people.

In the development of medical devices, we asked ourselves how the data we are working with connects to the three S’s. In the early days of development, data aligns more with science than selling. Medical trials, for instance, seem to involve mostly science, with a bit of surveillance. Later on, however, the data becomes about selling. The translation of data to selling presents challenges if the group from whom the data is collected cannot give consent for medical reasons. There thus arises a tension between the effort to align the data to benefit the people who use the device, and the cautiousness required when handling sensitive data. Data isn’t neutral, and when data is rewritten for different people, it may become biased. When selling medical devices, there are many stakeholders whose needs must be balanced.

To mitigate negative effects of the three S’s, it is important to ask questions from the ground up, even if you aren’t directly involved with selling. Reflecting on our practices and possible implications of our research is critical to performing good science. Independent research studies are one method we are using to hold projects we’re involved in accountable. Data Hazards is also a useful tool to help people to evaluate their research. It is important to participate in research that you believe is socially beneficial and will positively impact people.

However, sometimes it is unclear if research data really benefits anyone. It can often seem that data is only relevant to the researchers themselves and their collaborators. The abstractness of this can make it difficult to mobilise people to improve their data practices. This is exacerbated by the difficulty of classifying ‘early career researchers’, as there is no defined group that can be collectively galvanised around issues.

When you are located on the science side of research, you don’t always notice the ‘selling’ side. However, it is still there, and it does impact who benefits from the research. Applying for project funding is a form of selling. New health technology and knowledge should be made available, rather than holding research as ‘currency’ and resisting sharing it.

However, there are some aspects of working with data that don’t fit neatly into the three S’s, such as the problem of missing datasets. Identifying with and locating certain groups or categories can be difficult; for example, health carers may not even know that they are carers. In the NHS, the level of influence and change varies according to the project and its objectives. Tough questions and issues are expected to be raised in data science teams and industry.

The chapter outlines issues with both under-representation and over-representation in data. What are your thoughts on the tensions between these?#

When we first saw the list of missing data sets, we thought “great, let’s go and collect it!”. An example of why it’s important to consider missing data is survivorship bias, famously illustrated in the Second World War: deciding where to reinforce planes based on the damage to the ones which had been shot at and made it home, rather than on the planes which had been shot down.

However, we quickly stalled when we read the next part of the chapter explaining how people in power can use data of marginalised groups to oppress people. Issues with the misuse of data are exemplified in the case of research in the US being used to support abortion laws and oppress pregnant people.

Looking at both sides of the coin, we need the data to make changes, but we also need strong ethics to use it responsibly. We must ask who has control over the data, who can access the data, who benefits, and who predictive models are for. Rather than just asking for more data, we should also be looking for better data. Representation and diversity are important basics: people who are affected by research should also participate in conducting the research.

Data Feminism: Chapter 2 – Collect, Analyse, Imagine, Teach#

Principle: Challenge Power#

“Data feminism commits to challenging unequal power structures and working towards justice.”

Chapter Summary#

This chapter argues that, alongside examining power, we must challenge power. Challenging power involves mobilising data science to push back against existing, unequal power structures. Four starting points to foster pushback are suggested: collect (compile counterdata in the face of missing data); analyse (demonstrate inequitable outcomes, audit opaque algorithms, hold institutions accountable); imagine (to get to the root of injustice, imagine the end point not as “fairness” but as co-liberation); teach (engage and empower newcomers to shift demographics and cultivate the next generation).

  • Collect: when collecting counterdata, we should ask ‘who’ questions, such as who needs to be shown the harms of power differentials and what kind of proof they require to believe the oppression is real. We should also ask on whom the burden of proof is placed. Those in positions of power accept anecdotal evidence from those like themselves but lean more heavily towards requiring statistics from minoritised groups.

  • Analyse: to ensure that past inequalities aren’t propagated in current technologies, the chapter suggests moving from data ethics, which examines ethical issues brought about by reliance on data-driven systems, to data justice, which considers issues in a wider historical context.

  • Imagine: data justice should be pursued with the aim of co-liberation, striving for mutual benefit from both members of dominant and minoritised groups. Co-liberation requires transparency of methods and reflexivity, understood as the ability to reflect on and take responsibility for one’s own position within the multiple, intersecting dimensions of the matrix of domination.

  • Teach: firstly, instead of asking questions that perpetuate a narrative of deficiency, such as “why are there so few women computer scientists?”, we should ask questions like “how can we structurally transform STEM education together?” Secondly, it is important to “stay with the trouble” by persisting in your work, especially when it becomes uncomfortable, unclear, or outright upsetting. Thirdly, we should shift the frame from concepts that secure power, like fairness (equality from the standpoint of today, disregarding historical injustices) and accountability, to those that challenge power, like equity (taking into account present power differentials and distributing accordingly) and co-liberation. Challenging power is supported by learning from and designing with the communities that we seek to support.

Definitions#

Racial capitalism, Cedric Robinson - using data-driven methods to tailor policies to customers based on their demographics

Auditing algorithms - using data science methods (reverse engineering, collecting missing data) to examine bias in algorithms; see the toy sketch after these definitions.

Deficit narratives - reducing a group or culture to its “problems” rather than portraying it with strength and agency.

Data justice (vs ethics) - acknowledging structural power differentials and working towards dismantling them (rather than simply locating problems without onward action).

New racism - belief that racism is due to a few bad actors
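The auditing algorithms definition can be made concrete with a toy sketch in Python. Everything here is illustrative: the decision records are invented, and the 80% threshold echoes the well-known “four-fifths” rule of thumb rather than anything from the chapter.

```python
# Toy audit sketch (invented data): compare a system's positive-outcome
# rates across groups to check for disparate impact.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = {}
for group in {d["group"] for d in decisions}:
    outcomes = [d["approved"] for d in decisions if d["group"] == group]
    rates[group] = sum(outcomes) / len(outcomes)

# Rule of thumb: flag if one group's rate falls below 80% of another's.
ratio = min(rates.values()) / max(rates.values())
print(rates, f"ratio={ratio:.2f}", "flag" if ratio < 0.8 else "ok")
```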

Discussion Summary#

Have you heard of ‘counterdata’ before – how might it apply in your areas of lived experience or expertise?#

A clear definition of counterdata in the chapter would have been useful; we understood ‘counterdata’ as subversive data, or data that asks or answers different questions to data that already exists. Using data in subversive ways is not a new concept – Florence Nightingale collected counterdata on mortality in the Crimean War; similarly, John Snow collected counterdata to identify the source of cholera outbreaks in London. We have seen counterdata come up as a concept in the books Invisible Women and Weapons of Math Destruction.

There are several areas where we thought counterdata would be applicable; for instance, there seems to be very little data around the menopause. In a lot of cases where there appears to be a lack of data, the data is already there, but the right questions haven’t been asked. We wondered if, for cases like the menopause, gaps can be addressed by interrogating data that is already present.

In collecting counterdata, there are risks that arise when data that you are collecting for one cause is harmfully repurposed. Collecting data about homelessness, for example, could be used to find people aid, or it could be used to displace them. The collection of counterdata, especially if it is sensitive in nature, raises privacy concerns in the case of hacks, such as the 23andMe data breach.

When counterdata has been collected, there is a risk that it simply might not work if it doesn’t match up to people’s experiences. Empowerment is hard to quantify, making it difficult to assess how effective counterdata is. If the response to counterdata is slow, we might not be around to see any changes.

To remedy issues that arise with counterdata, it is crucial to consider the wider social aspect. After all, what we are dealing with is a social problem that is encoded in data. A big lesson from climate change activism has been that whilst numbers are useful and important, to make an effective contribution you really need the qualitative aspect. To remain aware of how the subject of oppression is portrayed, we need to advocate for the stories that go alongside the data.

Despite its importance, there are difficulties that arise with the use of qualitative data. Qualitative data is not always considered as reliable as quantitative data, as it requires more resources to get the same amount of data. As a data analyst, there is a fear that if you put too much emphasis on qualitative aspects to tell stories without enough of the data, you won’t be listened to. Sometimes it is unclear whether quantitative or qualitative data will be more impactful. For example, regarding NHS waiting lists, we wondered if it is better to focus on the numbers or the individual stories to generate maximum impact. Usually, we need both data and the story to get a point across. Storytelling is powerful when we support it with data.

When employing stories for counterdata, it is important to be aware of the issues that can arise in the use of language, and how language can propagate existing power dynamics. For example, the gender binary has been perpetuated in children’s literature. Language of domination/violence/gambling also infiltrates medical language. The book Cassandra Speaks addresses relevant issues of language, gender, and power.

To mitigate language encoding bias, qualitative questions should be framed in ways that avoid leading metaphors and analogies, such as the war-based metaphors in HIV medicine and research. It also matters who the storyteller is: they should be the people who are actually affected.

Imagine – what would it look like if we were ‘Data Justice Club’ rather than ‘Data Ethics Club’ (see Table 2.1)?#

Using “justice” instead of “ethics” implies action and gives the suggestion that the club is activist. Data justice club would thus be more outward facing and involved. This would require some changes to how the current club functions: we would have to do more and talk less. In some ways, data ethics club is a more suitable name for how we operate at the moment, as we discuss things rather than take action on things. Not all of us want to be part of an outward facing collective, as it is useful to have a place which is just for thought and discussion. There could be space for both a data ethics club and a data justice club.

If we were to lean in more towards data justice, we had a few ideas for what this would look like. Sharing our knowledge is one route; the role of academics is not just to collect data and do the research itself, but also to provide tools and education. Ethics isn’t as accessible as it should be, and there are lots of organisations which need, but don’t have, data ethicist roles. The benefit of learning and education is that they promote empowerment of participants, as exemplified in the Local Lotto project. Social change happens as a collective, and it is possible to see this work in that light. Data justice club could thus offer education and infrastructure to community groups who have the motivation to do data collection. This could take the form of providing a ‘consult service’ to other data research groups within the university (like a data ethics Ask JGI). Having a limited amount of time for other researchers to ask us questions could provide insights into how to address ethical aspects they might not have considered otherwise.

In addition to the community focussed alternatives, there are ways in which we could implement data justice on a personal level, by taking the information we learn in DEC and applying it to our daily lives. In doing so, it would be useful to work towards understanding a broader array of perspectives (e.g. different political viewpoints on issues). Having a Data Ethics wins channel in our slack group would be a good way to celebrate this. On a more involved level, data justice could look like participating in things like investigative journalism.

What would need to change for projects like Local Lotto to be used more in teaching data skills?#

With respect to changing the way undergrad courses teach data skills, we thought that instead of data ethics being taught as a single lecture, it could be infused throughout the course. Outside of university, more engagement with groups outside of academia who are fighting for social change, such as community and non-profit groups, would also improve mobilisation. An example of a relevant outreach project is DataFace, which is working to improve the data literacy of children in schools.

Problems with time and funding would need to be addressed to enable wider implementation of projects like Local Lotto. For many, involvement in these projects is extra to their current workload, and people do not have enough time to balance both. If the workload is too much even for people who are passionate about the cause, we thought it unlikely that data scientists without the same enthusiasm would be willing to dedicate their free time to these kinds of projects.

Data Feminism: Chapter 3 – On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints#

Principle: Elevate Emotion and Embodiment#

“Data feminism teaches us to value multiple forms of knowledge, including the knowledge that comes from people as living, feeling bodies in the world.”

Chapter Summary#

Traditionally, data science has placed value on ‘objectivity’ and ‘neutrality’, designing data visualisation in ways that are plain in order to avoid eliciting emotion. Underpinning the principles of ‘objective’ data visualisation is statistical graphics expert Edward Tufte’s ‘data-ink ratio’, which holds that designers should aim to use ink only to display the data itself. Separating emotion from knowledge is assumed to avoid attempting to persuade, giving readers the space to interpret results for themselves. However, this chapter argues that the lack of persuasion is an illusion. No matter how the data is presented, there will always be decisions that angle the data from a particular viewpoint; usually, the viewpoint of the dominant, default group.
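To make the data-ink ratio concrete, here is a minimal sketch in Python (assuming matplotlib is available; the data are invented) of the kind of ink minimisation Tufte advocates, stripping every mark from a chart that isn’t data:

```python
import matplotlib.pyplot as plt

# Invented example data
months = ["Jan", "Feb", "Mar", "Apr"]
admissions = [30, 45, 38, 52]

fig, ax = plt.subplots()
ax.bar(months, admissions)

# Tufte-style ink minimisation: remove marks that don't display data
ax.spines["top"].set_visible(False)   # drop the box around the plot
ax.spines["right"].set_visible(False)
ax.grid(False)                        # no gridlines
ax.tick_params(length=0)              # no tick marks
ax.set_title("Admissions by month")

plt.show()
```

As the chapter goes on to argue, even this pared-back style is a design choice with a viewpoint, not a neutral default.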

When visualising data it is impossible to avoid interpretation: unless you are presenting the raw data, there will be some things that are necessarily highlighted and some things that are necessarily obscured. Conventions of data visualisation reinforce the perception that charts are objective, scientific, and neutral, making it more likely that people will believe them. Donna Haraway argues that all forms of knowledge are situated, produced by specific people in specific cultural, historical, and geographic circumstances. Disclosing your position is an important strategy for being transparent about the limits of your knowledge claims.

The logic that grounds the false binary between emotion and reason is gendered because of the stereotype that women are more emotional than men. Emotion can be leveraged alongside visual minimalism to engage different parts of the brain, allowing for a range of learning types and communicating to a wider group of people. Harnessing emotion can help people to experience uncertainty, something which is famously difficult to communicate in data visualisation.

When thinking about how to present data, there is not one hard and fast rule. Each person has a unique skillset and intersection of subject positions which can offer a set of valuable perspectives that frame your work. Decisions should be informed by context, working towards a more holistic and inclusive ideal.

Definitions#

Framing effect – the choices that designers make when visualising data (what to highlight and what to obscure) impacts how people interpret the graphics and what they take away from them

Provenance rhetoric, Hullman and Diakopoulos – signalling the transparency and trustworthiness of the presentation source to end users by citing the source of one’s data, increasing the likelihood that viewers will believe what they see

Feminist objectivity, D Haraway - that all forms of knowledge are situated, that is, that they are produced by specific people in specific circumstances. We can use this to bring together multiple partial perspectives.

God trick/view from nowhere, D Haraway - the perceived ability to use an impossible/imaginary standpoint that appears to be neutral.

Discussion Summary#

What does it mean to you to ‘Elevate Emotion and Embodiment’ - in data visualisation and/or data generally?#

Elevating emotion and embodiment can help circumvent the myth of rationality and impartiality. There are some things that we experience and know to be true; however, the scientific process demands empirical evidence to accept that they are real. Demanding empirical evidence can detract from other forms of knowledge. When we simplify behaviour research to animal models, for example, we lose context; we think that there must be more useful and humane methods for research.

Science as we know it today was shaped by the Enlightenment movement in Europe, which advanced rationalism and empiricism and discredited the importance of emotion and embodiment. Ideas from the Enlightenment propagated around the world through European colonies, consequently suppressing indigenous knowledge and ways of living. Revaluing emotion and embodiment creates space for different types of knowledge systems. Indigenous populations have been incorporating data visualisation into record keeping for centuries, and have ingenious methods of data collection such as the Marshall Islands stick charts which represent ocean swell patterns. Data sovereignty movements are working towards indigenous communities regaining control over their information whilst pushing back against data colonialism and its harms. In Canada there are toolkits to support indigenous governments in managing and owning their data, appreciating different types of dynamic knowledge systems including qualitative, oral, empirical, and scientific knowledge.

Incorporating more types of knowledge broadens the experience that we can have of data. Whilst the final output may be static, the process of iteratively designing a data visualisation is actually quite embodied. We can expand embodiment to the final output itself by harnessing multiple senses to communicate data. Using illustrative visuals can also be very effective in conveying a message, and more colour doesn’t mean less factual. There is a lot of value in making data visualisation emotive, as it can help to represent topics beyond the binary. Data visualisations during the pandemic, for example, were very emotive.

Minimising ink use, making visualisation less expressive and more sterile, is reductionist and ignores potential connotations. Being minimal in the way data is displayed does not mean it is impartial; minimalism is influenced by entrenched biases. The idea that minimising extra ink is best practice is linked to stereotypes of men as ‘less emotional’ and ‘more factual’ than women. How people respond to certain colours is also influenced by their culture, such as the particular associations Western populations have with red and green.

Minimalist design decisions, such as the choice of colour palette, may often be subliminal yet are impactful for accessibility. Making a visualisation accessible for one group often makes it inaccessible for another. For example, when plotting points for app usage in schools in Bristol we used a heat map (red, yellow, green) to indicate levels of use. A red-yellow-green spectrum isn’t colourblind friendly at all, however. People can check their designs using colour blindness simulators and colourblind-friendly palettes.
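As a minimal sketch of that fix (assuming matplotlib; the usage numbers are invented), the same heat map can be drawn with a perceptually uniform, colourblind-friendly palette instead of red-yellow-green:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented app-usage data: 5 schools x 7 days
rng = np.random.default_rng(0)
usage = rng.random((5, 7))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Red-yellow-green: hard to read with red-green colour blindness
ax1.imshow(usage, cmap="RdYlGn")
ax1.set_title("RdYlGn (not colourblind-safe)")

# Perceptually uniform palette designed to remain readable
ax2.imshow(usage, cmap="viridis")
ax2.set_title("viridis (colourblind-friendly)")

plt.show()
```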

We also wondered who it is that gets to do embodied/visceral art: what resources are required; how easy it is to access those resources; how equal access is across different areas of society.

What did you think of the examples of how positionality is expressed in data visualisations (intended or not) using different graph styles, colour or annotations?#

Default data visualisations imply a white and Western interpretation of science, insinuating that because it has this positionality it should be trusted. Whilst Florence Nightingale’s greatest contribution to combating disease and death resulted from the graphs she made to back her public health campaigns, her example also draws attention to how graphs persuade, whether or not they depict reality.

What did you think of the reactions of NYT readers to the election gauge? How should we represent uncertainty?#

People generally hate uncertainty, which could be because uncertainty is often equated with messiness. It is important to remember that a lot of people haven’t touched statistics or maths since they were 16, making it difficult for them to understand statistical uncertainty. Even professors can misunderstand confidence intervals. We wondered if talking about confidence conveys uncertainty, or if talking about significance conveys certainty. People have a lot of cognitive biases which affect the way they interpret data. For example, truncating the y-axis so that it doesn’t start at 0 can heavily influence the way that a graph is read.
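A small sketch of that last point (assuming matplotlib; the figures are invented): plotting identical data twice, once with the y-axis starting at zero and once truncated, makes the same change look either modest or dramatic:

```python
import matplotlib.pyplot as plt

# Invented figures: a change of three percentage points
years = [2019, 2020, 2021, 2022]
values = [96, 97, 98, 99]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(years, values)
ax1.set_ylim(0, 100)   # full axis: the change looks modest
ax1.set_title("Axis from zero")

ax2.bar(years, values)
ax2.set_ylim(95, 100)  # truncated axis: the same change looks dramatic
ax2.set_title("Truncated axis")

plt.show()
```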

Audiences not understanding data visualisation until it’s explained to them is a very common real-world problem. We find that some audiences expect the ‘boring’ and ostensibly ‘neutral’ visualisations that this chapter criticises. However, if we don’t provide those kinds of visualisations, the audience thinks that we’re either not doing our job properly or we have an agenda and are actively manipulating them. It is difficult to balance these expectations with the recommendations of this chapter.

Figuring out how to balance expectations can perhaps be helped by our takeaways from this chapter: multiple views of data are relevant, and emotion is a valid one of these – not just that ‘emotion is important’. Christiane Amanpour famously argued that the role of a journalist is to be truthful, not neutral.

Thinking about how to present data is supported by Nicole Dalzell’s three questions for pedagogical decisions that affect how LGBTQ+ students learn in the classroom:

  • Which students are not supported by the design decision?

  • How can we adapt to reflect student needs not supported by the initial design?

  • Is applying this design decision sustainable for both me and the students?

Data Feminism: Chapter 4 – “What Gets Counted Counts”#

Principle: Rethink Binaries and Hierarchies#

“Data feminism requires us to challenge the gender binary, along with other systems of counting and classification that perpetuate oppression.”

Chapter Summary#

Data is information which can be computed in some way, ranging from counting to sorting and from modelling to visualising. In order to format information into data, it has to be classified in some way; chapter 4 revolves around the effects of counting or not counting certain things in data collection. Where things which are counted or classified are included in data and used to inform policymaking and resource allocation, what is not counted becomes invisible.

Classification systems are created by particular people, living in a particular place, at a particular time, influenced by the world around them. Once a system is in place, we often forget to question it until the system becomes an object of contention or starts to break down. Sometimes, leaving people out means that their experience goes unsaid, exacerbating issues of structural inequality. Other times, counting people can lead to increased surveillance or misuse of their data.

Requiring people to fit into a predefined and finite number of categories has consequences for people who are not included in those categories. For example, to apply for a passport in the UK gender must be specified as male or female, leaving no option for those who identify as non-binary. Non-binary people either have to choose a gender which they do not identify with, or not have a passport.

Representing identities in data makes those identities visible, yet visibility poses significant risks to the health and safety of minoritised groups. Exposing people can result in unwanted attention for those who would prefer not to disclose their gender identity, or discrimination, violence, or imprisonment depending on where they live. The most ethical approach to avoid potential harms varies according to the circumstances. Data scientists should analyse power and proceed with awareness of context to determine whose interests are being served by being counted, and who runs the risk of being harmed.

When we encounter problematic classifications, we must ask whether it’s the categories that are inadequate or whether it is the system of classification itself. We should question why distinctions have come about, what values they reflect, what hierarchies they encode, and whether they should exist in the first place. When we work within our own communities with consideration and care, we can use counting and measuring to hold power accountable and rebalance unequal distributions.

Definitions#

Administrative violence, D Spade - coercive and inequitable results achieved through legal methods.

Patriarchy - the combination of legal frameworks, social structures and cultural values that contribute to the continued male domination of society.

Paradox of exposure – the double bind that places those who stand to significantly gain from being counted in the most danger from that same act of counting (or classifying)

Discussion Summary#

How can we make sure we are using classification when it’s needed, but not enforcing unnecessary distinctions when it’s not?#

Often, there isn’t a perfect answer for when classification is necessary and when it isn’t. Appropriateness of classification could depend on the social context you’re in, including where you are geographically, the time, and other contributing factors. Processes should be constantly reviewed to evaluate whether or not classification is needed and what type of classification is suitable.

Classification can help to improve diversity, such as looking at protected characteristics to reduce discrimination in recruitment. Attempting to bypass the importance of race by being “colourblind” ignores structural biases. Instead, appreciating different attributes in a positive way helps to celebrate diversity. It is important to distinguish between positive action and positive discrimination: positive action allows employers to take preferential action to support a group that shares a protected characteristic, to overcome disadvantages or have their needs met; positive discrimination is unlawfully giving people preferential treatment.

Within places like the NHS, classification in demographic data such as ethnicity, gender, etc. is used to examine whether everyone is getting access to care. Classification can make people feel seen, such as being given a classification of your condition in mental health therapy. Ignoring relevant distinctions means that the needs of different people go unseen, highlighted by the historic assumption of white and male as the default in medical research and pedagogy.

Classification also helps to reduce uncertainty, which is a positive for data scientists. As there is already so much uncertainty in statistics and data science, introducing more uncertainty by removing classification is often not something which is thought of very fondly.

Despite there being clear benefits to classification, such as promoting diversity and equality, a lot of the information that is collected and recorded seems unnecessary. Sometimes, the pressure to conform to data collection standards means that we may collect data we do not need, which can be related to sample size, significance, and equal treatment. An example of unnecessary data collection is including sex on passports, or gender in research studies. In research, if the hypothesis has nothing to do with gender it is not necessary to collect the data. Yet, when preparing questionnaires for students to investigate a hypothesis about their use of something, we have been told that we have to ask for gender “because you always ask for gender”. Requiring people to associate with the binary male/female reinforces an ignorant view of gender, and being forced to pick a category can be emotional for people who do not fit into that category. In addition to recording gender, we find that most of the time asking for sexual orientation is irrelevant. If we provide information about our sexual orientation, we don’t want to then become the “gay spokesperson” for the office. Giving employees a “how are you feeling today” chart is also unnecessary. It might be better to just ask employees “what could we do better for diversity?”

When classification is implemented, it could be manipulated by companies for self-interested purposes. Advertisers may put pressure on companies like Facebook to change their categorisation for their own interest. It is important to question classification systems, which are sometimes hangovers of historical inequalities such as the size of women’s pockets.

Have you come across the paradox of exposure before? Do you have examples of it being addressed well, or badly?#

Listing gender on passports elicits the paradox of exposure. Some countries, such as Argentina and Iceland, allow ‘X’ as a gender designation, allowing for wider gender identification. However, in places where it is dangerous to be seen as anything outside of heteronormativity, listing identity as something other than male or female could compromise people’s safety. An option such as “prefer not to say” is perhaps an alternative which acknowledges the paradox without committing to a particular option.

Whilst this book was written a few years ago (2020), it mirrors what is currently happening in the real world. The 2024 Olympics boxing controversy saw two boxers accused of being ineligible to compete in their gender category. The accusations resulted in a media frenzy around the boxers and cyberbullying from high profile celebrities. Requiring people to disclose not only their gender but also sensitive biological information is highly invasive.

Do you have any good examples of quantitative data supporting qualitative data or vice versa?#

Putting qualitative and quantitative data together is very powerful and can increase awareness as qualitative data helps to explain and communicate the numbers. Where quantitative data is useful to obtain the numbers and general trends, qualitative data can fill in the gaps to get the stories behind the numbers. In the NHS, mixed-method studies are used to measure impact and experience. Formal interviews can be used to gather qualitative data to support quantitative data. The way that information is framed and who provides the information influences how qualitative and quantitative data is presented.

Looking at quantitative data alone can miss important insights, especially in cases where there is not a lot of data. In examining doctor and parent insights into the health of children with rare genetic conditions, the rarity of the conditions means that it is easy to miss things when examining quantitative data. If qualitative analysis is used to supplement the quantitative data, it is possible to highlight extreme views and target them as something which needs to be addressed. Drawing attention to outliers with little data but qualitative weight is also important in the context of the UK extreme right wing, where referring to them as a “small minority” is dangerous due to the extremity of their views.

However, analysing qualitative data is challenging. Adding an open text box to a questionnaire, for example, can make answers difficult to examine, as you have a lot of data to process and summarise. Analysing qualitative data is perhaps something that AI could help with.
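As one hedged illustration of what machine assistance might look like (a rough first pass only, assuming scikit-learn is available; the responses are invented), similar free-text answers can be grouped before a human reads each cluster:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented free-text questionnaire answers
responses = [
    "The waiting time was far too long",
    "Staff were friendly and helpful",
    "I waited hours before anyone saw me",
    "Very kind and supportive staff",
]

# Represent each answer as TF-IDF weights, then group similar answers
vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Print answers grouped by cluster for a human to interpret
for label, text in sorted(zip(labels, responses)):
    print(label, text)
```

Crucially, clustering only sorts the text; deciding what each cluster means, and whose voices it contains, remains a human, qualitative task.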

Data Feminism: Chapter 5 – Unicorns, Janitors, Ninjas, Wizards, and Rock Stars#

Principle: Embrace Pluralism#

“Data feminism insists that the most complete knowledge comes from synthesising multiple perspectives, with priority given to local, Indigenous, and experiential ways of knowing.”

Chapter Summary#

At each stage of the data science process – from collection, to cleaning, to analysis, to communication – data science methods can inadvertently commit epistemic violence by working to suppress voices. To mitigate harm and embrace pluralism, chapter 5 argues that different and varied perspectives and voices should be valued throughout the process. Data feminism thus aims to change received ideas in data science that work against pluralistic processes.

To clean data, data scientists must undertake deliberate action to put data in their proper place. Yet, in the process of altering data to organise them, some perspectives will be lost and others will be imposed. Data cleaning originates from the assumption that data are inherently messy and need to be tidied, assuming that there is an underlying ‘correct’ order. However, the ideas underpinning the belief that data should be clean and controlled have tainted historical roots.

Data does not need cleaning until there are strangers in the dataset – until people using the data are not the people that were involved in the creation and collection of them. Strangers often sit at one, two, or many levels removed from the collection and maintenance process of the data they work with. These people who work with data have been called unicorns (they are rare with special skills), wizards (they can do magic), ninjas (they execute expert moves), rock stars (they outperform others), and janitors (they clean messy data). However, the nicknames for data scientists propagate an inaccurate assumption that people work in isolation.

To embrace the value of multiple perspectives, transparency and reflexivity allow those involved in a project to be explicit about the methods behind the project and their identities. Other perspectives should be actively and deliberately invited into the data analysis and storytelling process. Co-liberation should be positioned as an end goal and criterion for success, requiring that technical workers acknowledge that they are engaged in a struggle for their own liberation as well.

We found this chapter to be powerful. The structure was interesting, starting with examples before the concept was introduced which meant that clarity gradually built through the chapter.

Definitions#

Occlusion: the “problem” that occurs when some marks obscure other important features; sometimes occlusion itself is a pattern which highlights a problem.

Pluralism: the idea that when people make knowledge they do so from a certain standpoint, and that by pooling our standpoints/positionalities we can have a richer and more robust understanding of the world.

Reflexivity: acknowledging and understanding your own positionality, and using it to inform your work.

Co-liberation: an end state in which people from dominant groups and minoritized groups work together to free themselves from oppressive systems.

Epistemic violence: harmful practices, including misinterpretations and distortions, that can negatively impact people’s ability to produce and legitimise knowledge.

Discussion Summary#

In your career until now, have you come across stereotypes re: data analysts or working with data? To what degree do you think these hinder or support the work being done? How would you reframe the public image of the data analyst within your workplace to support the processes you would like to model?#

Analyst work has a gendered component reflected in the terms we associate with it. We have seen terms like rockstar in job advertisements for data analysts, used to create a buzz around a job that might actually involve quite boring work. Whilst terms like rockstar aren’t necessarily equated with being a man in English, the attributes surrounding those terms tend to be. In other languages such as German, terms like rockstar (and data analyst) are more closely associated with masculinity because their grammatical articles are masculine. We weren’t sure if those terms are generally seen by the wider public, however.

Names can reinforce stereotypes and are powerfully associative, e.g. using someone’s surname like “Anning” instead of their full name “Mary Anning” makes them sound like a scientist. When the analyst role is perceived as overly “objective”, it is detrimental as people think analysts can fix any problems that occur in data collection using data science later down the line. If the scientific element is too strong, it removes focus from the actual decisions made when doing the work.

It is important to be conscious of the fact that in any use of data, some decisions must be made, and some manipulation has to happen. Storytelling happens with any form of data communication, the process of which may be infiltrated by one’s own bias. At times, the entire analysis process seems predetermined. Cleaning data doesn’t always make it better. Neatness sometimes avoids the reality of messiness; a shortcut to neglecting people that don’t fall cleanly into tidy categories.

Whilst there are issues with data cleaning, many aspects of it are still important and aren’t oppressive. We worried that the chapter frames data cleaning solely as “taking away”; data cleaning doesn’t just take away – it also makes data useful. Without cleaning data, we don’t always know how to analyse it. It would be interesting to hear how the book so far shapes the opinion or view of data science for people who haven’t come across data science much.

In our experience, analysts usually have really good questions that interrogate the process, offering valuable perspectives and challenging the stereotypes of number crunchers. Expectation management is an essential part of an analyst role, to provide a reality check and focus.

The distance between analyst/researcher and the data itself can introduce significant bias in the process of analysis. Also, sometimes in our jobs we do not have the scope to aim for co-liberation. Do you have any examples of what can be done to bridge this distance?#

The issue with distance between those who provide or collect the data and those who analyse it is that misinterpretation can arise. Understanding is lost when there is too much space between different steps of processing data. In hospitals and medical studies, for example, there is often a large gap between those who are collecting the data and those who are leading the studies or doing the footwork. Information easily falls between these gaps; for example, sex differences that arise in data collection can be overlooked.

Proximity to data helps one to understand them in a lived experience way. At least seeing the setting once provides some intuition for the context. Rather than analysts being completely removed, retaining contact with domain experts facilitates bidirectional understanding. There should be more partnerships between people with closeness to the data; analysts should work with data scientists, and participants could even co-analyse the data. Those closer to the data often have the most valuable questions, but in reality this often isn’t reflected in who asks the questions.

To bridge gaps, it is important that connections are sustained. Connections should be made at the beginning of a process and then maintained throughout, towards co-liberation. Stakeholders should be identified at the beginning and end of a journey and could be brought together. Existing gaps can be treated as communication problems. Improving communication along the process might help to narrow gaps, by providing briefings between different steps and sharing perspectives. However scientific we are being, we still need to keep people’s stories.

Participation in citizen science encourages feelings of community; perhaps we should call it “community science” rather than “citizen science”. From what we’ve heard, people enjoy working directly with patients in health research. In the NHS, data processes put the patients at the core of the process in a very indirect way. There are inspiring applications that could be made to non-profit organisations using co-liberation approaches.

Integrating those close to the data could be aided by utilising user-story-type processes, where researchers define what is in and out of scope. The scope for misinterpretation of a dataset should also be laid out. Defining scope differentiates conclusions from next steps, providing avenues for future work covering what wasn’t addressed in the initial process. We should record unsuccessful and “hidden” work. Documenting decisions and pursuing reflexivity makes sense to us; we would like to see more reflexivity in data science. Even we wouldn’t necessarily analyse our own data the same way a year later.

When thinking of co-liberation as the end goal of data analysis, which parts of the process do you find least intuitive to model for this outcome (i.e. collection, cleaning, analysis, storytelling)? Do you have examples of processes where you have strived for data for good or co-liberation?#

Some data, such as animal data or mouse models, doesn’t seem too relevant to co-liberation; however, we could still involve society in the process. Other data is logistically challenging to incorporate into participatory processes. For example, in clinical trials it is difficult to put patients at the core of the process, especially around the design stage. Perhaps there is a spectrum of involvement, with different aspects suitable for different applications.

Data Feminism: Chapter 6 – The Numbers Don’t Speak for Themselves#

Principle: Consider Context#

“Data feminism asserts that data are not neutral or objective. They are the products of unequal social relations, and this context is essential for conducting accurate, ethical analysis.”

Chapter Summary#

All knowledge is situated, therefore numbers always derive from a particular setting. To avoid reinforcing unjust power dynamics, this chapter argues that setting must be taken into account. Data should be understood in relation to the knowledge infrastructure from which they originate. “Raw data” is actually an oxymoron; data enter into research projects already fully cooked as a result of a complex set of social, political, and historical circumstances.

Rather than seeing knowledge artifacts (e.g. datasets) as raw input that can be fed into a statistical analysis or data visualisation, a feminist approach insists on connecting data back to the context in which they were produced. Under the feminist view of situated knowledge, it is the responsibility of the person evaluating or building upon knowledge to ensure that its situatedness is taken into account. When approaching any new source of knowledge, it’s essential to ask questions about the conditions under which the knowledge was produced, as well as the identities of the people who created it. Instead of taking data at face value and looking towards future insights, data scientists can first interrogate the context, limitations, and validity of data, looking at the cooking process that produces “raw” data.

Context allows data scientists to better understand any functional limitations of the data and associated ethical obligations. Raw data might be good for starting a conversation but cannot ensure engagement or accountability. Local knowledge is important to understand how power and privilege operate in an ecosystem and contribute to the making of data, possibly obscuring the truth.

There is a default assumption that unfettered access to information is inherently good, however, facilitating unbridled access can remove context from the data. Whilst the open data movement has admirable aims, it has also inadvertently worked against the provision of context by publishing raw data with little documentation. Data user guides are an example of how data can be situated, containing narrative portraits of a dataset. Data user guides explain (among other things) the purpose and application of the data; the history, format, and standards; the organisational context; other analyses and stories that have used the dataset; the limitations and ethical implications of the dataset.
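As a sketch of what recording that narrative context might look like in practice (the structure and the dataset are invented for illustration, not a standard schema), a data user guide could be captured alongside a dataset as simply as:

```python
# A hypothetical data user guide, recorded as a plain dictionary and
# shipped alongside the dataset it describes
data_user_guide = {
    "dataset": "hypothetical-city-311-requests.csv",
    "purpose": "Track resident service requests and response times",
    "history": "Collected since 2010 via phone, web, and a mobile app",
    "format_and_standards": "CSV, one row per request, ISO 8601 dates",
    "organisational_context": "Published by the city IT department; "
                              "records are entered by call-centre staff",
    "prior_analyses": ["Newspaper story on response-time disparities"],
    "limitations": "Reflects who reports problems, not where problems are",
    "ethical_implications": "Street addresses can identify households",
}
```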

Definitions#

Big Dick Data - big data projects that are characterized by patriarchal, cis-masculinist, totalizing fantasies of world domination as enacted through data capture and analysis

Knowledge infrastructure, C Borgman - “an ecology of people, practices, technologies, institutions, material objects and relationships.”

Zombie data - datasets published without any purpose or clear use case in mind

Subjugated knowledge - forms of knowledge that have been pushed out of mainstream institutions

Discussion Summary#

What problems does big data solve versus create?#

About ten years ago, the big idea in open data was that if you have enough data, trends will simply fall out of it. Underpinning this trend is a trust that if data is “big” it is reliable. We found it strange that people tend to just accept that others have collected their data properly and responsibly. Having worked with open data in research settings, the issues with big data highlighted in this chapter were not surprising to us.

A key issue with big data this chapter highlights is loss of context. Some of us found it strange hearing the perspective that open data (i.e. putting contextless data out into the world) works against providing context for data. However, when data is collected at scale it is harder to collect proper contextual information. Standardising data also often means that useful data is removed from more specialised areas. Even if we could collect everybody’s opinion at a fixed point in time, we don’t know whether they will later change their minds.

The “oxymoron of raw data” effectively illustrates the positionality of data. Cooked data can’t be uncooked: you can’t separate data from the context within which they were created. Data are always modelled according to a particular standpoint or question; otherwise a lot of information would be lost. When contrasting concepts such as raw vs cooked data, there is often an implication that the contrast is with something malicious; in fact, the contrast is more likely to have arisen from a lack of foresight.

To mitigate loss of context in data handling, training through academia or other channels in how we speak about data may help. There is a tendency to treat data as slightly mysterious higher powers, even if (sometimes especially if) we don’t properly understand them. The trial of Lucy Letby highlighted how badly statistics can be explained in the courtroom, and the Royal Statistical Society (RSS) has made a statement on this. Math on Trial by Leila Schneps and Coralie Colmez delves into the problematic use of statistics in the courtroom.

What are your thoughts on how we label graphs with context? If you have time, what would you choose as a title and subcaption for this one?#

Putting text on charts can tell an interesting story. In the chapter, the third change to the graph made us all revisit our assumptions. Sometimes, being led to the correct conclusion helps people to make up their minds, just as our thinking was challenged by the graph in the chapter. Skills to critically analyse data visualisations are important but not easy to acquire.
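
As a rough illustration of what “text on charts” can look like in practice, here is a sketch in matplotlib; the data, wording, and source note are all invented for the example.

```python
# A sketch of labelling a chart with context rather than a neutral title.
# The numbers, wording, and source note are invented for illustration.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]
reports = [120, 180, 260, 310, 400]  # hypothetical counts of reports filed

fig, ax = plt.subplots()
ax.bar(years, reports)

# A title that states the takeaway rather than just naming the variables...
ax.set_title("Reports rose sharply after a hotline opened in 2020")
ax.set_xlabel("Year")
ax.set_ylabel("Reports filed")

# ...and a subcaption recording provenance and limitations.
fig.text(0.01, 0.01,
         "Source: hypothetical open data portal. Counts are reports filed, "
         "not incidents; the rise may reflect easier reporting.",
         fontsize=8)

fig.tight_layout(rect=(0, 0.05, 1, 1))  # leave room for the subcaption
plt.show()
```

The design choice here is that the title carries the conclusion while the subcaption carries the context: where the data came from and what it can and cannot support.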

Whilst different ways of presenting graphs have their strengths, it is challenging to fight the status quo. People might be alienated by unusual methods and think that it means you are not “impartial”, thus discrediting you. A toned-down version of a message might be better if it reaches more people.

Where does responsibility lie for providing data context in your area? How does funding play into this?#

Where responsibility lies depends on the situation and the stage of research. In science, the responsibility falls on the person doing new research. Yet when data is made available for others to use, anyone who uses that data becomes responsible for looking deeper and discovering more of the context. If people don’t understand numbers very well, however, they might be scared to question the data. People also deliberately misuse data; in Germany, for example, there is a big problem with the alt-right spreading adverts containing biased graphs and data.

To mitigate people misunderstanding or misusing data, when making work reproducible we should think about who will be trying to reproduce it. As a community, we aren’t well practised in sharing how we do things, but regularly sharing context and learning in groups can distribute the responsibility and change how people work. Working with others to share context doesn’t have to be very ambitious; you can just aim to do it well.

In grant funding, things often fall into the category of “Good” with a capital “G” without any data to back up why they are Good. Reproducibility problems are exacerbated by a lack of raw data; one paper found that asking authors to provide raw data led to more than half of manuscripts being withdrawn. Trustworthy data in the NHS tends to be either low level and immediately verifiable, or public health data that have gone through extensive checks, by which time they are out of date. Most people believe that what they’re doing is good, but having the data alone isn’t enough to justify the means, and more than openness is needed for reproducibility. There are platforms that help protect privacy in data analysis, such as OpenSAFELY, a set of best practices encoded as software for analysing health records data. If you phrase it right, funders are coming around to approaches which incorporate context.

Issues with incorporating context are specialism problems, exacerbated by the stereotype of STEM as masculine; the German education system also struggles with these problems. In reality, women are good at maths but are discouraged from it, a deep cultural problem which starts at a very young age. The gender disparity continues past the education system into the workplace, reflected in the gender pay gap.

The generalisability of this chapter, and of the book itself, to different contexts is restricted, as it leans heavily on US-centric examples. We were relieved when a non-US example was finally used. We liked that the audit at the end of the book highlights the bias towards the US, and thought that all academic writing should hold itself accountable with an audit. The bias lies not only in the examples but also in the language and cultural references. The focus on US examples created a strange feeling of marginalisation in those of us who live outside the US and aren’t used to feeling marginalised in this way. Perhaps this is deliberate, as it did increase our awareness of how it would feel to be more marginalised.

The audit reflects on how perspectives were only included if they met the authors’ “quota”; we wondered why global south and indigenous perspectives were so hard to gather. It might be that people from the global south face more barriers to publishing, such as academic publishing’s bias towards English and the costs of publication. Whilst the book several times uses “google” as a verb, treating that search engine as the default, Google is not the primary search engine in all countries. Arguably, Google dominates our lives in exactly the way discussed in the matrix of domination. Avoiding the use of “google” could be a way of making the topics more colloquial and accessible.

Data Feminism: Chapter 7 – Show Your Work#

Principle: Make Labour Visible#

“The work of data science, like all work in the world, is the work of many hands. Data feminism makes this labour visible so that it can be recognised and valued.”

Chapter Summary#

When releasing a data product to the public, we often cite the source of the dataset and names of people who designed and implemented the code and graphical elements. We tend not to credit the many hands who perform the work, such as those who created data in the first place and those who collected and processed data for use.

In capitalist societies, we tend to value the work we can see: the result of a system in which the cultural worth of work is directly connected to the price we pay for it. More often than not, this also means that we fail to recognise the larger value of services we get for free: the phenomenon of invisible labour. Invisible labour encompasses various forms of labour, whether unwaged, underwaged, or waged, that are rendered invisible because they take place inside the home, out of sight, or lack physical form altogether. Behind the marketing of data products there is always hidden labour, often performed by women and people of colour, which is both a cause and an effect of that labour being underwaged and undervalued.

Showing the work is crucial to ensure that undervalued and invisible labour receives the credit it deserves and to understand the true cost and consequences of data work. An emphasis on labour opens the door to tracing data production back to its material conditions and contexts, as well as the quality and character of the work and people required to make it. Giving formal credit for a broad range of work is a way of resisting how certain types of people take up space by screening out others, making them invisible. We can also look to the data themselves to honour the range of invisible labour involved: who is credited and whose work is screened out.

Screening techniques are not always intentional, but they are self-perpetuating. They are like a comfortable chair that has moulded to the shape of your body: you probably won’t notice how uncomfortable it is for those who haven’t spent time sitting in it. Those of us who occupy comfortable chairs must remain vigilant in reminding ourselves of the additional forms of labour, and people, that our own work rests upon.

Definitions#

data supply chain - the work that goes into all stages of creating a data product

invisible labour - the various forms of labour that are rendered invisible because they take place in homes, out of sight, or because they lack visible form

emotional labour - managing one’s feelings, or someone else’s, in response to the demands of society or a particular job

info-maintainers - people who work in libraries, archives, and related preservation fields, making sure that information in the present remains accessible in the future

Discussion Summary#

What other products’ hidden labour would you like to see mapped, like the Amazon Echo in Anatomy of an AI System?#

We’d like to see hidden labour mapped out for ChatGPT and for social networks such as Reddit. In social networks, a lot of work goes unseen (e.g. content moderation); it would be interesting to see the anatomy of this. Comparing the anatomies of different systems could reveal interesting insights into how technologies are similar and different.

Regarding the Wikipedia example referenced in the chapter, we were surprised by how large a proportion of editors are men. We think this could be because women have less free time outside of work to do free labour like Wikipedia editing. The lack of free time for women also contributes to phenomena like women not applying for jobs unless they are fully qualified for them. We have seen job adverts encouraging women and minorities to apply, but we know that the most qualified candidate will get the position anyway; the sentiment is nice but perhaps meaningless.

Do you have any examples of invisible labour that you do in your work?#

Open source tech and software packages are an example of invisible labour we participate in. The work that goes into hosting communities like Data Ethics Club or NHS-R is not necessarily invisible, but it is perhaps ignored. There is also invisible labour involved in crowdsourcing efforts, such as the Netflix Prize and the crowdsourced journalism of the Guardian, ProPublica, and Vox Media.

Another example which we don’t always consider is the time spent applying for jobs. It is common for companies to ask applicants to do “homework”, such as pitching ideas which the company then uses. In one application for a tech job, we had two weeks of “homework” or “trial” work which we were never paid for. Companies also exploit free labour through unpaid internships or by relying on volunteers; we have spent time volunteering for a food collective with only one paid member of staff. Unless you have explicitly studied the topic, many non-profits require you to do a lot of volunteering before they will employ you.

Administrative jobs often involve invisible work; without this work the projects wouldn’t happen, but the contribution is not explicitly acknowledged. Research assistants help to collect and process data, a key part of the research journey, but aren’t listed as authors on papers. We would like more acknowledgement of the unpaid labour that allows researchers to do their work. Data cleaning is part of the invisible labour we do, and we were concerned that the way data cleaning is presented in this book devalues it. Being explicit about data cleaning may help both to retain its value and to ensure it is done well.

Project management is an example discussed in the chapter which was initially surprising to us, but which rang true when we thought about the role, our experience of it, and the skills associated with it. Project managers take a largely supportive role, holding everything together. In our jobs, we have experience of managing people and people’s expectations, as well as our own stress, which involves a lot of emotional labour. Much of the labour that project managers do is underappreciated, as managing people is a difficult job which isn’t properly recognised.

Previously, we hadn’t thought much about emotional labour and dealing with others’ emotions. In library work, we found deeply entrenched invisible labour. There is a gendered component to emotional labour, as it is generally women who are expected to perform it; women are expected to want to make people feel comfortable and taken care of. Women have been historically undervalued in the workplace; in the 1980s, women in Iceland went on strike to highlight their contribution to the economy.

The example of likes on social media as labour, discussed in a previous writeup, was strange to us. The argument revolves around the idea that “if you don’t know what the product is then you’re the product”. However, we opt into interacting with social media, which seems to be on a different level to being compensated for housework and raising children. It might be more accurate to say that consumption of a product is not labour, but participation in a product is. By purchasing a blender you aren’t sending feedback to a company, but when you interact with social media the data you volunteer becomes a product in itself, as it can be packaged and repackaged. Whilst ecosystems can emerge around any product, what is going on with social media is different to blenders.

We liked the comparison with volunteering for a non-profit, but wondered if there is a difference in the outcomes of volunteer labour and of data labour that generates a product. It is difficult to draw the line that labels one as bad, whether by its monetisation or its output. The way the outcomes of your time and effort are used is distinct for non-profits and for social media. We tend to volunteer data to social media with little understanding of what happens to it; sometimes the data is even obtained illegally. For non-profits, if someone is using something you volunteered, you benefit through a sense of well-being. We wondered if there is an assumption that if you enjoy the labour then you shouldn’t be paid for it; this assumption could also play into gendered assumptions, such as women not being paid for childcare.

How do you think invisible labour should be communicated? Could we do anything differently?#

Citation is one method used to assign credit in academia; however, there are mixed feelings about how good a method it is. CRediT (the Contributor Roles Taxonomy) is a resource used to assign credit to the many different people who work on a paper. We have seen CRediT used in labs that really care about open science. However, in one case it resulted in a paper with 3000 authors, listing everybody who contributed. Although this is a commendable attempt at making everyone visible, it ends up masking meaning through overload.
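
As a small sketch of what machine-readable contributor credit could look like, the snippet below records contributions per person. The author names are invented; the role labels are genuine CRediT roles.

```python
# A sketch of recording paper contributions using CRediT roles.
# Author names are invented; the role labels are real CRediT roles.
contributions = {
    "A. Author": ["Conceptualization", "Writing - original draft"],
    "B. Author": ["Data curation", "Formal analysis"],
    "C. Author": ["Software", "Visualization"],
    "D. Author": ["Project administration", "Funding acquisition"],
}

# Render a contributions statement for a paper, README, or website.
for person, roles in contributions.items():
    print(f"{person}: {', '.join(roles)}")
```

Keeping the mapping explicit like this makes roles such as data curation and project administration visible alongside writing and analysis, rather than folding everything into an undifferentiated author list.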

Showing invisible labour is important, but there is a flip side: once things are exposed and valued, they then have to be ranked. We are concerned about how to show invisible labour without commodifying things such as emotions; we don’t want to end up trying to decide how much an hour of crying was worth. There are also complex issues around how much involvement we want the government to have in the home, such as in providing wages for housewives. Value could be quantified in other ways, such as time, e.g. how many hours a day we dedicate to a task.

If we quantified the value of invisible labour in dollars, people might pay more attention, but formalising the value of emotional labour is complex. If the output can’t be expressed or the impact quantified, then “softer” work is hard to fund. The International Labour Organization is working to measure unpaid domestic work, and Oxfam is helping to make care work more visible. If caretaking jobs were predominantly held by men, they would be paid better. The onus to recognise invisible labour should be on the privileged people who fund work.
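
As a toy example of the “quantify in time” idea, the sketch below prices invented hours of unpaid care work at an invented replacement wage. Both inputs are assumptions, and the result is at best a rough lower bound.

```python
# A toy calculation of valuing invisible labour via time.
# Both inputs are invented assumptions, not real statistics.
unpaid_hours_per_week = 25        # hypothetical hours of unpaid care work
replacement_wage_per_hour = 12.0  # hypothetical hourly wage for paid care

annual_value = unpaid_hours_per_week * 52 * replacement_wage_per_hour
print(f"Implied annual value: {annual_value:,.0f} (local currency)")
# ~15,600 at these inputs - a rough lower bound, since a market wage
# ignores skill, intensity, and the emotional labour involved.
```

Even a crude calculation like this illustrates the point made above: making the hours visible is a precondition for making the value visible.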

Data Feminism: Conclusion – Now Let’s Multiply#

Chapter Summary#

Part of the reason that data and computation have proved to be so lucrative is their ability to scale. This means that tech companies can make extraordinarily high profits, but it also means that the core workers they rely on have an extraordinary amount of bargaining power, messaging power, and subversion power. This chapter explores how tech workers can harness their position to occupy digital infrastructure, slow capitalism, and channel digital solidarities.

Using the work of Black feminist theorists and activists, the book has applied intersectionality and feminism to data science with the intention that readers will use the work to reflect on their own identities and examine how power and privilege operate in data science and the world. We should nurture all emerging ecosystems and build links between them to mobilise resistance to the differentials of power embedded in our current datasets and data systems, to mobilise courage and creativity, and to imagine data science and artificial intelligence beyond the matrix of domination.

Definitions#

design justice - redesigning design processes themselves so that those who are normally marginalised by them become co-creators

intersectional feminism - a vibrant body of knowledge and action that challenges the unequal distribution of power

Discussion Summary#

What is your favourite new definition/thing you’ve learned about from the book?#

The book provided words for things we had felt but didn’t have the vocabulary for; for things which we were aware of but couldn’t fully grasp. We saw data through a new lens, exploring physical data objects and counterdata. The balance between data cleaning being useful for analysis and imparting bias is delicate, and one we need to be aware of. Embracing messiness in data visualisation is powerful, as it reflects the data as actually collected. Showing what’s not there, the absence of data, is another powerful mechanism in data visualisation. Data can also be used for storytelling and visuals on a larger scale.

We transitioned to viewing data as a social construct, emphasising the importance of positionality in how we interact with data, which is also important for co-liberation. Unmasking the illusion of impartiality leads to accountability, facilitating analysis and the addressing of structural issues around information. Co-liberation, and the differences between data ethics and data justice, make sense to us, and we will incorporate these concepts into our thinking.

Is there anything you plan to do differently on the basis of what we’ve read?#

We would like to focus on small projects: local work that impacts local people. Although it doesn’t get recognised in the same way as a global AI project, small and local projects are also impactful. We will work to pursue diversity in our teams, and are considering how to choose amongst different routes at work based on who we are. The jobs we apply for in the future will be influenced by what we have learnt; some of us are planning to do PhDs or to go into research.

Do you have a question for the author?#

  • What was the process of finding diverse examples?

  • What has been the life of the book after it has come out?

  • In the section after the conclusion, “Our values and our metrics for holding ourselves accountable”, how do you compute your metrics? How did you come up with these numbers? How do you count them, e.g. if there is one woman out of five authors on a paper, does that count as a female citation?

  • Does the author apply these principles in each paper she writes?

  • How do you work with communities and reflect their data when you are working at a global scale, or from outside the community, on large national datasets?