Hacking the cis-tem#
What's this?#
This is a summary of Wednesday 21st September's Data Ethics Club discussion, where we spoke about Hacking the cis-tem, a paper by Mar Hicks from the Illinois Institute of Technology. The summary was written by Huw Day, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". Nina Di Cara and Natalie Thurlby helped with the final edit.
Introduction#
In this meeting, we discussed the paper Hacking the cis-tem, written by Mar Hicks. As outlined in the abstract:
"This paper looks at the case of transgender Britons who tried to correct the gender listed on their government-issued ID cards, but ran up against the British government's increasingly computerized methods for tracking, identifying, and defining citizens."
We used the paper to launch our own discussions about how data science practitioners are often guilty of forcing people/subjects into discrete categories at the expense of their identity. This can take the form of being too restrictive with initial inputs of data (e.g. sex or gender) or making it difficult to change these categories (e.g. if someone transitions or if someone's name changes due to marriage or other circumstances). Often this isn't done maliciously, but it remains the responsibility of the data wranglers in question to be well informed of potential pitfalls.
We shared times we had ourselves been guilty of these practices, and discussed how this can lead to discrimination, the issues with machine learning systems trying to "identify" people or arbitrarily place them in these categories, and the responsibilities of those constructing these systems.
Can you think of a time you've been guilty of this practice?#
Euan's background is in astrophysics, but he started working with data in a role involving horses. One of his earliest encounters with identity categories was "horse sex". It transpires that there are more than two categories of horse sex.
One of our members has seen that the NHS is guilty of reinforcing some of these issues. They claim to collect "gender" but it's actually sex. There is an "undisclosed" option. Depending on the service, this category can be interpreted as sex or gender. This feels like the data is being used against people rather than to help them, and it is probably causing harm somewhere.
Some of our members admitted they had likely been guilty of these practices without being aware of it. Now they choose not to collect this data because it's not relevant to what they're doing. These practices came up when doing "AI" categorisation of gender (looking at names, for example), to see what proportions of people discuss or believe different things. It is difficult to include non-binary people, and an uphill battle to make this issue a priority for private companies. On the other hand, we've discussed previously how not collecting data about marginalised groups prevents us from testing whether models are biased against them.
The patent office did something similar, where they were aiming to find out whether some groups of people (e.g. women) were underrepresented in patents, but the methodology was quite flawed (which they admit). They could identify male and female names, but there are a number of names that fall into both categories depending on region and when the person was born/named.
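As a minimal sketch of the kind of flaw described above (the name lists and the lookup logic are invented for illustration, not the patent office's actual methodology), a naive name-to-gender lookup has no sensible answer for ambiguous names and makes non-binary people invisible by construction:

```python
# Invented example: a naive name-based gender guesser. Real tools are more sophisticated,
# but share the same core weaknesses: many names are ambiguous (varying by country and
# birth cohort), and non-binary people cannot be represented at all.
NAMES_CODED_FEMALE = {"mary", "priya", "andrea"}
NAMES_CODED_MALE = {"james", "mohammed", "andrea"}  # "Andrea" is common for men in Italy

def naive_gender_guess(first_name: str) -> str:
    name = first_name.strip().lower()
    in_female, in_male = name in NAMES_CODED_FEMALE, name in NAMES_CODED_MALE
    if in_female and in_male:
        return "ambiguous"
    if in_female:
        return "guessed female"
    if in_male:
        return "guessed male"
    return "unknown"

for inventor in ["Mary", "Andrea", "Sasha"]:
    print(inventor, "->", naive_gender_guess(inventor))
# Mary -> guessed female
# Andrea -> ambiguous
# Sasha -> unknown (but very much a real person with a gender)
```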
Just because something does not harm many people does not necessarily mean it is less harmful overall - a lot of harm could be caused to a small number of people. This subtlety is complicated because we also don't always have reliable numbers for either of these things.
A common one was making decisions about how to record gender in forms. Open text? Drop down? Difficult when you're having to satisfy someone else's requirements for reporting or accounting reasons if working within wider systems. There's also often an inherent assumption of women being associated with non-binaryness, which is maybe better addressed as "marginalised genders".
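One pattern some of us have used (sketched below under our own assumptions; the field names and options are illustrative, not a recommended standard) is to pair a short option list with an open self-describe box and an explicit "prefer not to say", and to document separately how these answers map onto any coarser categories a wider system demands:

```python
# Illustrative form field: a short option list plus an open self-describe box.
# The options and names here are assumptions for the sketch, not a standard.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenderField:
    options: List[str] = field(default_factory=lambda: [
        "Woman",
        "Man",
        "Non-binary",
        "Prefer to self-describe",  # selecting this reveals a free-text box
        "Prefer not to say",
    ])
    self_describe_text: Optional[str] = None  # only used with "Prefer to self-describe"

# If external reporting forces coarser categories, the mapping from these answers to the
# reporting categories is itself a design decision, and worth writing down explicitly.
```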
In Australia the census this year was an online tool, so questions appeared based on the answers you selected. This led to an interesting discussion about how in Samoan culture boys may sometimes be raised as girls, known as Fa'afafine, which the rest of us had not heard about before.
How can the construction and manipulation of identity categories by computerised systems lead to discrimination?#
The problems we discussed highlight that biological essentialism was baked into the welfare state and the assumptions that came with that. However, a lot of the design choices are probably based on convenience for the people designing the system rather than maliciousness (but not always, and this does not excuse these design choices). Sometimes it can be a case of "protecting" the protected variables. If it's a category someone might be discriminated against for, it might be worth protecting/obscuring this from the general public.
Dataset size is an issue when you use a larger number of categories. For example, if you've got 100 people identifying as male and only 10 as non-binary, it might be hard to do meaningful analysis on the smaller group.
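To put rough numbers on that (a back-of-the-envelope sketch using an assumed 50% response rate, not figures from the discussion), the uncertainty around a simple proportion is roughly three times wider for a group of 10 than for a group of 100:

```python
# Rough illustration: the standard error of an estimated proportion shrinks with the
# square root of the group size, so small groups give very wide uncertainty intervals.
import math

def proportion_standard_error(p: float, n: int) -> float:
    return math.sqrt(p * (1 - p) / n)

for n in (100, 10):
    margin = 1.96 * proportion_standard_error(0.5, n)  # approximate 95% interval
    print(f"n = {n:3d}: estimate 50% +/- {margin:.0%}")
# n = 100: roughly +/- 10 percentage points
# n =  10: roughly +/- 31 percentage points (and the approximation itself is shaky at n = 10)
```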
The difficulty with putting people into categories is that things like gender are increasingly regarded as a spectrum, so once you start classifying many different pronouns and identities, people in the smaller groups start to lose anonymity because they are easier to identify. Being in a minority group puts you at higher risk of discrimination, so we need to strike a balance between minorities being seen and not being put at higher risk.
How do you handle the fact that these variables might change over time? Be it gender, sexuality or even surname. Euan worked with a database which had no identifying numbers, only names. One database updated surnames if population members got married, one didn't. This led to difficulty when comparing between datasets. But any time identifiers are needed, the requirement of anonymity needs to be considered.
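A minimal sketch of that joining problem (the records and field names are invented): matching on full name silently drops anyone whose surname changed between the two datasets, which is exactly what a stable identifier would avoid, subject to the anonymity concerns above.

```python
# Invented example: one dataset updated surnames after marriage, the other did not,
# so a join on full name silently loses that person from any longitudinal comparison.
records_2019 = [
    {"name": "Alex Jones", "value": 1},
    {"name": "Sam Patel", "value": 2},
]
records_2021 = [
    {"name": "Alex Jones", "value": 3},
    {"name": "Sam Okafor", "value": 4},  # the same person as "Sam Patel", surname changed
]

matched_names = {r["name"] for r in records_2019} & {r["name"] for r in records_2021}
print(matched_names)  # {'Alex Jones'} -- Sam quietly disappears from the comparison
```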
Are changes to datasets going forward going to include updating retrospective data? Could this lead to misunderstandings of how society is changing? Is co-design a solution? Should we be having more conversations about when collecting certain data is even necessary? We're told that digitising things improves them, but that's not always true.
The problem is often bureaucracy, not necessarily technology. For example, not being able to get a bank account without an address. There's an assumption there that everyone has a house.
It is vital to consider edge cases. A system that works 99% of the time sounds good, but the consequences of being in the 1% can be very hard. Although as Ruth Ng points out: "In tech, there are 'edge cases'. But there is no such thing as edge case people."
What are the risks of artificial intelligence (AI) systems that are designed to "discern" or algorithmically "identify" aspects like gender or sexuality from facial data or other categories?#
Humans jump to conclusions because it helps us make very quick decisions. One thing we do is look at someone's face and make judgement calls. These judgements can include gender identity, sexuality or race. You can't get an algorithm to do that quick thinking and justify it like a human can. Without explainability, you don't have a leg to stand on. Even as people, that's kind of what we do.
Models trained on datasets of straight cis white men will be better at identifying straight cis white men than people from other groups. We have to be careful about algorithmic decision pipelines which don't have humans involved. We have online data but also natural data from our world. People present themselves online in a different way to how they actually are (e.g. Instagram vs reality).
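One concrete check this implies (a toy sketch with made-up labels, not a full fairness audit) is to report accuracy per group rather than as a single headline figure, which is only possible if group membership was recorded in the first place:

```python
# Made-up predictions and labels, just to show how a single overall accuracy can hide
# much worse performance on an under-represented group.
from collections import defaultdict

rows = [  # (group, true_label, predicted_label) -- entirely invented
    ("majority", 1, 1), ("majority", 0, 0), ("majority", 1, 1), ("majority", 0, 0),
    ("minority", 1, 0), ("minority", 0, 1),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, truth, prediction in rows:
    totals[group] += 1
    correct[group] += int(truth == prediction)

overall = sum(correct.values()) / sum(totals.values())
print(f"overall accuracy: {overall:.0%}")  # 67% -- sounds tolerable
for group in totals:
    print(f"{group}: {correct[group] / totals[group]:.0%}")  # 100% vs 0% in this toy case
```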
Is it somehow better if you're aggregating a group vs targeting individuals? We got into this in a previous discussion, "Participant" Perceptions of Twitter Research Ethics (25th Aug 21). It could be used to not market to a particular group (bad) or to try to be more accessible to a particular group (good).
In general we felt that such systems open up a whole can of worms and are basically tech-based phrenology - a pseudoscience which involved measuring bumps on the skull to predict mental traits.
What responsibility do government and/or technological platforms have when constructing such computerised systems?#
All of the responsibility. The government need to do better. Some countries have systems where self-identification is the default. The government has a responsibility to treat all citizens equally… right? This should follow on for tech companies too, in theory.
There appears to be an issue with diffusion of responsibility. If you're making a system, many of us feel you should be responsible every step of the way for how it's used. It's sad how a business case often needs to be made for a company to behave ethically. See our previous talk on Economies of Virtue: The Circulation of "Ethics" in Big Tech.
From the perspective of government/healthcare, some trans people are getting health invites for things that are not relevant to them. For medical reasons, the types of data gathered might need to be different. This even applies to cis people as their personal circumstances change with regards to their health.
An imperfect and time-consuming solution for this would be to have individuals tick a box/answer a question about every possible health condition and treatment they might have to undergo and whether it is relevant to them personally. This creates an apparent dichotomy between categorising and efficiency. We believe that in the era of big data, data scientists should be able to do better.
Should employees in tech firms have to take a Hippocratic oath in order to use AI in healthcare? This is a concern of clinicians. Meaningfully categorising any group is a challenge, and a political decision. Political decisions become design decisions, and sometimes the designers are not qualified enough to make these decisions without outside input.
Closing thoughts#
Many of us reading this didn't realise there were difficulties with people confirming their gender as far back as the 1950s. We didn't like how the paper described a trans woman as "glamourous" and a trans man as "stoic" - is this reinforcing the gender binary, or pointing out how trans people felt that they had to conform to stereotypes?
What data do we need to know and what don't we need to know? By not collecting data about people in some categories, you don't cater to them. By leaving people out, will we miss finding important things about them? Drugs for cystic fibrosis can't be tested on pregnant people, but it could also harm the unborn child to come off the medication. This affected people's attitudes to taking the COVID vaccine when pregnant.
In the ONS Census last year, children under 16 did not have the option of answering the question about whether the gender they identify with is the same as their sex registered at birth. This question had been tested by the ONS but was not deemed publicly acceptable for this age group. How does this affect trans teenagers and how they feel about their gender/sexuality?
This carries the implicit message that it's not worth thinking about your gender identity until you're at least 16. One of our members summed up their thoughts on this with the line:
"The unexamined cis-life is not worth living."
Attendees#
| Name | Role | Affiliation | Where to find you | Emoji to describe your day |
| --- | --- | --- | --- | --- |
| Natalie Zelenka | Data Scientist | University of Bristol | NatalieZelenka, @NatZelenka | |
| Nina Di Cara | Research Associate | University of Bristol | ninadicara, @ninadicara | |
| Huw Day | PhDoer | University of Bristol | @disco_huw | |
| Euan Bennet | Lecturer | University of Glasgow | @DrEuanBennet | |
| Laura | Software engineer working in data collection | | | |
| Lucy Bowles | Junior Data Scientist | Brandwatch | | |
| Roman Shkunov | about to start (!) as a Graduate Data Scientist | Lloyds Banking Group | | |
| Zoë Turner | Data Scientist | Nottinghamshire Healthcare NHS Foundation Trust | @Letxuga007 | |
| Marissa Ellis | Founder | www.diversily.com | https://twitter.com/diversily | |
| Kamilla Wells | Citizen Developer | Australian Public Service, Brisbane | | |