Data Ethics Club: How It’s Unfair to Use Personality Tests in Hiring (International Women’s Day Special)
What’s this?
This is a summary of Wednesday 5th March’s Data Ethics Club (DEC) International Women’s Day Special, where we spoke and wrote about the video How It’s Unfair to Use Personality Tests in Hiring by Cathy O’Neil. The summary was written by Jessica Woodgate, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Huw Day helped with the final edit.
Article Summary
Algorithms, O’Neil argues, are built with the values of their designers embedded within them. These values are presented as objective “truths”, creating an illusion of indisputability. In actuality, algorithms are not objective and are usually masking a hidden agenda. When you have to encode a value system, which you do when making algorithmic design choices, there is no such thing as objectivity. In the case of corporations, this hidden agenda is to make as much profit as possible; in the case of governments, it is to make things as efficient as possible.
Important algorithms that we should pay attention to, in consideration of concealed value systems, are those which are:

- widespread, affecting a lot of people in important ways by making an actual impact on their lives (e.g. whether you get a job, whether you get insurance, how long you go to prison for);
- secret, as people don’t understand how they work, why scores have been assigned (in the case of scoring systems), or even that they have been scored at all;
- destructive on a societal level, creating pernicious feedback loops that undermine their original goals.

One example of an algorithm that is widespread, secret, and destructive is the use of personality testing in hiring processes, which has the effect of systematically filtering out people with mental health conditions.
Implicit bias is the automatic, nondeliberate, favourable or unfavourable mental representation of social categories such as ethnicity, race, or gender. Hiring algorithms have a tendency to propagate implicit bias. The reason for this rests upon key assumptions in machine learning (ML) development: to build an ML algorithm, you need (a) historical data and (b) a definition of success. For hiring algorithms, you would thus look at the data of all the people that have been hired before. A reasonable definition of success could be that the people who were hired went on to be promoted. After training the algorithm and optimising for accuracy, if those who have been successful in the past match only a specific profile of people (e.g. men), the algorithm will continue to give that group more points when it is applied to new applicants. ML algorithms thus propagate the status quo, automating historical biases.
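To make this mechanism concrete, below is a minimal, entirely synthetic sketch (our own illustration, not from the video). A model is trained on invented historical data in which one group was promoted more often at the same skill level; it then scores a new applicant from that group higher than an identically skilled applicant from the other group.

```python
# Toy illustration: a model trained on biased historical hiring data
# learns to prefer the historically favoured group. All data here is
# synthetic and the feature names are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Historical records: 1 = member of the historically favoured group.
favoured = rng.integers(0, 2, size=n)
skill = rng.normal(0, 1, size=n)  # genuinely job-relevant signal

# "Definition of success": promotion. Historically, the favoured group
# was promoted more often for the same skill level -- this is the
# implicit bias baked into the labels.
promoted = (skill + 1.5 * favoured + rng.normal(0, 1, size=n)) > 1.0

model = LogisticRegression().fit(np.column_stack([skill, favoured]), promoted)

# Two new applicants with identical skill, differing only in group:
applicants = np.array([[0.5, 1], [0.5, 0]])
print(model.predict_proba(applicants)[:, 1])
# The favoured-group applicant receives a higher predicted "success"
# score despite identical skill: the model has automated the old bias.
```

Note that optimising this model for accuracy would not remove the disparity, since the bias is in the training labels themselves.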
Every company has implicit bias. A data scientist thus cannot assume that using a relevant dataset, choosing a reasonable definition of success, and optimising for accuracy means they have done their job. To mitigate the propagation of historical biases, it is important to pay attention to the implicit biases that the algorithm designer may have and how they influence the value systems they encode.
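As one hedged illustration of what “paying attention” might look like in practice (our addition; the video does not prescribe a method), a simple audit is to compare a model’s selection rates across groups, for example using the “four-fifths rule” heuristic from US employment guidance, which flags any group whose selection rate falls below 80% of the highest group’s rate.

```python
# Minimal audit sketch (our own addition): compare selection rates
# across groups and apply the "four-fifths rule" heuristic.
# `predictions` and `group` would come from the model and applicant data;
# the values below are hypothetical.
import numpy as np

def selection_rates(predictions: np.ndarray, group: np.ndarray) -> dict:
    """Fraction of applicants selected (prediction == 1) within each group."""
    return {g: predictions[group == g].mean() for g in np.unique(group)}

def four_fifths_check(rates: dict) -> bool:
    """True if every group's selection rate is at least 80% of the highest."""
    highest = max(rates.values())
    return all(rate >= 0.8 * highest for rate in rates.values())

predictions = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
rates = selection_rates(predictions, group)
print(rates, "passes four-fifths rule:", four_fifths_check(rates))
```

Passing such a check is not proof of fairness, but failing it is a cheap early warning that the encoded value system deserves scrutiny.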
Discussion Summary
Can you think of a kind of implicit bias that affects you?
Alongside thinking about the types of implicit bias we have experienced, we should also flip the question around and, with some self-reflection, ask ourselves when we have held implicit biases about others.
We’ve seen implicit bias manifest in a variety of areas, such as the workplace, name spelling, disciplinary divides, and gender. During the pandemic, we experienced implicit bias based on people’s home environments when working from home. Variations in name spelling can instigate implicit bias; if names are written in ways that differ from conventional spellings in specific regions, people can mistakenly assume their bearers are from different ethnic backgrounds. Implicit biases also arise between people from different disciplines, such as STEM and the humanities, where people from particular backgrounds have preconceived notions about the knowledge of people from other backgrounds. These preconceptions can limit communication, as people may assume that you are an expert on everything vaguely related to your area (e.g. that you are an expert in maths because you understand machine learning) or that you don’t know about certain topics (e.g. that you don’t understand maths because you have a background in the humanities). Assumptions about knowledge and expertise are also influenced by gender-based implicit bias; many of us have experienced people assuming we have no technical proficiency because we are women, even those of us with highly technical backgrounds.
Accents carry a lot of weight in the assumptions people make about one another’s class backgrounds. In the US, accent bias has been found to lead to unequal access to employment; similar studies are currently underway in the UK. We wondered if there are similarities between the preconceptions formed from accents and those formed from personality tests. People infer social class from accent; it could be that some ways of answering personality tests are perceived as correlating with certain classes. As “soft” skills like communication overlap with personality, “personality traits” may be conflated with the presence or absence of particular skills. In roles that rely on soft skills rather than specialised skills or education, recruiters thus end up hiring lots of people with similar backgrounds to themselves who communicate in similar ways.
There are significant issues with personality tests, explored in a previous DEC writeup, but implicit bias in interviews and other hiring procedures is a very real thing. In our own work, we previously processed 400 applications for a particular role. The employer’s technique was to “sift”, reading through each application in turn. Over time, the reviewer tends to become more involved in the process and may change their scoring. Temporal bias can thus arise, as the reviewer scores applications processed earlier differently to those processed later. Scoring may also be affected by the reviewer’s background; if the reviewer has a different speciality to the role being recruited for, certain details may be missed. For example, if the role is for a data scientist, non-data scientists may not recognise that R and Python are both programming languages.
Issues with bias in hiring are thus not unique to personality tests but occur throughout the pipeline, making us wonder if there is sufficient justification to fully remove the tests. In the ideal scenario, if five perfect people applied, the recruiter would hire all five of them. In reality, companies do not usually have capacity for this. When job listings attract hundreds, or even thousands, of equally qualified applicants, figuring out how to whittle down that list is no small task. A tempting approach is to treat this as a technical problem and reach for solutions like sorting algorithms, but these methods run into challenges once one considers the importance of preventing discrimination based on secondary factors like education or location.
To evaluate the relevancy of personality tests, we asked ourselves what their purpose is and what employers are looking to find out by using them. An employer may use maths tests to measure numerical skills, or English tests to measure written aptitude. For these kinds of tests, there are clear ways to grade results and define success. However, defining a “good” result in a personality test is conceptually different; it seems strange to insinuate that some personalities are inherently better than others. It may be that certain personalities are better suited to certain roles, such as extraverted people being suited to leadership positions, but we weren’t sure if this is generally how the results from personality tests are used. From the video, it appears that personality tests uniformly rule out certain profiles: those that align with mental health diagnoses.
By attempting to bracket complex human beings into discrete categories, personality tests oversimplify the things they are aiming to measure. Assigning someone’s personality from something that they did or said is a generalisation. Mental health diagnoses, which personality tests have the effect of screening out, are often not lifelong and can be highly situational. The issue raises questions about what employers value at a deeper level: robots that fit into oversimplified and standardised moulds, or human beings with quirks and diversity of thinking. We wondered whether personality test results have been matched with outcomes, examining whether people who did well on these tests and were hired went on to be successful in their jobs, e.g. by being promoted or staying at the company.
When we were required to undertake psychometric tests ourselves, we found the experience quite humiliating. Presumably, the tests were used to screen out high numbers of applicants. The questions were odd; we were asked lots of things like “would you ever steal a pencil?”. We wondered if the best way to treat these tests is to gamify them and try to “win” at personality through consistency rather than honesty. Whilst watching the video, we were reminded of The Psychopath Test, which explores the idea that many leaders in society are psychopaths whose actions towards others can only be explained by taking that fact into account. Traits of psychopaths include glibness or superficial charm; a grandiose sense of self-worth; pathological lying; conning and manipulative behaviour; and callousness or a lack of empathy. If the rational strategy in personality tests is to prioritise answers that are likely to be accepted over answers that are honest (which itself feels like manipulative behaviour), we wondered if the tests actually end up optimising for psychopathy.
What sorts of power affect people’s rights to speak out about implicit bias (e.g. access to legal representation, transparency in decision making, etc.)?
It is not easy for many people to access expert advice in terms of legal representation or career guidance. If people do not personally know experts in an area, or have enough money to pay for expert advice, their issues often end up going unacknowledged. An example of this is the Robodebt scandal in Australia: an automated government scheme that incorrectly demanded that welfare recipients pay back benefits, forcing many into worse financial circumstances and leading to three known suicides. Robodebt only became a known issue and was investigated after it started to affect middle-class people.
Once someone has become part of a system, and that system has a recorded judgement of them, the fact they have been judged makes it more difficult to protest that judgement. The judgement is imputed to them, and the act of protesting it can have the effect of reinforcing it, as people interpret the protest itself as confirmation of the judgement. In institutions, bias against protestors can mean that those who are promoted are the ones who do not raise their voices too much. This bias can also lead to neurodivergent people being misunderstood, as people think that someone is a bit weird and communication breaks down. We have seen neurodivergent people ascribed malicious intent when they were just trying to clarify a point, highlighting the importance of communication accessibility.
What change would you like to see on the basis of this piece? Who has the power to make that change?
The existence of implicit bias has been found to predict discrimination in a range of circumstances. To address discrimination in the workplace, diversity training is a popular tool amongst employers. Whilst it is important to acknowledge the effects of implicit bias on discrimination, there is a lack of robust evidence that the implicit bias reduction attempted in “diversity training” accomplishes lasting change. Firstly, there is a conceptual challenge in addressing attitudes that, by definition, people have limited awareness of. Secondly, training techniques do not seem to reduce implicit bias outside of the sessions themselves and have not been thoroughly tested in natural settings. The fact that companies spend money on implicit bias training despite limited evidence of positive effects suggests it is a potential channel for ethics washing.
To instigate meaningful change, we would like to see organisations confronting the ways their structures propagate unfair power dynamics, and making necessary shifts to redress unjust inequalities. Effective methods have been found to include:

- implementing accountability structures so that units are held responsible for diversity, equity, and inclusion (DEI) goals;
- establishing organisational opportunities for various groups to interact with one another;
- providing resources to support underrepresented members;
- changing organisational messaging;
- blocking the biases of decision-makers from affecting processes (e.g. blind auditions in orchestras).
Attendees
Huw Day, Data Scientist, University of Bristol: LinkedIn, BlueSky
Amy Joint LinkedIn
Kamilla Wells, Citizen Developer, Australian Public Service, Brisbane
Joe Carver, Data Scientist, Brandwatch
Melanie Stefan, Computational Neuroscientist, Medical School Berlin. Website last updated when the dinosaurs roamed the earth
Euan Bennet, Lecturer, University of Glasgow, BlueSky
Robin Dasler, Data Product Manager, California
Paul Matthews, Lecturer, University of the West of England, Mastodon