Critical perspectives on Computer Vision#
What's this?#
This is a summary of Wednesday 12th May's Data Ethics Club discussion, where we spoke and wrote about the conference talk Critical perspectives on Computer Vision by Emily Denton.
The summary was written by Huw Day, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". Nina Di Cara and Natalie Thurlby helped with the final edit.
The View from Nowhere#
The idea of the view from nowhere is to be completely impartial when reporting or drawing conclusions. By some it's considered the holy grail of unbiased reporting (be it journalistic or scientific). By others it's considered a place that doesn't even exist: a naively idealistic goal that leads to sloppy writing, with a narrator trapped in an argument with themselves, often at the expense of the reader's understanding.
Anyone with a technical but largely theoretical background (for example, in abstract maths, computer science or theoretical physics) might understand the view from nowhere as not only a place that exists, but the only logical place from which to argue theoretical results. In the axiomatic approach to proofs in abstract maths, for example, statements don't come with bias; they are only true, false, or as yet unproven. If you've ever scribbled a mathematical proof that 0.9 recurring is the same as 1 on the back of a napkin, only to have your mathematically complete logic refuted by a bartender, you know what I'm talking about here. And if you don't… probably for the best.
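For the curious, one standard back-of-the-napkin version of that argument (a sketch, not the only proof) runs as follows:

```latex
\begin{align*}
\text{Let } x &= 0.999\ldots \\
\text{Then } 10x &= 9.999\ldots \\
10x - x &= 9.999\ldots - 0.999\ldots \\
9x &= 9 \\
x &= 1
\end{align*}
```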
The real world, however, is a little less black and white. Whilst you can still reach objective truths within certain parameters, there's a whole lot of maybes out there. As soon as you deal with anything concrete but potentially uncertain, the view from nowhere starts looking like a lost cause. A set of points can fit a statistical distribution with a certain level of confidence, but it takes a leap to say that the correlation implies causation.
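To make that leap concrete, here is a minimal sketch (the scenario and variable names are our own invention, not from the talk) of two variables that correlate strongly while neither causes the other:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)

# A hidden confounder drives both observed variables.
sunshine = rng.normal(size=1000)
ice_cream_sales = 2.0 * sunshine + rng.normal(scale=0.5, size=1000)
sunburn_cases = 1.5 * sunshine + rng.normal(scale=0.5, size=1000)

r, p_value = pearsonr(ice_cream_sales, sunburn_cases)
print(f"correlation r = {r:.2f}, p = {p_value:.2g}")
# A large r and a tiny p tell us the two variables move together,
# not that ice cream causes sunburn: both are driven by sunshine.
```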
Meanwhile, anthropologists and ethnographers have understood for a long time that you can't have a viewpoint-less view, and that the very nature of our observations is affected by the lens through which we view the world. To reach scientific conclusions, hypotheses must still be made and tested, and those hypotheses are nothing more than guesses, fuelled by our biases. Even in abstract problems with no real-world consequences, many results are guessed before they are proven.
Better accounting for our views#
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP 6.
"What are you doing?", asked Minsky.
"I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied.
"Why is the net wired randomly?", asked Minsky.
"I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes.
"Why do you close your eyes?" Sussman asked his teacher.
"So that the room will be empty." At that moment, Sussman was enlightened.
Supervised machine learning requires a labelled dataset for training, usually with a binary outcome. There is little to no accounting for the grey of the world if the model is built on true/false prediction accuracy. Worse, this can encourage us to think of the world in a binary way, and to flatten complex concepts down into a binary. It could be relevant to think about how to change machine learning inputs/outputs to move away from boiling down the world in this way.
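As a toy illustration (the task and scale here are hypothetical), consider what a binary label throws away when annotators disagree:

```python
# Five annotators rate how confident they are that an image shows a smile.
judgements = [0.9, 0.6, 0.5, 0.2, 0.8]

# The usual training pipeline flattens this into a single binary label...
binary_label = int(sum(judgements) / len(judgements) >= 0.5)

# ...whereas keeping the average (a "soft label") preserves the
# disagreement that the binary label erases.
soft_label = sum(judgements) / len(judgements)

print(binary_label)  # 1 -- confidently "smiling", apparently
print(soft_label)    # 0.6 -- actually quite ambiguous
```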
This will require much more detail and thought in our methods: if we're not viewing from nowhere, where is the view from? Rather than seek out neutrality from where we are now, we should use detailed annotations and descriptions in our data sets to understand where our bias comes from. For example, perhaps in an Italian data set a higher percentage of men would be described as "bello" (beautiful), as that is more common than describing men as beautiful in English.
Documenting our view means publishing who your annotators are, what their demographics are, and what instructions you gave them. Perhaps as the field evolves, further information might become the norm as we ask: what is annotating data useful for? What should it be useful for?
However, a distinct problem that is common in data science is that it's rare to use your own data; typically you use what other people have collected. Reusing pre-made datasets can smuggle in a lot of preconceptions that we aren't explicitly aware of (this especially applies to personal data). Datasheets for Datasets, an approach introduced to computer science by Timnit Gebru and colleagues, is probably a great place for us to start handing over our data sets with care. But there are many widely used datasets that we know next to nothing about.
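As a sketch of what handing over a dataset with care might look like in code (the fields below are our own illustration, not the actual Datasheets for Datasets template):

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationRecord:
    """Illustrative provenance metadata to ship alongside a dataset."""
    dataset_name: str
    collected_by: str
    collection_purpose: str
    annotator_demographics: list[str]  # self-described, reported in aggregate
    annotator_instructions: str        # the exact wording annotators were given
    annotation_language: str           # "bello" vs "beautiful" matters
    known_gaps: list[str] = field(default_factory=list)

sheet = AnnotationRecord(
    dataset_name="example-faces-v1",
    collected_by="Example Lab",
    collection_purpose="research on expression recognition",
    annotator_demographics=["5 annotators, ages 20-35, all UK-based"],
    annotator_instructions="Label each face as smiling or not smiling.",
    annotation_language="en-GB",
    known_gaps=["no annotator self-identified as non-white"],
)
```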
Let's think carefully#
In the video, Denton discusses biological essentialism and the harm that it can do. It's a flaw that many fields fall into, so how could we do better?
One suggestion was that data scientists should think critically about the variables we actually want data for, and separate them from variables that we merely assume are related or correlated. For example: asking someone's gender as a stand-in for potential pregnancy in a medical setting, instead of asking whether they are potentially pregnant. If we use these "pseudo-variables" instead of the real ones, we may end up encoding our own biases and assumptions in the model without realising. Since data scientists are quite likely to be white cis men, it's more than likely that they will display (unwilful or otherwise) ignorance of certain categories.
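A minimal sketch of the difference (the column names and data are hypothetical):

```python
import pandas as pd

patients = pd.DataFrame({
    "gender": ["woman", "man", "non-binary"],
    "potentially_pregnant": [False, False, True],
})

# Proxy version: encodes the assumption "woman implies potentially
# pregnant", which can be wrong in both directions.
proxy = patients["gender"] == "woman"

# Direct version: asks the question the model actually needs answered.
direct = patients["potentially_pregnant"]

print((proxy != direct).sum())  # 2 of the 3 rows mislabelled by the proxy
```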
For all of us as data scientists to do this effectively, though, we'll need to keep challenging our own understanding of socially constructed concepts. For instance, how on earth can we account for race, when racial identity is socially constructed? It also requires us to think carefully about the research problems we are trying to solve: there's a big difference between saying "we should collect this data and then control for it" and "we can predict this person's sexuality/gender etc."
So, computer vision: good or evil?#
Overall, we recognised that there are lots of benefits to computer vision - we use it all the time in our day-to-day lives! But as with most ethical problems, the question is more about when we should use it, and what for.
This discussion will keep coming back to fundamental problems of epistemology and perception - these problems are not exclusive to data science and machine learning; they crop up everywhere in science! We need to be aware that machine learning is the systematisation of (flawed) human perception and understanding. This might wind up a few logicians and topologists, but getting scientists to consider the potential downsides of their work could be an important step towards more ethical research practices.