“You Social Scientists Love Mind Games”: Experimenting in the “divide” between data science and critical algorithm studies#

What’s this?#

This is a summary of Wednesday 9th February’s Data Ethics Club discussion, where we discussed the paper “You Social Scientists Love Mind Games”: Experimenting in the “divide” between data science and critical algorithm studies by David Moats and Nick Seaver.

The summary was written by Huw Day, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara and Natalie Thurlby helped with the final edit.

Introduction#

This week at Data Ethics Club we discussed the paper “You Social Scientists Love Mind Games”: Experimenting in the “divide” between data science and critical algorithm studies by David Moats and Nick Seaver. The paper examines the divide, and the intersection, between data scientists and social scientists using a methodology more familiar to anthropologists than to those with a STEM background:

“we attempted to stage an encounter in which practicing data scientists were asked to analyze a corpus of critical social science literature about their work, using tools of textual analysis such as co-word and topic modelling. The idea was to provoke discussion both about the content of these texts and the possible limits of such analyses.”
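To make the quoted method concrete, here is a minimal sketch of what topic modelling on a small corpus might look like, using scikit-learn’s CountVectorizer and LatentDirichletAllocation. This is not the authors’ actual pipeline, and the corpus below is a hypothetical stand-in for the critical social science texts the participants were given:

```python
# A minimal sketch (not the study's actual pipeline) of topic modelling
# on a small corpus of texts, using scikit-learn's LDA implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical stand-in corpus; the study used critical social science papers.
corpus = [
    "algorithms encode the assumptions and politics of their designers",
    "classification systems shape how subjects are seen and governed",
    "topic models reduce texts to word co-occurrence statistics",
    "data practices are situated in institutional and social contexts",
]

# Build a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(corpus)

# Fit a two-topic LDA model; the number of topics is an analyst's choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Inspect the top words per topic - the kind of output participants
# would have been asked to interpret and critique.
words = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [words[j] for j in weights.argsort()[::-1][:5]]
    print(f"topic {i}: {', '.join(top)}")
```

Part of the provocation in the experiment is visible even in this toy version: the method reduces nuanced arguments to word co-occurrence counts, which is precisely the kind of flattening the critical literature in the corpus warns about.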

We discussed the authors’ approach, the divides between social and data science, and what we can do to bridge the gap. We see some recurring themes when data scientists try to bring their tools to bear on social science problems and vice versa: https://xkcd.com/1831/

What do you think about the authors’ approach (framing of the experiment, the first vs the second attempt, tools suggested, their reflections, Nick’s email)?#

We thought the experiment was interesting, getting data scientists to be reflective in a way that is not typical of their skillset, though some of us felt the experiment lacked specificity about what data scientists actually do (use quantitative methods to answer specific questions). Perhaps it would have been more successful to ask the data scientists to read the papers and discuss them, rather than to analyse them with quantitative methods. It didn’t seem like there was a way for the participants to succeed with their analysis.

On the other hand, maybe the utility of the paper was in the self-reflective nature of the study? If the objective was to start a conversation rather than to get any particular analysis out of it, then there is still value in that. The authors are trained in ethnography and anthropology. This mode of reasoning is different to that of data scientists, and the hallmarks of what the participants were meant to achieve are different: smaller sample sizes and a greater focus on individual feedback/insight, with our discussion of the experiment coming from the point of view of the statistical/scientific method.

It’s not necessarily a bad experiment, and it’s also a very early pilot study. In this instance we would prefer fewer stories with more substance to many stories with little substance. On the topic of ethnography, perhaps it would have been helpful to have observational steps based on case studies before progressing to this point, to characterise the systems in which data scientists operate?

The authors mention the power imbalances between data scientists and social scientists but didn’t expand on them. We would have liked to see more on this. Those working in STEM (for example MDs and data scientists) are expected to know the answer and to express certainty, when sometimes they just don’t know. Perhaps the goal of the experiment was to tell stories, and in that sense the experiment was successful.

What other divides, binaries and camps should we pay attention to?#

There seem to be Two Cultures in statistical modelling, as Leo Breiman argued in “Statistical Modeling: The Two Cultures”:

“There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.”
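As a small illustration of Breiman’s contrast (ours, not the paper’s), the sketch below fits both kinds of model to the same synthetic data: a linear regression, which assumes a stochastic data model and invites interpretation of its coefficients, and a random forest, which treats the data mechanism as unknown and is judged mainly by predictive performance:

```python
# Illustrative sketch of the "two cultures" on the same synthetic data:
# a stochastic data model (linear regression) vs an algorithmic model
# (random forest). The data here is made up, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)  # "unknown" mechanism

# Culture one: assume y = Xb + noise, then interpret the fitted coefficient.
data_model = LinearRegression().fit(X, y)
print("linear coefficient:", data_model.coef_[0])

# Culture two: treat the mechanism as a black box, judge by prediction.
algo_model = RandomForestRegressor(random_state=0).fit(X, y)
print("forest R^2 on training data:", algo_model.score(X, y))
```

The linear model misdescribes the sinusoidal mechanism but yields a story about the data; the forest predicts better but offers no such story, which mirrors the interpretive divide discussed below.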

Data scientists are submerged in the technical details, whilst social scientists have to consider the social context - how did the categories/labelling come about? Data scientists come from lots of different backgrounds. The paper focussed on data scientists in academia, whereas many are also in industry. Within industry there is also a divide between the marketing/sales side and the people who are involved in the technical work.

Data scientists are having to make rules and regulations around what they’re doing. The way that law formalises and thinks about things is very different from how the startup world wants to innovate, and regulation has an inevitable time lag. There is an interesting divide within data science between academics and more industry-focussed tinkerers.

There tend to be different normative aims/commitments. Social scientists prefer to leave questions open to provoke debate, whereas data scientists will try to give an answer. Social scientists study the people themselves, which is quite different from trying to understand the subject from the outside.

We would question the idea that ethics can be taught in the manner of, say, Javascript 101. As with law, there’s a hefty contextual element, and efforts to deal with it early in a project and then forget it exists tend to be a) commonplace and b) unhelpful. Perhaps the best we can do is build ethics components into formal data science training and make sure people are aware of the support and organisations able to give advice (e.g. a data protection officer).

“Social scientist” and “data scientist” are quite heterogeneous group labels - there’s lots of diversity and difference within the same field. Perhaps “data scientist” is not a very helpful construction, like the label “AI”. Is AI just a shorthand for something that gets funding?

Which boundaries should we be paying attention to? What do we want to change/keep the same? The fact that social scientists and data scientists don’t talk to each other much is likely a problem. What about social scientists who do quantitative work? Computational social scientists? Is the gap closing, or are the two groups just refusing to cross-train appropriately? Is ‘discipline loyalty’ more of a generational thing these days? Should we consider using the term ‘transdisciplinary’ as an alternative to ‘interdisciplinary’?

What intervention could we stage in this group?#

We all thought this paper was really meta - that’s not necessarily a bad thing. One intervention could be to give no recommendations, but instead run workshops with data scientists to test theories of changing minds/achieving improvements (a more applied type of social science). From our group, data scientists, physicists and social scientists all found this very readable! No unnecessary coining of new terms.

Is the very existence of this group the intervention? It seems that there is a place for teaching active listening; for fostering further mutual respect (even beyond STEM); and for encouraging the use of phrases like ‘I don’t know, what do you think?’ Other projects like Data Hazards count towards this effort as well.

With so many questions being raised about the divide, and the sometimes arbitrary lines drawn between disciplines, there needs to be more of an effort to cross disciplines. Just because we take degrees in set disciplines (be it STEM subjects such as physics, statistics and computer science, or subjects closer to sociology, anthropology and philosophy), it does not mean that the problems we face will fall into these cookie-cutter shapes. Different training backgrounds grant us different tools and perspectives with which to approach problems. Some tools will of course be better suited to certain problems, but all of us, regardless of our background and expertise, could likely stand to benefit from being a little less intradisciplinary and a little more interdisciplinary.


Attendees#

Name, Role, Affiliation, Where to find you, Emoji to describe your day

  • Natalie Thurlby, Data Scientist, University of Bristol, NatalieThurlby, @StatalieT

  • Huw Day, PhDoer, University of Bristol, @disco_huw

  • Emma Tonkin, Research Fellow (Digital Health), University of Bristol, @emmatonkin

  • Sergio Araujo-Estrada, Research Associate, University of Bristol

  • Euan Bennet, Senior Research Associate, University of Bristol, @DrEuanBennet

  • Ola Michalec, RA, University of Bristol

  • Robin Dasler, research software product manager, daslerr

  • Kamilla Wells, Citizen Developer, Australian Public Service, Brisbane