Data Ethics Club: ChatGPT listed as an author on research papers: many scientists disapprove#

What’s this?#

This is a summary of Wednesday 7th February’s Data Ethics Club discussion, where we spoke about ChatGPT listed as an author on research papers: many scientists disapprove, a Nature article written by Chris Stokel-Walker. The summary was written by Huw Day, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Jessica Woodgate, Nina Di Cara and Natalie Thurlby helped with the final edit.

What are our experiences with ChatGPT?#

Many of us haven’t used ChatGPT, but it’s obviously quite mainstream and much talked about. It seems to be everywhere at the moment. Patrick has played around with it, but knows plenty of people who are using it professionally and casually. It’s not really a secret that people are using it to help with work.

Several of us have used ChatGPT to look up some code, but we already have lots of concerns about the accuracy of that code - Huw even used it to write a data ethics writeup and detailed his experiences using it here. Ola has played around with it a little, and found it a bit basic, dryly communicating the message rather than crafting a beautiful argument. It has a very strict structure.

Zoë can’t be bothered to sign up and doesn’t like the idea of paying, which may well be introduced. A payment model might be what ends up happening here, in the same way people pay for Grammarly. Is Huw not using it anymore because he feels like he’s cheating? Whilst it can help you achieve tasks, it takes away your ability to learn how to do them yourself. It might be slower to look up how to write code as you go, looking up different functions and tweaking parameters to get the desired result, but what you gain in speed with ChatGPT you lose in time to contemplate. ChatGPT writes well-commented code, but if you don’t take the time to read through it, there’s no way for you to learn from the process.

Many of us have read about undergrads answering essay-based questions using ChatGPT, and have heard about markers being told to stay on the lookout for signs of it being used. Has ChatGPT made traditional homework assessments all but impossible? Does this tool require us to reform the way we assess in education? If so, what does that say about the way we assess in education? Using ChatGPT to cheat by writing essays should tell us that essays aren’t the best way to assess knowledge or creativity. Why is it surprising that ChatGPT can do as well as a last-minute essay written overnight before the deadline? It’s the same process of throwing whatever rubbish onto the page and hoping for the best!

Noshin hasn’t used it yet, but has noticed that school application systems are upping their exam pass thresholds to try and get around this. Schools don’t appear to have a way of combating this yet. If you ask ChatGPT where it got its data from, it doesn’t always know! This doesn’t mean its output won’t get flagged for plagiarism.

We have seen examples of ChatGPT being used to write CVs and cover letters, which are better received because they are being read by language models. What happens if you get ChatGPT to write a CV based on your qualifications? Does it write something similar to what you wrote yourself?

Zoë has used ChatGPT at work: “Write a blog about NHS R community” - and the results were very boring! The 11-year-old son of Zoë’s colleague has used it to make up stories. Could this capability be used maliciously to write conspiracies? Almost certainly. Perhaps the scariest applications are those outside academia. People think it’s a search engine (as Bing has treated it) and just believe the answers it gives. MathOverflow has already had a deluge of nonsense answers written by ChatGPT. How would the pandemic have gone if ChatGPT had been around and mainstream earlier? Would there have been even more conspiracies?

Some of us have asked ChatGPT about itself. It has lied about its training data - it said it was trained on data up to July 2021, but if you ask who the CEO of Twitter is, it answers Elon Musk, an appointment which happened after that date. When tackled, it lies, just as humans can. However, are we anthropomorphising by calling this ‘lying’? Perhaps something else is happening, for which we don’t yet have terminology.

A language model that was constantly being trained on a variety of inputs could easily be skewed into regurgitating conspiracies and false statements. There’s no clear consideration of truth in the algorithm, as the ground truth is sometimes up for debate - which is also the case in real life. There could even be psychological impacts on the people who have to review the content being generated. Miranda did research on GPT-3 showing bias against different cities in the UK, e.g. when prompted with “A Bristolian walks into a courtroom”.

“It’s just like that guy at the party who thinks he knows all the stuff”, as one of our members put it. Or perhaps a STEM major weighing in on social issues they know nothing about, with the same confidence with which they talk about their own area of expertise.

One of the benefits is that people suddenly care about this, a lot. Research has been going in this direction for ages, but very few people were actually concerned. Some of us are hopeful now that Microsoft has bought access to ChatGPT: if ChatGPT is integrated with Office tools, we’ll all be using it, and Microsoft will need ethical accountability. We don’t rely on what we as individuals think is ethical; we rely on private companies and what they think is ethical. For example, America has no GDPR, which leads to lots of data security breaches.

We had concerns around data storage - what information is it holding about you, and how securely is this stored? Technically, for researchers, much of what they write belongs to the university anyway. However, there are cultural differences in preferences over data management; in the US people care more about monetary compensation for their data, whereas in Europe the focus is on credit and privacy of data. Companies can collect data initially for one reason, but it can then be used for completely different purposes without people’s knowledge if it’s picked up by other services (ChatGPT, Facebook, etc.). You can actually check whether you are in some of the larger image databases.

Should ChatGPT be allowed to be listed as an author on publications?#

What accountability do we expect from a human author that we can’t get from ChatGPT? Supervisors often get their names on papers even if they contribute nothing. On the other hand, undergrads often get left off authorship lists when they have contributed meaningful work. Should we then include ChatGPT as an author if we’re neglecting real human contributors? A point Robin made was that it’s weird how people seem almost more likely to allow ChatGPT as a co-author than to allow an undergraduate research assistant who has actually done a lot of the lab work. They are often relegated to the acknowledgements, at best. What does that say about the position of research students in the science hierarchy?

Academics aren’t great at calling each other out for poor behaviour. Some journals are getting better at properly attributing credit to authors. Language models are clearly not authors! Authorship on papers sometimes doesn’t reflect what people might expect (who did the actual work? If the journal doesn’t use proper credit attribution, then how can a random reader tell?). The lines blur between contributing as an author by just checking something over, as opposed to writing hundreds of words.

Thanking your mum and dad in the acknowledgements doesn’t mean mum and dad should be co-authors. If someone who did hardly any of the work or writing can be an author, then why not ChatGPT? The problem is the academic culture surrounding attribution of credit. This also ties into a deeper problem with the superficial metrics of ‘success’ in academia.

Human authors don’t always have accountability either (e.g. papers with 3000 authors), so where do we draw that distinction? Viv mentioned a paper which has ChatGPT as the lead author. What does that say about the contributions of the co-authors? Huw is yet to see an example of ChatGPT being a paper co-author that wasn’t laziness or novelty.

It all comes down to how you define an author - are you copying text from ChatGPT word for word, or are you citing it? Can it be a reference or an acknowledgement instead? Citations are maybe the next best place, but do language models need their own form of citeable identifier? ChatGPT can’t be legally responsible in the way an author can. It can get the right grammar and sentence structure, but it can’t know whether the information is correct. A ‘senior prof’ author who gets put on a paper without doing anything is different, because the human should be responsible for any inaccuracies!

People should be transparent if they’ve used it, but should we give them credit for writing with it? It’s just taking other people’s content and re-working it. It’s not original thought - but don’t people do that all the time? Could you develop some kind of explainability system? How hard are these systems to explain? We expect the work that is produced to be credited and cited appropriately. If we can’t give credit where it’s due, then maybe we shouldn’t use ChatGPT. If you were to plagiarise a piece of writing knowingly, that would clearly be bad. Why should it be any different if you get a black-box system to plagiarise many sources in a way you don’t understand?

How should we use language models like ChatGPT going forward?#

There are some fun, novel things you can do with ChatGPT: Huw used it to write a haiku about his mum’s violin playing waking the dog up. Using it to create and comment code is also a very useful application. But there are issues with relying on access to ChatGPT to do your job, in the case of outages. Additionally, those of us who had tried using ChatGPT to help with code felt that it inhibited our ability to learn that code, in contrast to the standard method of trial and error.

But is ChatGPT just used for novelty, or is its main value entertainment? What’s the breakdown? When the creators made it, what use cases did they have in mind? At the end of the day, we are the training data for this model, but to what end? If you’re not paying for something, then you’re the product.

There’s a flowchart for good use of ChatGPT. It presents a nice premise, but the trouble is: who is actually using it this way? Individually it may not matter whether one output is true, but cumulatively this could get really bad. Every problem is self-reinforcing, as ChatGPT learns from its own outputs. Will people ever have time to check all the outputs? ChatGPT talks very confidently even when it’s completely wrong. Humans are also prone to being confidently incorrect, especially in certain fields; however, we are still accountable for our mistakes or misinformation! We need to fight back by being more creative, more accountable, more open about our flaws, etc. We can’t uninvent ChatGPT, but we can force it to be better by being better ourselves (i.e. being more human by acknowledging mistakes and being uncertain!).

Is there a possibility ChatGPT will influence how we write and speak? Is it about time for the death of the essay? How is it updated? Might we need to completely reform publishing, as well as education and specifically methods of assessment? Some are experiencing wonder at how seemingly creative ChatGPT has proven to be, even asking what sets us apart from it in terms of these capabilities, but from a statistical point of view it is no more inherently creative than a linear regression soup. One of our members put it best:

“People trained a language model to sound like a corporate manager and concluded that the model is sentient rather than the correct answer which is that corporate managers aren’t.”


Attendees#

| Name | Role | Affiliation | Where to find you | Emoji to describe your day |
| --- | --- | --- | --- | --- |
| Natalie Zelenka | Data Scientist | University of Bristol | NatalieZelenka, @NatZelenka | |
| Nina Di Cara | Research Associate | University of Bristol | ninadicara, @ninadicara | |
| Huw Day | PhDoer | University of Bristol | @disco_huw | |
| Euan Bennet | Lecturer | University of Glasgow | @DrEuanBennet | |
| Noshin Mohamed | Children’s Quality Assurance | | | |
| Vanessa Hanschke | PhD Interactive AI | University of Bristol | | |
| Melanie Stefan | Computational Biologist | Medical School Berlin | | |
| ZoĂ« Turner | Senior Data Scientist | Strategy Unit/NHSR Community | @Letxuga007@fosstodon.org | |
| Laura Williams | Senior Data Scientist | Ministry of Justice | | |
| Ola Michalec | Researcher | University of Bristol | | |
| Robin Dasler | Research Data Product Manager | | daslerr | |
| Amy Joint | Content Acquisition Manager | F1000 | @AmyJointSci | |