Data Ethics Club: “Participant” Perceptions of Twitter Research Ethics (25th Aug 21)

What’s this?

This is a summary of Wednesday 25th August’s Data Ethics Club discussion, where we spoke and wrote about the paper “Participant” Perceptions of Twitter Research Ethics, written by Casey Fiesler and Nicholas Proferes. The summary was written by Huw Day, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Nina Di Cara and Natalie Thurlby helped with the final edit.


This week’s discussion centred around the paper “Participant” Perceptions of Twitter Research Ethics, written by Casey Fiesler and Nicholas Proferes. The paper focusses on the ethical issues associated with using openly accessible data from social media (in particular, Twitter) to conduct research.

The debate has two main sides. On the one hand, if you have a public Twitter account, then as soon as you click “Tweet”, whatever you’ve said is now fair game. The majority of users do have public accounts: some studies estimate that only around 12% of users opt for private/protected accounts.

On the other hand, just because someone can get their hands on your data, it does not mean you consent to them using that data. (This links rather nicely to our previous discussion on Bellingcat, the Rise of Private Spies and open source investigation.)

Results and Implications - Attribution vs. Anon

People’s responses would change quite radically depending on the aim of the research. Studies that sought to classify individuals might be less popular than studies that just recorded the number of tweets on certain topics.

In the paper Participant Objections to Being Studied and the Ethics of Chatroom Research, researchers found they were less likely to get kicked out the more people were in the chat room. The likelihood halved with every 13 additional people present. It seems the larger the crowd we feel we can hide in, the more anonymous we feel. Where this analogy breaks down is that it is quite hard to track a single person through a crowd - Twitter, meanwhile, comes with a built-in “follow” button.
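To get a feel for how quickly that halving adds up, here is a minimal sketch. It assumes the likelihood halves smoothly for every 13 extra people relative to some baseline room size; the function name and the baseline are our own illustration, not something taken from the chatroom paper.

```python
def relative_kick_likelihood(extra_people: int, halving_step: int = 13) -> float:
    """Likelihood of being kicked out, relative to a baseline chat room.

    Assumption (ours, for illustration): the likelihood halves for every
    `halving_step` additional people beyond the baseline room size.
    """
    if halving_step <= 0:
        raise ValueError("halving_step must be positive")
    return 0.5 ** (extra_people / halving_step)


# Print the relative likelihood for a few room sizes above the baseline.
for extra in (0, 13, 26, 39):
    print(f"{extra:>2} extra people -> {relative_kick_likelihood(extra):.3f} of baseline")
```

Under this assumption, a room with 39 more people than the baseline gives one eighth of the baseline likelihood of being kicked out.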

Should we have to curate our online presence? Is it too idealistic to want to be able to avoid this? What about the mass use of data? Can it be anonymised? Does this violate Twitter’s terms of service (TOS)? What about sharing other people’s data? People find comfort in oversharing, but you do not have to consent for someone to tag you in a tweet.

The study notes that people are worried about researchers using their whole history, perhaps because it feels more personally targeted. They also feel it is a greater intrusion if deleted tweets are used. Excluding tweets that are deleted after the data has been scraped is a challenge, and researchers may not attempt to do so.
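As a rough sketch of what excluding deleted tweets could look like, suppose a researcher can re-check, just before analysis, which tweet IDs are still publicly available. How that check is done is left out here on purpose, and every name below (`ScrapedTweet`, `drop_deleted`, the example IDs) is hypothetical rather than part of any real Twitter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapedTweet:
    """Hypothetical record for one tweet collected at scraping time."""
    tweet_id: str
    text: str


def drop_deleted(scraped, still_available_ids):
    """Keep only tweets whose IDs were still available at re-check time.

    Assumption: `still_available_ids` comes from some later availability
    check; how that check is performed is outside this sketch.
    """
    available = set(still_available_ids)
    return [tweet for tweet in scraped if tweet.tweet_id in available]


# Made-up example data: tweet "2" was deleted after scraping.
scraped = [
    ScrapedTweet("1", "first tweet"),
    ScrapedTweet("2", "later deleted"),
    ScrapedTweet("3", "third tweet"),
]
kept = drop_deleted(scraped, still_available_ids=["1", "3"])
print([tweet.tweet_id for tweet in kept])
```

The filtering itself is trivial; the hard part in practice is the re-check, which has to be repeated every time the dataset is reused or shared.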

There are lots of situations where privacy is good - people may speak up against the government or prevailing political opinion, or face professional risk if they tweet about their employer. Then again, people can hide behind a pseudonym to post hate speech. You do not necessarily want to be in the position of enabling people who post hate speech, but if you get false positives (falsely attributing hate speech to someone), that could have negative consequences. Troll accounts specifically use the anonymity of Twitter to abuse others, but just because we do not like what they have to say, that does not mean we should waive their right to anonymity.

Journalism vs. Open Source Research

This brings us to an interesting parallel between using public tweets for research and using them for journalistic purposes. The fact that you put a photo on Twitter does not mean you consent to that photo or tweet being published in a newspaper. Journalists find and publish tweets often - do they ask for permission? Not always.

Over lockdown, a member of the discussion group had their children following along with at-home workouts in a room upstairs, above the kitchen where the member was working. The wear and tear of daily star jumps and forward rolls was too much for the kitchen ceiling to withstand, and part of it fell in, leaving a mess in the kitchen. The member tweeted a photo of this mess with the accompanying story, and a newspaper picked up on it, reporting it without this individual’s consent.

There is a delicate balancing act between attributing quotes to people versus keeping people anonymous. How is this different from writing in the newspapers? Tweets from a public Twitter account are accessible for anyone to see. If someone writes something down, they are now accountable for that, for better or for worse. Where does the balance lie on accountability? The article notes that Twitter TOS dictate that quoted tweets must be attributed to the original author. In practice, this is not at all enforced. Is this something the author always wants?

The attribution versus anonymity debate applies in the research world as well. Different research disciplines (for example, the social sciences and the physical sciences) treat data differently. A medical journal might seek repeatability, publishing the data that any conclusions were drawn from so that the results can be replicated, whilst a social scientist is more likely to prioritise individual confidentiality.

Old Patterns, Lazy Research and the Importance of Education

Has this debate happened before? Is this another cycle of open source information being analysed for research? There were personal sections/classified ads in newspapers which researchers could, in theory, have used for research purposes, but the response to this “problem” was arguably copyright law. The balance to strike is between what we know we own and how we think it should be treated. Perhaps the scale is something to consider, with Twitter providing a larger and more easily accessible dataset.

Is Twitter scraping just lazy research? Chucking a load of tweets into a linear regression soup and seeing what comes out isn’t the most inspired approach. Perhaps first we should weigh up the purposes of scraping that data versus the privacy implications - is it actually going to help someone or are you just exploring some inane question?

The General Data Protection Regulation (GDPR) is explicitly opt-in, and you have to know what your data is being used for. If Twitter had a blanket opt-in for research, could this ever work? You probably wouldn’t get opt-in from people who use their accounts for hate speech.

Also, it’s important to consider who is not on Twitter. Its population may present an enticing dataset, but it is not the same as a census, for example. Researchers should be cautious of using Twitter as a source of data (for example, when conducting sentiment analysis on public messaging from the NHS on Covid information).

In closing, we think the article missed the potential for public education. Perhaps the biggest utility of this study was not the answers but the questions. Some of us felt, when reading through the “Findings”, that the ideas from Table 3 “Percentage of Respondents Checking Each Contextual Factor, Ordered Highest Percentage to Lowest Percentage” presented some really interesting thresholds.

Most respondents were relaxed about some of their Twitter data being used in one way or another, but with various caveats and thresholds. Perhaps giving Twitter users more specific control over their data (whether or not consent is sought for use of their data, what information is being accessed, whether tweets are quoted verbatim, etc.) would at the very least give them an understanding of how their data might be used. This understanding might lead them to consider more carefully what data they present.

If Twitter were more interested in giving their users more control of their data, perhaps they would more carefully enforce various privacy options, but you’d have to be a bit of a Twit to expect them to do this without more pressure from their users.


Note: this is not a full list of attendees, just those that were comfortable sharing their names.

Name, Role, Affiliation, Where to find you, Emoji to describe your day