Data Ethics Club: A giant biotechnology company might be about to go bust. What will happen to the millions of people’s DNA it holds?#
What’s this?
This is a summary of the Wednesday 20th November Data Ethics Club discussion, where we spoke and wrote about the article “A giant biotechnology company might be about to go bust. What will happen to the millions of people’s DNA it holds?” by Megan Prictor. The summary was written by Jessica Woodgate, who tried to synthesise everyone’s contributions to this document and the discussion. “We” = “someone at Data Ethics Club”. Huw Day helped with the final edit.
Article Summary#
23andMe, one of the largest biotechnology companies advertising genetic testing services, has undergone a rapid downfall from being valued at US$3 billion in 2022 to being on the verge of bankruptcy today. Since it was founded in 2006, 23andMe has never made a profit. In 2023, the company suffered a major data breach which affected almost seven million users. In 2024, 23andMe settled a class action lawsuit for US$30 million and its value dropped more than 97%. The article and this Freakonomics podcast explore what led to the demise of 23andMe, and what bankruptcy could mean for consumers.
23andMe promises that DNA data will never be shared with employers, insurance companies, or public databases without consent. Currently, consumers can choose whether their spit sample is kept by the company and whether their de-identified genetic and other data is used in research; 4 out of 5 people have agreed to their data being used in research.
Aside from granting research access to genetic data, 23andMe uses consumer data in many different ways, including sharing with service providers. If the company goes bankrupt or is sold, people’s information could also be accessed, sold, or transferred. To avoid this, people can delete their account. Deleting your account opts you out of future research and discards your sample; however, the company will keep hold of information including your genetic data, date of birth, and sex. Additionally, if your data has already been used in research, you can’t get it back.
Discussion Summary#
What are the potential risks that could come about with 23andMe using people’s data in the way they’ve outlined in their service agreements?#
Under the service agreements of 23andMe, there is a lot of uncertainty surrounding what will happen with the data of consumers. The service agreements state that data will never be shared with employers, insurance companies, or public databases without consent. However, this is a short and abstract list that could be interpreted in many different ways. We don’t know who will buy the data or what they could do with it. There is a lot of opacity around how this kind of data is handled; another DNA testing company with links to Russia recently “vanished”, along with the data it held.
Uncertainty in what will happen to the data makes us alert to the risks of potential misuse. There is precedent for companies misusing data, as in the case of Cambridge Analytica. It’s hard to know exactly how dangerous biological data could be in the wrong hands, on a scale from malicious insurance companies to biological weapons. Biological data is especially vulnerable to mistreatment; after donating your body to science, for example, there is the possibility that students may not be respectful of your body or that DNA could be sequenced from it.
If there is a risk that data could be misused, people may want to reclaim their data. However, a problem with 23andMe, as with so many other domains, is that once you’ve shared the data you can’t get it back or keep track of what happens to it. Volunteering samples of your DNA is similar to donating your body to research in that once it’s gone, you don’t know where it’s going to go. Organs, like DNA, could be considered data, and we should be concerned about how they are used by other people. Once the data is in the hands of third parties, it becomes even more difficult to find out what happens to it. Many similar questions arise regarding what happens to our data when we pass away. For 23andMe, it is possible to ask for data to be destroyed, but only if it hasn’t already gone for research.
What it really means for data to be used for research is not clear; when you agree to volunteer your data, exactly where does it go and how long does it take before it is shipped off? “Research” is quite a broad umbrella which could feasibly include market research or institutions with less stringent ethical procedures. If you volunteer your data, the lack of control over what happens next means you could be contributing to unethical purposes, such as the identification of autism genes, which raises some thorny ethical dilemmas.
Whilst keeping track of your data is partly supported by a section on the 23andMe website where you can see the published works you have been a part of, the possibility that your data might be used for unethical purposes means that reporting after publication will sometimes come too late. We would like to be notified before that stage; ideally, consumers could opt out of their data being used for specific purposes.
In reality, however, it may be unlikely that the company will ask for consent each time to ensure consumers are properly informed about where their data goes, as this requires significantly more admin. Even if the company did send an email to all customers with a tick box agreeing to their data being used in a particular way, it seems unlikely that many people would actually read the email. Identifying appropriate ways to keep people informed could draw on the processes that other organisations, such as the NHS, use to tell people how their data has been used.
Being kept informed about how your data is used has benefits for your autonomy yet also invokes privacy concerns if you are singled out as someone involved in a particular study. Making information about which individuals were involved in which research project more widely available increases the possibility that individuals could be de-anonymised. 23andMe itself was subject to a massive data breach in 2023. Investment to prevent data breaches is essential, yet it is not clear what accountability the company, as a private entity, has.
Infringements on privacy are a risk with any sort of data collection but are especially risky in the case of DNA, as DNA is immutable and can be heinously misused. If your password or credit card is hacked, it is possible to change your details to make the data secure again; it is not possible to change your DNA. Genealogy data can be used for surveillance; police forces around the world use genetic data, which has led to some famous cases such as the identification of the Golden State Killer. Although genetic screening has a certain place, there are limits; there is a long history of genetics being used to cast some people as “superior” and others as “inferior” through eugenics.
In addition to DNA being a particularly sensitive type of data, DNA is special in that it is relevant not solely to you but also to all of your blood relatives. What happens to information about your DNA is thus also a privacy issue for other people and puts them at risk. Tracing lineages through genealogy has rippling implications for people’s lives, such as discovering additional living relatives. 1 in 20 people using genetic testing have found out that their parent isn’t their biological parent. Along similar lines, genotyping your pets can lead to people discovering that their dogs are illegal breeds.
When relatives have been discovered, people must then navigate factors such as whether they would want to meet up and how new dynamics affect wills and inheritance. Many anonymous sperm donors made the decision to donate at a time when it was not possible for offspring to trace them in this way. In 2005 [1] the law changed so that anonymity was removed for donors, which initially led to a decrease in donations. This cartoon explores the multifaceted issues that arise from people’s DNA stories.
Considering that a lot of people who use these services do so to build up a family tree, we wondered if most people are aware of the repercussions of what they are signing up for. Building family trees uses an immense amount of people’s data; perhaps relatives should be consulted before a family member decides to share. Helen Nissenbaum’s idea of contextual integrity could help delineate what is appropriate. Nissenbaum argues that privacy protection is tied to the norms of specific contexts, building on Michael Walzer’s “spheres of justice”.
If you were the Chief Executive of 23andMe, what would you be prioritising to make sure highly personal genetic data was being protected if the company is sold (or even if it isn’t!)?#
Deleting the data completely, or returning it to consumers, would be a top priority for us. If the company hasn’t made a profit for 18 years and is essentially bankrupt, there might not be much point trying to profit from the data now. An ethical “spin” could be put on deleting the data, such as the environmental impact of storing that much data and the reduction in emissions from deleting it.
Realistically, however, we were unsure as to whether deleting the data makes good business sense. The amount of financial trouble the company is in provides a strong incentive to sell the data. Going bust might be the rational business step if it means the controls and constraints on the use of data fall away and the data can be easily sold. Making a profit out of selling data may not be an intrinsically bad thing, but either way the motivation should be made clear and transparent.
The company’s underlying motivations are unclear to us, further complicated by the CEO’s connections to big tech companies such as YouTube and Google, and by how vocal she has been about using the data for health and drug development. Working with highly sensitive data under profit motivations seems like a conflict of interest. We wondered if the company ever really intended to make a profit from the services they advertise. 23andMe are contradictory in that they proclaim data protection - promising not to share data with certain parties on the grounds of protecting people from being identified - yet may decide to sell the data. It could be that 23andMe was a front for collecting a massive genetic data repository, and it is the data that holds the true value. Now that the data has been acquired, it could be used elsewhere. Selling the data may be the inevitable consequence of bringing profit motivations into genetic sequencing.
If deleting the data is unlikely, or if the data is to be sold, the top priority should be anonymisation. It could be a big assumption that it is possible to de-identify DNA. Genetic data is extremely specific, and it seems unlikely that something so intrinsically identifiable can be de-identified. In particular datasets, e.g. about Parkinson’s, people with distinct features like a brain implant are much more easily identifiable. People with certain genetic characteristics would also be much easier to identify. Even when genomic datasets are considered de-identified, they can be coded to allow re-identification.
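As a rough sketch of what “coded” de-identification means (this is an illustration, not 23andMe’s actual process; the `key_code` function and all names and attributes are invented), records are released under random pseudonyms while a separate key table, held elsewhere, can reverse the mapping:

```python
import secrets

def key_code(records):
    """Replace direct identifiers with random pseudonyms.

    The returned key table allows re-identification, so 'de-identified'
    data is only as safe as wherever that table is kept.
    """
    coded, key_table = [], {}
    for identifier, attributes in records.items():
        pseudonym = secrets.token_hex(4)  # random 8-character code
        key_table[pseudonym] = identifier
        coded.append({"id": pseudonym, **attributes})
    return coded, key_table

# Hypothetical dataset: direct identifiers paired with genomic attributes.
records = {
    "alice": {"variant": "rs123-A"},
    "bob": {"variant": "rs123-G"},
}
coded, key = key_code(records)

# Coded rows carry no names, but anyone holding the key table
# can map every pseudonym back to a person.
assert all(row["id"] in key for row in coded)
assert {key[row["id"]] for row in coded} == {"alice", "bob"}
```

The point of the sketch is that the coded dataset looks anonymous on its own, yet re-identification remains trivial for whoever controls the key table - which is why “de-identified” is a weaker claim than “anonymised”.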
Despite the challenges, it is possible to anonymise genomic data, but it must be determined on a case-by-case basis. Relevant approaches could be similar to those used to anonymise census data, which swap information around so that the statistical outputs are the same but people can’t be identified. Whilst a gene sequence is functionally unique to a person, just knowing that sequence isn’t sufficient to identify them; it’s a bit like being told “this is a password that someone, somewhere, uses”. However, data can be re-identified if it is linked with other data that people have shared. For example, if your siblings or cousins have shared their data and been identified, then their genomes will be associated with yours and can be connected.
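The census-style swapping idea can be sketched in a few lines (a toy illustration under invented data; the `swap_markers` function and its attributes are not from any real system): shuffle a sensitive attribute across records so aggregate statistics are untouched while no individual row reliably describes a real person.

```python
import random

# Hypothetical records linking a person to a sensitive genetic marker.
records = [
    {"person": "A", "postcode": "BS1", "marker": "variant-1"},
    {"person": "B", "postcode": "BS1", "marker": "variant-2"},
    {"person": "C", "postcode": "BS2", "marker": "variant-1"},
    {"person": "D", "postcode": "BS2", "marker": "variant-2"},
]

def swap_markers(rows, seed=0):
    """Randomly permute the sensitive column across rows.

    Counts of each marker are preserved exactly, but the link between
    a person and their own marker is broken.
    """
    rng = random.Random(seed)
    markers = [r["marker"] for r in rows]
    rng.shuffle(markers)
    return [{**r, "marker": m} for r, m in zip(rows, markers)]

def marker_counts(rows):
    counts = {}
    for r in rows:
        counts[r["marker"]] = counts.get(r["marker"], 0) + 1
    return counts

swapped = swap_markers(records)
# Statistical outputs (marker frequencies) are identical after swapping.
assert marker_counts(swapped) == marker_counts(records)
```

Real statistical disclosure control is far more careful than this - swaps are typically targeted at risky records and constrained to preserve many tables at once - but the sketch shows why swapped outputs can stay statistically useful without being attributable to individuals.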
Data security aside, would you want to take a DNA test knowing that you might find out things about your family or about health conditions you could develop?#
The decision to take a DNA test is highly personal. If there were no data security considerations, some of us would take a test for health reasons, feeling that the more we know about our health the better. Identifying a condition early on allows more time to adapt lifestyles and mitigate deterioration. Knowing about conditions that shorten our lives might encourage us to dedicate more time to doing what we love. Aside from health, we would feel incentivised to contribute our genetic data if we knew it was going to be used for something beneficial for society such as good research.
A lot of us, however, would not take a test. Some of us do not trust our genetic data to be in the hands of private companies. Others wouldn’t want to know if we have genetic conditions, as knowing might not improve the chances of recovery. After being diagnosed with cervical cancer, Hannah Fry explored the statistics surrounding the way we diagnose and treat cancer. Fry asked if sometimes we are too quick to screen for and treat the disease.
We wondered what the next steps are once 23andMe identifies that you have a condition, and how you would be safeguarded. Discovering genetic health information without a genetic counsellor leaves you vulnerable and without guidance, which is unethical. If a medical professional were relaying a diagnosis to you, there would be steps in place to support you afterwards.
In addition to the social repercussions of taking DNA tests, analysing genetic data is not appealing to some of us because of concerns about its accuracy. Analysing DNA is a technically difficult problem, and questions have been raised about how accurate health predictions made from genetic data are. Outside of certain genetic conditions, DNA may have only a marginal influence on our health. Genetic testing companies advertise the results of genetic tests as deeply meaningful, yet identical twins can receive different results from the same company, and the same person can receive different results from different companies. Predictions about health may be better achieved by looking at lifestyle and family history.
The reports are best thought of as estimates based on imperfect data. The data from which genetic testing companies infer results is not representative: it comes from a specific group of people who are interested in their family history and for whose demographic there is recorded historical information. Until recently, 23andMe could only match people to three broad regions in sub-Saharan Africa. Whiteness is overrepresented in genetic testing, especially in places like the US, where the history and ancestry of enslaved and indigenous people was purposefully and systematically destroyed. For descendants of these people, it is difficult to trace roots back more than a few generations, as depicted in this comic.
What change would you like to see on the basis of this piece? Who has the power to make that change?#
As 23andMe is a medical device company, there should be stringent rules on how the data is used. Yet there seems to be no specific regulation for private companies in these areas, and it is unclear how the sale of the company would be regulated. At the end of the day, the data sits on a rack of computers; it is unclear who would have responsibility for them in a succession process. If the bankrupt company has been dissolved, it presumably can’t dispose of the assets itself. We were unsure if there are any formal measures to halt the export of the hardware, or the data it contains, to a non-EU environment and to prevent arbitrary use of its content. Even if there were obvious rules, the infrastructure to see them enforced does not currently exist. Big companies avoid retribution; after the Cambridge Analytica scandal, Mark Zuckerberg rejected invitations to testify before UK MPs.
Similar regulatory issues have arisen with the UK Biobank, which has experienced many controversies, including difficulties in obtaining informed consent, designating property rights, and generating profit. Even when regulation is successful, the consequences are insignificant for large companies. GlaxoSmithKline paid out US$3 billion in a drug fraud scandal, yet their profit in 2023 was US$38 billion.
Current regulation for handling data may inform how to address situations like 23andMe’s. Private companies holding sensitive information should have the same level of safekeeping as other institutions and implement processes such as those the NHS has for obtaining data. Getting data from the NHS is a highly regulated process requiring you to jump through many hoops before you are granted access. GDPR might be an avenue to protect us when a company goes bust, as consumers could exercise the right to be forgotten. The export of data may be effectively blocked by GDPR if whoever is discharging the bankruptcy also inherits the data protection obligations.
In the case of 23andMe, the California Consumer Privacy Act (CCPA) applies. Whether it is possible to sell or lease personal data from a bankruptcy depends on the language used in the privacy policy. Essentially, companies can do what they like with data during bankruptcy unless the privacy policy prohibits the transfer of personally identifiable information. In the US, if the privacy policy in question forbids this, a consumer privacy ombudsman is appointed to review the sale in light of applicable law. The regulation that 23andMe is subject to thus depends on its privacy policy. What then happens with the data is dictated by how much of it is considered non-identifying, and how much effort has gone into assessing the residual risk on that data. The CCPA does not seem to say much about regulating data transfers, which GDPR is very specific about. We should thus consider the possibility of a new owner moving the company to a state with weaker consumer data protection than California.
Whatever the applicable regulation is and whatever happens to the data, we would like to see more transparency introduced into the process. If someone has a compelling reason to access data, it is normally justifiable as legitimate interest. Legitimate interest and data processing impact assessments should be done, registered, and shared. Spending time to write down all the things you plan to do with data may not result in any of those things actually getting done, but it is still important to write them down.
Risks of misuse can be mitigated by implementing ethics committees and supporting them to be effective. In assessing how to interact with sensitive data, we must consider both benefits and risks. Just as nuclear fission is useful for power and not just destructive, there are lots of beneficial things that can be done with genetic data. In deciding our processes, we must look at both sides and ask ourselves if it is worth the destruction that could occur.
There are moral obligations to destroy the data if the company goes bankrupt; however, it is an extremely rich data source. When we think about it from a purely data science perspective, it seems a shame to delete the data; we could get so many more insights if we increased the dimensions of the dataset. DNA testing could potentially help with targeted medicine. The CEO associated the brand with potential pharmaceutical cures, and the first stage of clinical trials did not go badly. 23andMe is a massive data repository which might not otherwise exist; opportunities to collect genetic data about so many people are few and far between through conventional channels. Usually, participants would have to go out of their way to get involved with studies. There are many examples of genetic data being stored and used “correctly” around the world, such as in the NHS/NHR, which suggest there are ways we could keep the data in an ethical way.
If the bankruptcy of 23andMe is not handled well, it puts a huge strain on public trust in gene sequencing work. The knock-on effect could be that people are less likely to donate genetic material for good research. To rebuild trust and encourage participation, research could give people a tangible return, such as a detailed report after their data has been analysed. Some studies don’t return any information for a long time, if they return anything at all; such a report would be a direct incentive to participate.
Attendees#
Huw Day, Data Scientist, Jean Golding Institute, University of Bristol, https://www.linkedin.com/in/huw-day/
Amy Joint, Programme Manager, ISRCTN Clinical Study Registry, https://www.linkedin.com/in/amyjoint/
Vanessa Hanschke, PhD student, Interactive AI, University of Bristol
Jessica Woodgate, PhD student, University of Bristol
Virginia Scarlett, Data and Information Specialist, HHMI Janelia, Ashburn VA USA, :coffee:
Adrianna Jezierska, PhD student, Business School and current AskJGI team
Paul Lee, investor
Kamilla Wells, Citizen Developer, Australian Public Service, Brisbane
Jarmila Regulova, London Borough of Tower Hamlets (hi all, I’m a new face here, a colleague recommended this meeting to me)
Brooke Morris, PhD Student, Diverse-Ability Lab (Bristol Interaction Group) (also new! invited by vanessa!)
Euan Bennet, Lecturer, University of Glasgow
Khadiza Laskor, PhD Student, University of Bristol
Emma Tonkin, Research Fellow, University of Bristol