New study explores loopholes in the protection and safety of your genetic information.
Although some areas of technology don’t appear to be evolving as rapidly as we might have expected, DNA sequencing is not among them. The Human Genome Project, which produced the first full sequence of the human genome, took 13 years and was completed in 2003.
Now, less than 20 years later, do-it-yourself home DNA kits are widely available. Direct-to-consumer (DTC) genetic testing has become big business, a multimillion-dollar industry. Combined with the creation of online databases, it has allowed individuals to sequence their own genes and search databases for possible relatives. A new study by researchers at the University of California, Davis explores just how secure these online treasure troves of genetic information are.
These databases match individuals to relatives using a process based on identity-by-descent (IBD). The process identifies segments of DNA that match between two individuals, indicating a likely familial relationship. The length of these matching segments usually indicates the strength of the relationship too (i.e. more closely related individuals will have longer and more numerous matching segments). Some of the shorter matching segments are referred to as identical-by-state (IBS) and indicate a more distant shared ancestry.
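To make the idea of a "matching segment" concrete, here is a minimal Python sketch. It assumes each person's genotype is a list of 0, 1 or 2 (the number of copies of one allele at each tested site) and treats two people as compatible at a site unless they are opposite homozygotes (0 versus 2). The function name `ibs_segments` and the `min_sites` threshold are hypothetical choices for illustration; the matching algorithms used by real genealogy services are more sophisticated and are not described in this article.

```python
def ibs_segments(geno_a, geno_b, min_sites=3):
    """Return (start, end) index ranges, end exclusive, where two unphased
    genotypes could share an allele at every site in the range."""
    segments = []
    start = None
    n = min(len(geno_a), len(geno_b))
    for i in range(n):
        # Opposite homozygotes (0 vs 2) cannot share an allele, so they end a run.
        compatible = abs(geno_a[i] - geno_b[i]) < 2
        if compatible and start is None:
            start = i
        elif not compatible and start is not None:
            if i - start >= min_sites:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_sites:
        segments.append((start, n))
    return segments


person_a = [0, 1, 2, 2, 0, 0, 1, 2]
person_b = [0, 1, 1, 0, 2, 0, 1, 1]
print(ibs_segments(person_a, person_b))  # [(0, 3), (5, 8)]
```

In this toy example the two people are compatible over sites 0 to 2 and 5 to 7, and the two mismatching sites in the middle break what would otherwise be a single long segment.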
A problem arises, however, because searching a database does not just identify who you might be related to; it also tells you how and where your genomes match. If you know your own genetic sequence (from the testing kit) and I tell you which segments match someone else’s, I am, by default, telling you bits of their genotype. The DTC genetics companies would argue that people agree to this when signing up and that the matching serves as another layer of protection, where closer relatives see more of your information, more distant ones see less, and strangers see nothing at all. However, what this study shows is that it is possible to access aspects of any random individual’s genome.
There are two approaches that could be taken to reveal this genetic information. The first involves taking real genotype data available through resources like the 1000 Genomes Project and uploading them to various databases. The second approach involves designing artificial datasets tailored to identify the genotypes of users at specific sites of interest.
In this study, the authors published two examples of how they used real online databases to gather genetic information. The first method they labelled “IBS tiling”. This involves uploading multiple genotypes and recording where each one matches the “target individual”. By stringing together the matching segments, they were able to obtain significant genetic data on that target.
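The "stringing together" step can be pictured as merging overlapping intervals. The sketch below is a toy illustration only, not the authors' code: it assumes each uploaded genotype returns a list of (start, end) positions where it matched the target, and the function names, coordinates and genome length are all made up for the example.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one matching segment."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Hypothetical matches reported for three separate uploads against one target.
matches = [(0, 30), (20, 50), (70, 90)]
print(merge_intervals(matches))        # [(0, 50), (70, 90)]
print(covered_fraction(matches, 100))  # 0.7
```

Each extra upload can only add to the covered region, which is why uploading many genotypes gradually "tiles" more of the target's genome.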
The second method they describe, IBS probing, is more specific. This approach, rather than identifying segments from a specific target individual, trawls through multiple individuals to find people with a specific variation of a gene. They do this by creating an artificial genotype in which all but the target area contains inert sequences, or sequences that are unlikely to match anyone. Therefore, whoever you do match, you can assume has the specific variation of the gene you are looking for.
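A rough sketch of how such an artificial genotype might be assembled is shown below. It keeps a template genotype inside the region of interest and, everywhere else, picks the genotype that is homozygous for the rarer allele so that most people will not match there. That "rarer allele" rule, the function name `build_probe` and the allele frequencies are assumptions made purely for illustration; the article does not spell out how the authors built their inert sequences.

```python
def build_probe(template, alt_freqs, window):
    """Build an artificial genotype (0, 1 or 2 alternate-allele copies per site).

    Inside `window` (start, end; end exclusive) the template is copied.
    Outside it, each site is set homozygous for the rarer allele, an
    illustrative way of making a match with a random person unlikely.
    """
    start, end = window
    probe = []
    for i, (genotype, alt_freq) in enumerate(zip(template, alt_freqs)):
        if start <= i < end:
            probe.append(genotype)
        else:
            # alt allele rare -> 2 copies of alt; alt allele common -> 0 copies.
            probe.append(2 if alt_freq < 0.5 else 0)
    return probe


template = [1, 1, 2, 0, 1]             # hypothetical genotype carrying the variant
alt_freqs = [0.1, 0.9, 0.3, 0.8, 0.2]  # hypothetical alternate-allele frequencies
print(build_probe(template, alt_freqs, window=(2, 4)))  # [2, 0, 2, 0, 2]
```

Because the probe is designed to mismatch almost everyone outside the window, any match a database reports can be attributed to the window itself.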
They go on to describe a further method called “IBS baiting”, which can be used to determine whether an individual carries two matching alleles at a given site, one on each copy of the chromosome (homozygous), or two different alleles (heterozygous). The most basic illustration of this involves the sex chromosomes, where females carry two X chromosomes (XX) and males carry one X and one Y (XY). Using the IBS baiting method, it would be possible to tell if a target is male or female.
The implications of this study are significant. Over the last five years, the perception of online data has transformed. With the emergence of “Big Data”, your individual data and preferences have become a valuable commodity. However, whilst online browsing habits may reveal something about your preferences and thought processes, they are nothing compared to the vast mines of data contained within your genome.
This study demonstrates that the explosion in the market for DIY DNA testing kits has rapidly outpaced our ability to foresee the potential consequences. Millions of people have already uploaded their genetic information to various online databases. The authors were able to access sensitive data on these databases without employing any particular computer expertise. Some of the immediate consequences of the “hacking” of genetic data could include genetic discrimination. For example, insurance providers could (illegally) use genetic information to refuse coverage to high-risk patients, if they were able to access this kind of genetic data.
The study does make suggestions as to how some of these loopholes can be closed and the security of genetic data improved. The introduction of the General Data Protection Regulation (GDPR) in the European Union in May 2018 thrust the issue of online data protection into the limelight. This study adds a further layer to the GDPR debate and signals where the debate may be headed in the near future.
Written by Michael McCarthy
Reference: Michael D Edge, Graham Coop. Attacks on genetic privacy via uploads to genealogical databases. eLife, 2020;9:e51810 DOI: 10.7554/eLife.51810