New study explores loopholes in the protection and safety of your genetic information.
Although some areas of technology don’t appear to be evolving as rapidly as we might have expected, DNA sequencing is not among them. The Human Genome Project, which produced the first full sequence of the human genome, took 13 years and was completed in 2003.
Now, less than 20 years later, do-it-yourself home DNA kits are widely available. Direct-to-consumer (DTC) genetic testing has become big business, a multimillion-dollar industry. Combined with the creation of online databases, it has allowed individuals to sequence their own genes and search databases for possible relatives. A new study by researchers at the University of California, Davis explores just how secure these online treasure troves of genetic information are.
These databases match individuals to relatives using a process called identity-by-descent (IBD). The process identifies segments of DNA that match between two individuals, indicating a likely familial relationship. The length of these matching segments usually indicates the strength of the relationship too (i.e. more closely related individuals will have longer and more numerous matching segments). Some of the shorter matching segments are referred to as identical-by-state (IBS), which, depending on length, may indicate likely IBD or a more distant shared ancestry.
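To make the idea of a matching segment concrete, here is a minimal Python sketch. It is a toy illustration, not the algorithm any real database uses. It assumes each genotype is a list of 0, 1 or 2 (the number of copies of the alternate allele at each site), and that two people could share an allele at a site unless one is 0 and the other is 2. The function name `ibs_segments` and the `min_sites` threshold are hypothetical.

```python
def ibs_segments(geno_a, geno_b, min_sites=3):
    """Return (start, end) index pairs, end exclusive, for runs of sites
    where the two genotypes could share at least one allele.

    Genotypes are lists of 0, 1 or 2 (alternate-allele counts).
    Only opposite homozygotes (0 vs 2) rule out a shared allele.
    """
    segments = []
    start = None
    n = min(len(geno_a), len(geno_b))
    for i in range(n):
        compatible = abs(geno_a[i] - geno_b[i]) < 2
        if compatible and start is None:
            start = i  # a new run begins here
        elif not compatible and start is not None:
            if i - start >= min_sites:
                segments.append((start, i))
            start = None  # the run is broken
    # Close a run that extends to the final site.
    if start is not None and n - start >= min_sites:
        segments.append((start, n))
    return segments


person_a = [0, 1, 2, 2, 0, 1, 1, 2]
person_b = [0, 0, 1, 2, 2, 1, 0, 1]
segments = ibs_segments(person_a, person_b)
```

In this example the only incompatible site is index 4 (0 against 2), so the two runs on either side of it are reported. Real services work with hundreds of thousands of sites and measure segment length in genetic distance rather than site counts.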
A problem arises, however: searching a database does not just identify who you might be related to, it also tells you how and where your genomes match. If you know your own genetic sequence (from the testing kit) and I tell you which segments match someone else’s, I am, by default, telling you bits of their genotype. The DTC genetics companies would argue that people agree to this when signing up and that the matching serves as another layer of protection where closer relatives see more of your information, more distant ones see less, and strangers see nothing at all.
Two approaches could be taken to reveal this genetic information. The first involves taking real genotype data available through resources like the 1000 Genomes Project and uploading them to various databases. The second approach involves designing artificial datasets tailored to identify the genotypes of users at specific sites of interest.
In this study, the authors published two examples of how they used real online databases to gather genetic information. The first method they labelled “IBS tiling”. According to the study, IBS tiling “involves uploading many real genotypes in order to identify genotype information from many regions in many people.” By stringing together the matching segments, they are able to obtain significant genetic data on a target individual.
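The “stringing together” step can be pictured as merging overlapping intervals. The Python sketch below is an illustration under simple assumptions, not the authors' code: each upload is assumed to yield a list of (start, end) positions where it matched the target, and the helper names `merge_intervals` and `covered_fraction` are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous tile: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(intervals, genome_length):
    """Fraction of the target's genome covered by at least one tile."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / genome_length


# Matches reported for three hypothetical uploads against one target.
tiles = [(0, 30), (20, 50), (70, 90)]
merged_tiles = merge_intervals(tiles)
fraction = covered_fraction(tiles, genome_length=100)
```

Each additional upload can only add to the covered region, which is why uploading many genotypes gradually exposes more of a single person's genome.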
The second method they describe, IBS probing, is more specific. According to the study, IBS probing “involves uploading a dataset containing a long haplotype that includes an allele of interest, creating matches at this locus.” Unlike IBS tiling, IBS probing looks at a specific area of interest, for example, risk alleles for Alzheimer’s disease.
They go on to describe a further method called IBS baiting, which “involves uploading fake datasets with long runs of heterozygosity to induce phase-unaware methods for IBS calling to reveal genotypes.”
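A toy Python sketch can show why a heterozygous bait is revealing. It rests on loud assumptions: genotypes are coded 0, 1 or 2, and the database is modelled by a hypothetical function, `reports_match_at`, that breaks a match only where the two genotypes are opposite homozygotes (0 against 2). This is a simplification for illustration, not the method as implemented in the study or in any real service.

```python
def reports_match_at(bait, target, site):
    """Toy phase-unaware check: a site is treated as matching unless the
    two genotypes are opposite homozygotes (0 vs 2)."""
    return abs(bait[site] - target[site]) < 2


def infer_genotype(target, site):
    """Infer the target's genotype at one site from two fake uploads.

    Each bait is heterozygous (1) everywhere, so it matches any genotype,
    except at the site of interest, where it is made homozygous.
    """
    n_sites = len(target)
    bait_ref = [1] * n_sites
    bait_ref[site] = 0  # homozygous for the reference allele
    bait_alt = [1] * n_sites
    bait_alt[site] = 2  # homozygous for the alternate allele

    if not reports_match_at(bait_ref, target, site):
        return 2  # only a 2 breaks the match with a 0
    if not reports_match_at(bait_alt, target, site):
        return 0  # only a 0 breaks the match with a 2
    return 1      # matches both baits, so heterozygous


hidden_target = [0, 2, 1, 0]
recovered = [infer_genotype(hidden_target, s) for s in range(len(hidden_target))]
```

In this toy model the attacker never sees `hidden_target` directly; only the match or no-match answers are used, yet every genotype is recovered.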
The implications of this study are significant. Over the last five years, the perception of online data has transformed. With the emergence of “Big Data”, your individual data and preferences have become a valuable commodity. However, whilst online browsing habits may reveal something about your preferences and thought processes, that is nothing compared to the vast mines of data contained within your genome.
This study demonstrates that the explosion in the market for DIY DNA testing kits has rapidly outpaced our ability to foresee the potential consequences. Millions of people have already uploaded their genetic information to various online databases. The authors were able to access sensitive data on these databases. Some of the immediate consequences of the “hacking” of genetic data could include genetic discrimination. For example, insurance providers could (illegally) use genetic information to refuse coverage to high-risk patients, if they were able to access this kind of genetic data.
The study does make suggestions as to how some of these loopholes can be closed and the security of genetic data improved. The introduction of the General Data Protection Regulation (GDPR) in the European Union in May 2018 thrust the issue of online data protection into the limelight. This study adds a further layer to the GDPR debate and signals where that debate may be headed in the near future.
Written by Michael McCarthy
Reference: Michael D Edge, Graham Coop. Attacks on genetic privacy via uploads to genealogical databases. eLife, 2020;9:e51810 DOI: 10.7554/eLife.51810