There’s a lot that we as a species could learn about ourselves by analyzing a crap-ton of people’s genomes. We could, for example, better understand how to treat health conditions with a genetic component, from cancer to depression.
Sounds great, right? But here’s the thing — in order to conduct that research, people need to be willing to share a whole lot of personal information with researchers. That might include their entire genetic code, plus any contextual information, like a medical history or dietary habits, which might prove useful for a study.
In an era where we can’t go a friggin’ week without some company or major database being hacked or leaked (remember Equifax? That was just in September), it is very reasonable for people to be skeptical of sharing their information for some abstract scientific benefit.
After all, those databases would contain information about their medical past and future: risk factors for cancers and neurodegenerative diseases, and maybe a whole host of other genetic conditions that a person might not want broadcast to the world, or to potential employers.
This is the problem that a team of computer scientists and mathematicians from MIT and Cambridge think they may have helped solve. The team found a way to encrypt genetic data so that up to 23,000 people’s genetic codes could be analyzed at once while keeping them anonymous to as many people as possible.
The trick is to chop up the sensitive data and keep it on a network of separate servers, making it more secure than if it were all in one place (no matter how secure that one place seems to be). The researchers claim that their new encryption tool could scale up and let researchers analyze up to a million genomes all at once, something that hasn’t been possible until now because no existing system could store so much information securely.
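To make the "chop it up" idea a little more concrete, here is a minimal sketch of one common way to do it, called additive secret sharing. To be clear, this is an illustration under our own assumptions, not the team's actual system: the function names, the modulus, the three-server setup, and the toy genotype values are all made up for the example. Each number gets split into random-looking pieces, one per server, and only the pieces added together reveal anything.

```python
import secrets

# Assumed for illustration: a large prime modulus. Every share is a number
# in [0, MODULUS), so a single share looks like random noise on its own.
MODULUS = 2**61 - 1


def split_into_shares(value, num_servers):
    """Split one integer into num_servers shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    # The last share is chosen so that all shares add back up to the value.
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def reconstruct(shares):
    """Recombine shares (one from every server) into the original integer."""
    return sum(shares) % MODULUS


# Hypothetical toy data: how many copies (0, 1, or 2) of some gene variant
# each of four participants carries.
genotypes = [0, 1, 2, 1]
num_servers = 3

# Each participant splits their own value and sends one share to each server.
shares_per_person = [split_into_shares(g, num_servers) for g in genotypes]

# Each server adds up only the shares it holds. It never sees a real genotype.
server_totals = [
    sum(person[s] for person in shares_per_person) % MODULUS
    for s in range(num_servers)
]

# Only the combined server totals reveal the aggregate count across everyone.
total_copies = reconstruct(server_totals)
print(total_copies)
```

In this toy version, no single server can learn anyone's genotype, but the servers together can still work out a group-level statistic (here, the total number of variant copies). A real system would need far more than this, including protection against servers that compare notes.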
Admittedly, staying truly anonymous while people study your genome still isn’t really possible. After all, your genome is basically a unique identifier that can’t substantively be changed. But this database adds extra safeguards so that very few people ever see a person’s entire genetic code. Researchers will be able to study whatever they’re interested in without accessing more information than they need.
All in all, that means that genetic research can progress without some of the privacy and security risks that previously made it difficult to recruit skeptical participants.
Ultimately, this system could mean more than just getting the public to trust researchers with their information — in general, people probably won’t read too much fine print, or ask too many questions about distributed data encryption. What this does mean is that the information that’s already out there can be handled more securely and ethically while still contributing to important discoveries, such as new basic knowledge or treatments for diseases. And if we can make progress without yet another major leak, then that’s pretty cool.