In Brief
A project out of China's National Health and Medicine Big Data Nanjing Center will sequence one million Chinese genomes. The effort aims to identify population trends and the genetic basis of health disorders.

Genetic Database

Authorities in the city of Nanjing, the capital of China’s Jiangsu province, announced at the end of October that they would sequence the genes of 1 million individuals to build a genetic database of Chinese residents. This project is part of the National Health and Medicine Big Data Nanjing Center, a new data storage facility under construction in the region.

China News reports that the project will focus on population genetics, including identifying genes linked to cancer and rare and chronic diseases, as well as genes linked to brain development in children and the effects of environment on genetics. Four experts, both from China and abroad, will advise the project, including Harvard geneticist George Church — a pioneer of CRISPR/Cas9 gene editing and leader of the project to resurrect the woolly mammoth.

The Sanger Center, a contributor to the Human Genome Project, where large-scale human genome sequencing began in 1999. (Image Credit: the Sanger Institute / Wellcome Trust)

“When the facilities are ready, the designed capacity for DNA sequencing will be up to 400,000 to 500,000 samples per year,” Lan Qing, deputy director of the provincial health and family planning commission, told China News.
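As a rough back-of-the-envelope check, the quoted capacity gives a sense of how long one million genomes might take. The short Python sketch below assumes one sample per individual and that the facility runs at its full designed capacity from the first day; neither assumption comes from the announcement, and the function name is purely illustrative.

```python
def years_to_sequence(total_samples: int, samples_per_year: int) -> float:
    """Years needed to process total_samples at a constant yearly capacity."""
    return total_samples / samples_per_year


TARGET_SAMPLES = 1_000_000  # the project's stated goal

# Lower and upper ends of the designed capacity quoted by Lan Qing.
for capacity in (400_000, 500_000):
    years = years_to_sequence(TARGET_SAMPLES, capacity)
    print(f"{capacity:,} samples per year: about {years:.1f} years")
```

Under those assumptions, the sequencing itself would take roughly two to two and a half years once the facilities are ready.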

The Big Data Nanjing Center will have a capacity of 52 petabytes, enough to contain the health records of 80 million individuals and videos from 174 of the province’s hospitals, Yicai Global reports. The center has already connected with several institutions, including Fudan University, Nanjing Medical University, and Peking Union Medical College Hospital.

The Spread of Sequencing

It took over a decade and $2.7 billion to sequence the very first human genome, completed by the Human Genome Project in 2003. Since then, genetic sequencing has rapidly become both cheaper and faster. An increasing number of companies and institutions are developing new methods of reading our most fundamental code, including small devices that can rapidly sequence partial genomes in the field.

According to Motherboard, the cost of genome sequencing is described with two measures. The first is the “cost per megabase of DNA sequence,” that is, the cost to produce one megabase, or a million base pairs, of DNA. The second is the cost of actually sequencing a human-sized genome, which contains about 3,000 megabases; this figure includes the cost of running the computer programs that perform the sequencing.
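To see how the two figures relate, here is a minimal Python sketch that scales a per-megabase cost up to a human-sized genome of about 3,000 megabases. The dollar value and the `coverage` parameter (how many times each base is read, since genomes are usually read more than once) are illustrative assumptions, not numbers from Motherboard, and the sketch leaves out computing and other overhead.

```python
MEGABASES_PER_HUMAN_GENOME = 3_000  # approximate size given in the article


def raw_genome_cost(cost_per_megabase: float, coverage: float = 1.0) -> float:
    """Estimate the raw sequencing cost of one human-sized genome.

    cost_per_megabase: dollars to produce one megabase of sequence.
    coverage: assumed number of times each base is read (hypothetical knob).
    """
    return cost_per_megabase * MEGABASES_PER_HUMAN_GENOME * coverage


# Hypothetical per-megabase price, chosen only to show the arithmetic.
example_cost_per_megabase = 0.5
print(raw_genome_cost(example_cost_per_megabase))               # single pass
print(raw_genome_cost(example_cost_per_megabase, coverage=30))  # 30 reads per base
```

The point is only that the two measures move together: any drop in the per-megabase cost is multiplied thousands of times over in the cost of a whole genome.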

Around January of 2008, both of these costs declined sharply with the introduction of next-generation sequencing technology, also known as high-throughput sequencing. While there are several platforms available, all of them sequence millions of small fragments in parallel and then use a complete genome as a reference to map where these fragments should go. Such technology can sequence an entire human genome in just a day.
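The idea of mapping fragments back to a reference can be illustrated with a toy Python sketch. It indexes every short substring (k-mer) of a made-up reference string, then uses the start of each fragment to look up candidate positions and checks for an exact match. Real sequencing pipelines rely on far more sophisticated aligners that tolerate errors and variation; the sequences, function names, and the value of `k` here are all invented for illustration.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-letter substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict, k: int) -> list:
    """Return reference positions where the read matches exactly."""
    if len(read) < k:
        return []
    hits = []
    # Use the first k letters as a seed, then verify the whole fragment.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTTAGCCGATAGGCTTACGATCG"     # made-up reference sequence
reads = ["TAGCCGA", "GGCTTAC", "TTTTTTT"]   # made-up fragments
index = build_index(reference, k=4)
for read in reads:
    print(read, map_read(read, reference, index, k=4))
```

In this toy run the first two fragments are placed at positions 4 and 13 of the reference, while the third matches nowhere and comes back with an empty list.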

However, all of this rapid and cheap genetic data is fairly useless if geneticists can’t pinpoint the genes, and the common errors within them, that cause health issues. Enormous genetic databases can be extremely valuable to doctors and drug companies seeking cures, as the commercial sequencing company 23andMe has already found by monetizing its database.

Projects like this one in Nanjing could therefore be essential to finding the genetic basis needed to diagnose and treat some of the worst diseases that continue to plague our species.