A storm of criticism has rained down on a paper by genome-sequencing pioneer Craig Venter that claims to predict people’s physical traits from their DNA. Reviewers and even a co-author of the paper say that it overstates the ability to use a person’s genes to identify the individual, which could raise unnecessary fears about genetic privacy.
In the paper (ref. 1), published on 5 September in the Proceedings of the National Academy of Sciences (PNAS), Venter and colleagues at his company Human Longevity, Inc. (HLI), based in San Diego, California, sequenced the whole genomes of 1,061 people of varying ages and ethnic backgrounds. Combining the genetic data with high-quality 3D photographs of the participants’ faces, the researchers used an artificial-intelligence approach to find small differences in DNA sequences, called single-nucleotide polymorphisms (SNPs), associated with facial features such as cheekbone height. The team also searched for SNPs that correlated with factors including a person’s height, weight, age, vocal characteristics and skin colour.
The approach correctly identified an individual out of a group of ten people randomly selected from HLI’s database 74% of the time. The findings, according to the paper, suggest that law-enforcement agencies, scientists and others who handle human genomes should protect the data carefully to prevent people from being identified by their DNA alone. “A core belief from the HLI researchers is that there is now no such thing as true deidentification and full privacy in publicly accessible databases,” HLI said in a statement.
But other geneticists who have studied the paper say that, in their opinion, the claim is vastly overblown. “I don't think this paper raises those risks, because they haven’t demonstrated any ability to individuate this person from DNA,” says Mark Shriver, an anthropologist at Pennsylvania State University in University Park. In a randomly selected group of ten people, especially one chosen from a data set as small and diverse as HLI’s, knowing age, sex and race alone rules out most of the individuals, he says.
To demonstrate this, computational biologist Yaniv Erlich of Columbia University in New York City looked at the age, sex and ethnicity data from HLI’s paper. In a study (ref. 2) published on 6 September on the preprint server bioRxiv, he calculated that knowing only those three traits was sufficient to identify an individual out of a group of ten people in the HLI data set 75% of the time. Erlich contends that there was no need to know anything about the people’s genomes. Furthermore, he says, HLI’s reconstructions of facial structure from SNPs are not highly specific: they tend to look as much like the individual as like anyone else of that person’s sex and race.
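The logic of that argument can be illustrated with a small simulation. The sketch below is not Erlich’s analysis and does not use HLI’s data: the population is synthetic, the function names are hypothetical, and the choices of uniformly distributed ages, two sexes, five ethnicity labels and exact age matching are assumptions made purely for illustration. It draws random groups of ten, picks one member as the target, and asks how often age, sex and ethnicity alone single that person out, guessing at random among any group members who share all three traits.

```python
import random


def synthetic_population(n=1000, seed=1):
    """Build a made-up population of (age, sex, ethnicity) tuples.

    All distributions here are illustrative assumptions, not HLI's data.
    """
    rng = random.Random(seed)
    sexes = ["F", "M"]
    ethnicities = ["A", "B", "C", "D", "E"]  # placeholder labels
    return [
        (rng.randint(18, 80), rng.choice(sexes), rng.choice(ethnicities))
        for _ in range(n)
    ]


def reidentification_rate(people, group_size=10, trials=2000, seed=0):
    """Estimate how often a target is picked out of a random group
    using only demographic traits, with no genomic information.

    If several group members share the target's traits, the guess is
    assumed to be uniform among them, so the trial contributes
    1 / (number of matching members) to the success count.
    """
    rng = random.Random(seed)
    hits = 0.0
    for _ in range(trials):
        group = rng.sample(people, group_size)
        target = group[0]  # sample order is random, so this is a random member
        matches = sum(1 for person in group if person == target)
        hits += 1.0 / matches
    return hits / trials


if __name__ == "__main__":
    population = synthetic_population()
    rate = reidentification_rate(population)
    print(f"Demographics-only success rate in groups of 10: {rate:.2f}")
```

The figure this prints depends entirely on the assumed distributions, so it should not be compared directly with the 74% and 75% reported in the two studies. The point is only that a demographics-only baseline has to be computed before a success rate in a small group can be credited to the genome.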
Before it was published in PNAS, the paper had been submitted to Science, says Shriver, who reviewed the paper for that journal. He says that HLI’s actual data are sound, and he is impressed with the group’s novel method of determining age by sequencing the ends of chromosomes, which shorten over time. But he says that the paper doesn’t demonstrate that individuals can be identified by their DNA, as it claims to. “I think it totally misrepresents what they did and what they found,” he says.
HLI said that its paper states that using multiple parameters, of which a person's face is only one, to identify someone is possible based on work with over a thousand genomes. “It heralds that prediction will become increasingly precise,” says Heather Kowalski, an HLI spokesperson. HLI stated that it stands by its methodology and acknowledged that the sample set was small. The company added that “the HLI team is working on rebuttal to criticisms by Yaniv in BioRxiv [sic]”.
Shriver says that he and Erlich pointed out their concerns to the study authors in their reviews of the paper for Science. Both Shriver and Erlich say that the journal ultimately rejected the paper. (Science does not comment on unpublished studies.) The paper was then submitted to PNAS under an option that allows a member of the US National Academies of Sciences, Engineering, and Medicine, such as Venter, to choose the reviewers. Two of the three reviewers are information-privacy experts and the third is a bioethicist.
PNAS confirmed that Venter chose all three reviewers for the study. HLI declined to comment on the PNAS review process for the paper.
Jason Piper, a computational biologist and a paper co-author who now works at Apple in Singapore, agrees that the paper misrepresents the findings that he and the other co-authors produced. Piper adds that his contract with the company waived his right to approve the manuscript before it was submitted, allowing HLI to present his data however it saw fit. HLI responded to that by confirming that “authors were given an opportunity to review and comment on the paper”.
Piper has since criticized the paper heavily on Twitter and says that, in his opinion, HLI has a potential conflict of interest in encouraging restricted access to DNA databases. HLI, a for-profit company, is attempting to build the world’s largest database of human genetic information.
“I think genetic privacy is very important, but the approach being taken is the wrong one,” Piper says. “In order to get more information out of the genome, people have to share.” A more useful approach, he says, would be to find a way to make genomic data public without allowing individuals to be identified.
In response to criticisms of the paper, the company issued a statement saying that “HLI stands by the protection of genome data and the promotion of modern solutions for data exchange”. It added that the paper was intended to spur discussion about how to share genetic information while protecting a person's privacy.
Still, Erlich is concerned that Venter’s stature gives the paper extra weight in the eyes of policymakers, who may become overly concerned about DNA privacy. “New rules and regulations are based on papers like that,” he says. “It’s important when we deal with privacy risks to get the facts right.”