An analysis of 60,706 unrelated human exomes has identified more than 3230 genes with a propensity to cause disease, and has revealed that 183 of 192 variants previously linked to inherited diseases are not actually harmful.
'This is the deepest anyone has gone for any substantial part of the [human] genome,' said Jay Shendure of the University of Washington in Seattle, who was not involved in the research. The study, published in Nature, collated DNA sequence data from over two dozen disease-specific studies, and identified over 7.4 million DNA variants.
Exomes are the regions of the genome that code for proteins, among numerous other biological functions. In humans, the exome comprises less than two percent of the total genome, but these regions are where the majority of severe disease-causing mutations are found.
The collection of 60,706 exomes used in this study was gathered and analysed by the Exome Aggregation Consortium (ExAC), a collaboration of researchers from the Broad Institute of MIT and Harvard University. Although the data had been made publicly available in 2014, this was the first study to analyse the exome variants.
Researchers discovered variants spaced, on average, eight bases apart in sections of the genome where variation is more likely. These findings could be used to identify the most likely disease-causing mutations by comparing how common a patient's genetic variant is with how common their disease is.
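The frequency comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the study. It assumes, for simplicity, a dominant disease in which every carrier of the variant is affected, so a variant carried by more people than the disease affects is unlikely to be the cause. The function names and the example counts and prevalences are hypothetical; only the cohort size comes from the study.

```python
EXAC_INDIVIDUALS = 60_706  # cohort size reported for the ExAC dataset


def allele_frequency(allele_count: int, individuals: int) -> float:
    """Fraction of chromosomes carrying the variant (two per person)."""
    return allele_count / (2 * individuals)


def is_too_common(allele_count: int, individuals: int,
                  disease_prevalence: float) -> bool:
    """Return True if carriers outnumber what the disease could explain.

    Assumption (illustrative only): a dominant, fully penetrant disease,
    with each carrier holding a single copy of a rare variant.
    """
    carrier_frequency = 2 * allele_frequency(allele_count, individuals)
    return carrier_frequency > disease_prevalence


# Hypothetical variant seen 50 times, disease affecting 1 in 10,000 people.
print(is_too_common(50, EXAC_INDIVIDUALS, 1 / 10_000))  # True

# Hypothetical variant seen once, disease affecting 1 in 500 people.
print(is_too_common(1, EXAC_INDIVIDUALS, 1 / 500))  # False
```

In practice, a real analysis would also have to account for incomplete penetrance and for diseases caused by many different variants, both of which loosen the threshold.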
The value of such a large dataset was demonstrated by the fact that some variants were identified repeatedly, a pattern that would not have been observed in smaller datasets. The ExAC dataset is 10-fold larger than any prior exome database.
'The scale and diversity of the ExAC resource is invaluable,' Professor MacArthur, one of the ExAC principal investigators, said. 'It gives us the ability to discover extremely rare variants and offers an unparalleled window into the roots of rare genetic diseases.' More than half of the 7.4 million variants discovered were seen only once among the more than 60,000 people in the dataset.
The study also identified 3230 genes in which harmful mutations would be more likely to contribute to disease, even if a normal copy of the gene were also present. Seventy-two percent of these genes have never been linked to any known disease.
Another study, published earlier this year in the journal Genetics in Medicine and in which Professor MacArthur was also involved, used the ExAC collection as a reference dataset for 7855 cases of cardiomyopathy, a group of inherited heart diseases previously linked to over 60 causative genes. The reanalysis revealed that many of these genetic variations are unlikely to contribute to cardiomyopathy.
The ExAC dataset has been compiled from exomes of individuals of African, East Asian, European, Latino and Southeast Asian ancestry. In the future, researchers plan to include data from underrepresented populations, including those from the Middle East and parts of Africa. The consortium plans to approximately double the size of the dataset by this year's American Society of Human Genetics meeting in October.