Although the current human reference genome is the most accurate and complete vertebrate genome available, no chromosome had been sequenced end to end, and hundreds of gaps in the genome remained. The findings, published in Nature, now show that accurate base-by-base sequencing of a human chromosome, and therefore of the whole human genome, is possible.
According to the US team behind the study, mapping whole chromosomes is like trying to solve an almost impossible puzzle. 'Imagine having to reconstruct a jigsaw puzzle, but each tiny piece contains less context for figuring out where it comes from. The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the hardest parts of the puzzle together,' said study co-leader Dr Adam Phillippy from the National Human Genome Research Institute in Bethesda, Maryland.
To sequence a chromosome precisely, Dr Phillippy and his team used cells containing two identical X chromosomes to obtain a greater quantity of DNA for sequencing. Newly developed sequencing tools were then used to analyse long DNA segments, called 'ultra-long reads', from the chromosome. One technique, called nanopore sequencing, detects changes in electrical current as DNA molecules pass through a tiny hole, or 'nanopore', in a membrane. Because the reads are so long, it avoids having to piece together short, repetitive fragments of DNA. A newly developed computer program was then deployed to assemble the many segments of generated sequence.
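The jigsaw analogy can be made concrete with a toy calculation. The sketch below is purely illustrative and is not the team's assembly software: the 'chromosome' is an invented string with a short tandem repeat between two unique flanks, the reads are assumed to be error-free, and the function name `uniquely_placeable_fraction` is hypothetical. It measures what fraction of reads of a given length occur at exactly one position, that is, how many puzzle pieces can be placed without ambiguity.

```python
from collections import Counter


def uniquely_placeable_fraction(genome: str, read_len: int) -> float:
    """Fraction of all error-free reads of length read_len that occur
    at exactly one position in genome (toy model, no sequencing errors)."""
    # Enumerate every possible read of the given length.
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is unambiguous only if its sequence appears once in the genome.
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)


# Invented toy sequence: unique flanks around a tandem repeat array,
# loosely mimicking a repetitive region such as a centromere.
left_flank = "ACGTTGCAAGCT"
repeat_array = "GATTACA" * 6
right_flank = "TTCGGACCATGA"
toy_chromosome = left_flank + repeat_array + right_flank

for read_len in (5, 10, 20, 45):
    fraction = uniquely_placeable_fraction(toy_chromosome, read_len)
    print(f"read length {read_len}: {fraction:.2f} uniquely placeable")
```

In this toy model, reads shorter than the repeat array often match several positions inside it, whereas reads longer than the whole array always reach into unique flanking sequence and can be placed unambiguously. Real assemblies must also cope with sequencing errors and far larger repeats, which this sketch ignores.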
Dr Karen Miga, who co-led the study at the University of California, Santa Cruz Genomics Institute, then spearheaded efforts to validate the accuracy of the sequence and to complete the chromosome by filling in 29 gaps. The largest gap was the centromere, a stretch of about 3.1 million base pairs of repetitive DNA in the middle of the chromosome. Lastly, a 'polishing' strategy was applied to ensure the sequence was accurate.
Completing the human genome is now of great importance. 'We're starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we've been missing a lot of information that could be important to understanding human biology and disease,' said Dr Miga.
However, challenges remain. For example, chromosomes 1 and 9 have repetitive DNA segments that are much longer than those found on the X chromosome. Nevertheless, the results of the study show that finishing chromosome sequences, and therefore the human genome, is now within reach.
The study was supported by the US National Institutes of Health and is part of the work of the Telomere-to-Telomere (T2T) consortium, which aims to generate a complete human genome in 2020.