Over 31,000 new transcripts found in blood cells from non-European populations are absent from current human gene reference maps.
Transcripts, the RNA molecules produced from DNA, can differ between individuals, cells and tissues. A team of scientists in Barcelona, Spain, studied the RNA of 43 individuals across eight populations – Yoruba (Nigeria), Luhya (Kenya), Mbuti (Congo), Han Chinese, Indian Telugu, Peruvians in Lima, Ashkenazi Jewish and Utah Europeans – and published their findings in Nature Communications.
'Most gene sequencing so far has come from European individuals, so the reference catalogues we rely on may be missing genes or transcripts that exist only in non-European populations,' said Professor Roderic Guigó from the Centre for Genomic Regulation at the Barcelona Institute of Science and Technology and senior co-author of the study. 'If a genetic variant falls in one of these missing genes, we assume it has no biological effect. In some cases, that assumption may simply be wrong,' he added.
The researchers used long-read RNA sequencing to study the blood cells, which allowed them to read each transcript along its entire length. Previous methods, by contrast, pieced transcripts together from short fragments.
Most of the novel transcripts were found to belong to protein-coding genes, and the researchers predict that 41 percent of these transcripts lead to proteins that have never been described before. They further noted that some of the transcripts come from novel gene loci, and that some were found in only one of the populations studied.
These discoveries suggest that interpretations of genetic data based on current human gene reference maps may be biased towards European populations, because those maps lack genes and transcripts found in non-European populations.
The authors say that these newly found differences may help to explain population-level disease susceptibility, as some of the gene expression differences uncovered in this study occur in genes linked to the immune system and autoimmune diseases. However, they note that this pattern could be a result of studying blood cells, which are highly genetically variable. Even so, the findings may help uncover why certain diseases are more common or behave differently in some populations than in others.
The researchers further recognise that this study covers only a small fraction of human populations and a single cell type. They therefore call on the scientific community to use it as inspiration to map the complete human pantranscriptome, the set of all genes and transcripts found in the human species. A comparable project – the human pangenome – was developed using the full genomes of 47 individuals, including 24 from Africa and 16 from the Americas (see BioNews 1189).
'We hope our study serves as a foundation and an invitation for the global scientific community to contribute data, methods, and diverse populations. Only through a collective effort will we achieve a truly complete and inclusive map of human biology, which is essential for fair and accurate genomic medicine,' said Dr Marta Melé from the Barcelona Supercomputing Centre and senior co-author of the study.

