The latest event organised by the Progress Educational Trust (PET) in partnership with Our Future Health brought together experts to explore what needs to be done – and why – to ensure that people of diverse ancestries and social backgrounds are represented in genomic data, and in health-related data more generally.
Chaired by PET director Sarah Norcross, the event was the second of three in a series focusing on large-scale biomedical databases and research resources (see BioNews 1288 and 1292). Norcross started by explaining that many research databases and biobanks that include genomic data are focused on people with European ancestry, limiting their utility in other populations. There is a need to represent a wide range of human genomic diversity in research datasets, and to identify ways to ensure that people of diverse ancestries and with lower incomes are adequately represented.
The first speaker was Anika Ladva, head of community engagement at Our Future Health, which aims to recruit five million volunteers to create a comprehensive health data resource (see BioNews 1165). The overall goal of the project – the UK's largest health research initiative – is to develop new ways to prevent, detect and treat diseases, using genomic and other health data.
Ladva shared details of the recruitment pathway, in which individuals over the age of 18 who live near one of the project's clinics are first sent invitation letters. After reading further information on the website, people who choose to participate then register and provide consent, before completing a health questionnaire and a clinic visit where a blood sample and physical measurements are taken. Currently, Our Future Health is recruiting at clinics across England, Scotland and Wales, with plans to open clinics in Northern Ireland by the end of 2025.
Ladva emphasised the importance of developing recruitment strategies that overcome historical underrepresentation of ethnic minorities, lower socioeconomic groups and specific geographic areas. She explained that people in these groups often experience worse health outcomes – for example, individuals in lower socioeconomic groups have shorter life expectancy, and Black men are more likely to develop prostate cancer.
Ladva outlined pilot initiatives to improve diversity, including walk-in clinics, reimbursement for participation costs and community engagement efforts. She stressed there was no 'one-size-fits-all' approach, and that challenges remain in engaging Bangladeshi, Pakistani, Black African, Black Caribbean and other communities. She said that future efforts would focus on three areas: enhancing awareness of health research and continuing to tailor messages to different communities; reducing practical barriers to participation including simplification of information; and strengthening community connections.
The next speaker, Sasha Henriques – a principal genetic counsellor at Guy's and St Thomas' NHS Foundation Trust, and a researcher at Wellcome Connecting Science and at the Wellcome Sanger Institute – described the structural inequalities embedded in genomic data. She explained that the inconsistent application of labels such as race, ethnicity and genetic ancestry can perpetuate biases and exclusions within datasets. Other factors include a historical bias towards using datasets from European populations, for example if research has been carried out in institutions that have the best reputations.
Henriques carried out an ethnographic study, speaking to senior group leaders and scientists to further understand how they classify and categorise data. She found that sometimes, scientists were told not to use particular terms to describe certain groups, but did not fully understand the reasons why. She also found that groups may be defined in a particular way, for statistical analyses, that does not do justice to the complexities of most people's ancestry. A further issue is the lack of funding for additional recruitment needed to increase data diversity. Henriques is aiming to develop a framework rooted in the principles of social justice, which she hopes will empower scientists to 'think more broadly about leading the way in rethinking how we define our populations'.
Professor Segun Fatumo, chair of genomic diversity at Queen Mary University of London, highlighted the key role of African genomic data in accurate genetic risk prediction and advances in precision medicine. He gave the example of protective genetic variants identified in some individuals with African ancestry that are associated with lower levels of 'bad' cholesterol, and that contributed to the development of cholesterol-lowering drugs called PCSK9 inhibitors. He also explained that there is greater genetic diversity in African populations than in other populations, to the extent that there is more genetic variation between different populations in Africa than there is between African and European populations, or between African and Asian populations. As a consequence, clinically relevant findings from genetic studies in a non-African population might not be generalisable to African populations.
Professor Fatumo highlighted the severe underrepresentation of African genomes in global studies (see BioNews 1207), where they are often overshadowed by African-American data, which poorly represents continental African diversity. He said that Africa is home to individuals from 3,000 different ethnic groups, who speak more than 2,000 different languages, and shared findings from a study carried out in Uganda that identified 9.5 million novel genetic variants. He explained that using African genomic data improves the predictive power and relevance of polygenic risk scores (PRS) for conditions such as diabetes and heart disease in African populations, and emphasised the need for more extensive, localised genomic data to be gathered through initiatives such as the Nigerian 100K Genome Project.
The final speaker – Cornell Tech researcher Dr Divya Shanmugam – discussed the intersection of artificial intelligence (AI), machine learning and health equity. She highlighted data on atrial fibrillation diagnoses, which suggests that public health analyses underestimate the prevalence of disease among racial minorities, and so machine learning systems trained to predict these diagnoses will systematically underdiagnose racial minorities.
Dr Shanmugam argued that equitable AI systems depend on both diverse and high-quality data. She called for more granular race data collection and transparency in training and evaluating models, cautioning that health equity requires careful balancing of technological advances with ethical and systemic considerations. Without recognising existing structural inequalities, machine learning models risk perpetuating and exacerbating health disparities.
A wide-ranging discussion followed, exploring the need for greater diversity in genomic and health data collection and ways in which this could be achieved, in order to advance equitable precision medicine and health research.
PET is grateful to Our Future Health for supporting this event.





