Around one-fifth of genomics papers with supplementary Excel data contain gene name errors caused by Microsoft Excel's default settings, according to a study.
Under its default settings, the spreadsheet software Excel is known to convert shortened gene names into dates and numbers. For example, the gene symbol MARCH1 (Membrane-Associated Ring Finger) is automatically converted to the date 01-03-2016, and the gene SEPT2 becomes 'September 2'.
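To illustrate why symbols such as MARCH1 and SEPT2 are vulnerable, the Python sketch below flags gene symbols that consist of a month name or abbreviation followed by a one- or two-digit number. The pattern and the `at_risk_symbols` helper are hypothetical: this is a simplified heuristic, not Excel's actual parsing rules, which are broader and depend on locale settings.

```python
import re

# Heuristic (assumption): a month name or abbreviation immediately followed by
# a one- or two-digit number, e.g. MARCH1, SEPT2, DEC1. Excel's real date
# parsing is broader than this and depends on locale settings.
_MONTH_PLUS_DAY = re.compile(
    r"^(JAN(UARY)?|FEB(RUARY)?|MAR(CH)?|APR(IL)?|MAY|JUNE?|JULY?|"
    r"AUG(UST)?|SEPT?(EMBER)?|OCT(OBER)?|NOV(EMBER)?|DEC(EMBER)?)\d{1,2}$",
    re.IGNORECASE,
)


def at_risk_symbols(symbols):
    """Return the gene symbols that resemble a month followed by a day."""
    return [s for s in symbols if _MONTH_PLUS_DAY.match(s.strip())]


print(at_risk_symbols(["MARCH1", "SEPT2", "TP53", "DEC1"]))
# ['MARCH1', 'SEPT2', 'DEC1']
```

Symbols flagged this way could, for instance, be kept in a column formatted as text when a file is imported, rather than left to Excel's automatic conversion.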
After scanning 3,597 scientific papers published between 2005 and 2015 in 18 genomics journals, researchers at the Baker IDI institute in Melbourne, Australia, identified 704 papers (roughly 20 percent) whose supplementary data sheets contained gene name errors caused by Excel conversions.
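A scan of this kind can be sketched in Python. The helper names and the three date shapes checked (for example '2-Sep', '01-03-2016' and 'September 2') are assumptions for illustration; this is not the screening procedure the researchers used, and it would miss conversions into other formats or into plain numbers.

```python
import re

# Assumed shapes of gene names after Excel's date conversion (illustrative only).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_CONVERTED = [
    re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE),        # e.g. 2-Sep
    re.compile(r"^\d{1,2}[-/]\d{1,2}[-/]\d{2,4}$"),               # e.g. 01-03-2016
    re.compile(rf"^({MONTHS})[a-z]* \d{{1,2}}$", re.IGNORECASE),  # e.g. September 2
]


def looks_converted(value: str) -> bool:
    """True if a cell value matches one of the assumed date shapes."""
    v = value.strip()
    return any(p.match(v) for p in _CONVERTED)


def flag_converted(column):
    """Return the entries in a gene-name column that look like dates."""
    return [v for v in column if looks_converted(v)]


def share_affected(columns):
    """Fraction of columns (e.g. one per supplementary file) with any flagged entry."""
    if not columns:
        return 0.0
    return sum(1 for col in columns if flag_converted(col)) / len(columns)


print(flag_converted(["TP53", "2-Sep", "01-03-2016", "BRCA1", "September 2"]))
# ['2-Sep', '01-03-2016', 'September 2']
```

Applied to a collection of gene-name columns, `share_affected` returns the fraction containing at least one suspect entry, the same kind of proportion as 704 out of 3,597 papers. Note that an unconverted symbol such as MARCH1 is not flagged; only its date-like form is.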
Professor Assam El-Osta, a co-author of the paper, explained: 'The errors were found specifically on the supplemental data sheets of academic studies'. He added that supplementary pages contain 'important supporting data, rich with information', and that resolving these errors could be 'time-consuming'.
Notably, leading journals such as Nucleic Acids Research, Nature Genetics and Genome Biology (the journal in which the study was published) had the highest proportion of affected papers, at more than 20 percent. In contrast, fewer than 10 percent of papers in journals including Molecular Biology and Evolution, Bioinformatics, DNA Research, and Genome Biology and Evolution were affected.
The scientific community first reported the gene renaming errors more than a decade ago, but the problem has not been resolved. The authors found that such errors have increased at an annual rate of 15 percent over the past five years, a trend that calls into question the thoroughness of the peer-review process.
Speaking to BBC News, Dr Ewan Birney, director of the European Bioinformatics Institute, said: 'What frustrates me is researchers are relying on Excel spreadsheets for clinical trials', adding that the Excel gene renaming issue has been known to the scientific community since 2004. He recommended that the program be used only for 'lightweight scientific analysis'.
The study reports that gene renaming errors also affect other spreadsheet programs, including LibreOffice Calc and Apache OpenOffice Calc, but apparently not Google Sheets. The researchers conclude that, for now, such errors can be avoided if reviewers, editors, and authors remain vigilant.
A spokesperson for Microsoft Excel told BBC News: 'Excel offers a wide range of options, which customers with specific needs can use to change the way their data is represented.'