According to a recent report, many scientists and researchers have complained about the default autocorrect behavior in Microsoft Excel. Geneticists in particular have been struggling with it, because Excel's automatic conversions have been linked to errors in published data. Around one in five papers in genetics journals contains errors introduced by the program.
They found that Excel's autocorrect wrongly converts some gene symbols or gene names into dates or numerical values. For example, SEPT2 (Septin-2) is converted into a date such as “September 2”. The same happens with MARCH1 (short for Membrane Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase).
Researchers who published their findings in Genome Biology say that the issue can be avoided by formatting the Excel columns as text, or by using Google Sheets, where the gene names are stored exactly as they were entered.
Quantifying Excel Autocorrect
In 2016, Mark Ziemann and his colleagues in Australia set out to quantify the Excel autocorrect problem. They found that around one in five papers in top genomics journals contained gene name conversion errors in Excel spreadsheets.
Despite growing awareness of the issue and the steps taken to fix it, the errors are still rife. In a later analysis of around 11,000 articles published between 2014 and 2020, Ziemann and his team found that gene name errors remained widespread.
Ziemann, a researcher in computational reproducibility in genetics based in Australia, says that even a simple cross-check can help detect autocorrect errors. Without such checks, the errors pile up as the volume of data in spreadsheets grows.
How to Avoid Excel Autocorrect Mistakes?
One way to avoid autocorrect mistakes is to stop using spreadsheets for this kind of data, since spreadsheets are hard to audit. If you still need a spreadsheet, there are alternative tools such as LibreOffice and Gnumeric, which reportedly do not have the same issue.
Many computational biologists prefer to use scripting languages like Python and R, as they don’t autocorrect gene symbols. Scripts also make it easier to trace the source of an error. However, the user must be familiar with these languages in order to write the code and analyze the data.
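As a minimal sketch of the scripted approach, the example below uses Python's standard csv module, which reads every field as a plain string and does no type guessing. The sample data and the column names "gene" and "expression" are made up for illustration; a real analysis would open its own file instead.

```python
import csv
import io

# Hypothetical sample data standing in for a real file.
# In practice you would use: open("your_file.csv", newline="")
sample = io.StringIO("gene,expression\nSEPT2,12.5\nMARCH1,3.1\nDEC1,0.8\n")

# csv.DictReader returns every field as a str, so gene symbols
# are kept exactly as they were written in the file.
rows = list(csv.DictReader(sample))
genes = [row["gene"] for row in rows]

print(genes)  # ['SEPT2', 'MARCH1', 'DEC1']
```

Because nothing is converted on load, any numeric columns have to be converted explicitly (for example with float()), which keeps the decision in the hands of the person writing the script.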
If you are not familiar with such programming languages, you can still do a quick check of your gene lists before publishing or sharing the data.
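One possible form of such a check, sketched here in Python, is to scan a gene column for values that look like dates rather than gene symbols. The patterns and the function name find_suspect_gene_names are assumptions chosen for illustration, not a method taken from the report, and they do not cover every format a spreadsheet might produce.

```python
import re

# Assumed patterns for values that look like converted gene names,
# e.g. "2-Sep", "Sep-02" or "2006/09/02". This list is not exhaustive.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}}|\d{{4}}[/-]\d{{1,2}}[/-]\d{{1,2}})$",
    re.IGNORECASE,
)


def find_suspect_gene_names(values):
    """Return the values that look like dates rather than gene symbols."""
    return [v for v in values if DATE_LIKE.match(v.strip())]


# Hypothetical gene column copied out of a spreadsheet.
column = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "TP53"]
print(find_suspect_gene_names(column))  # ['2-Sep', '1-Mar']
```

Any value the check flags should be compared against the original data source, since the pattern only says that a value looks suspicious, not what the gene name originally was.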