In an ideal world, bioinformatics tools would always produce correct results. In reality, nearly all non-trivial software contains bugs, even widely used tools. A general rule from software engineering, attributed to Steve McConnell (author of Code Complete and Software Estimation: Demystifying the Black Art), is that typical codebases contain 15 to 50 errors per 1000 lines of delivered code, and many of those will not crash the program or trigger warnings but may silently produce incorrect results.
In bioinformatics this matters especially because workflows are multistep, data-dependent, and often opaque to users who treat tools as black boxes. When a bug or unexpected behavior exists at an early stage of analysis (alignment, assembly, annotation, variant calling), its effects can compound through the entire pipeline, skewing downstream conclusions without any obvious error flagging.
Here are concrete cases with documented impacts:
NCBI BLAST: Unexpected Parameter Behavior
BLAST is probably the most cited sequence alignment tool in genomics. A widely reported issue involves the parameter -max_target_seqs. Many users assumed it would return the top N best hits for a query sequence. In fact, in some versions BLAST returns the first N hits that meet the significance threshold rather than the N highest-scoring ones. This behavior depends on how the search database is ordered and can lead to inconsistent or biologically misleading hits across runs. It was reported as a bug and continues to be a source of confusion.
Impact: Workflows that rely on “best hit” assumptions — e.g., taxonomic assignments or functional inference — can produce different results for the same data just because of database order or BLAST version. That directly undermines reproducibility.
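One practical safeguard is to never rely on the order of hits in BLAST output. The sketch below is a minimal illustration, assuming tabular output produced with -outfmt 6 and a hypothetical file name blast_hits.tsv; it re-sorts hits per query by bit score and e-value before picking a "best" hit.

```python
import csv
from collections import defaultdict

# Columns of the default BLAST tabular format (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(path):
    """Return the highest-scoring hit per query, independent of output order."""
    hits = defaultdict(list)
    with open(path) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            rec = dict(zip(FIELDS, row))
            hits[rec["qseqid"]].append(rec)
    best = {}
    for query, rows in hits.items():
        # Sort explicitly: highest bit score first, then lowest e-value.
        rows.sort(key=lambda r: (-float(r["bitscore"]), float(r["evalue"])))
        best[query] = rows[0]
    return best

if __name__ == "__main__":
    for query, hit in best_hits("blast_hits.tsv").items():  # hypothetical input file
        print(query, hit["sseqid"], hit["bitscore"], hit["evalue"])
```

Note that re-sorting only fixes the ranking among the hits BLAST actually reported; if the true best subject was never returned in the first place, no amount of post-processing can recover it, which is why the parameter behavior itself matters.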
ExaML Phylogenetic Bugs under Specific Settings
In evolutionary biology, tools like ExaML are used to compute phylogenies (trees) and support values (e.g., bootstraps). Independent work has identified real bugs in ExaML's branch length scaling and bootstrap replicate generation under certain options.
Impact: Phylogenetic trees are widely used to infer evolutionary relationships and dates. Errors here can propagate into comparative analyses, dating estimates, and any downstream interpretation that treats the inferred tree as ground truth.
Genome Assembly Duplication Errors
A recent study found widespread false gene duplications in genome assemblies, caused by assembly algorithms that did not handle heterozygosity or sequencing artefacts correctly. These algorithmic issues are not always "bugs" in the strict sense of crashing code, but they are incorrect results that the tools produce by default unless users intervene appropriately.
Impact: Gene family expansion or contraction is often central to evolutionary inference, biomarker discovery, and functional genomics. False duplications can mislead those interpretations.
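One common heuristic for spotting haplotype-derived false duplications is read depth: a gene copy present on only one haplotype tends to show roughly half of the genome-wide coverage. The sketch below is a minimal illustration of that idea, assuming a hypothetical tab-separated file (gene_depth.tsv) of gene IDs and mean read depths; it is not a substitute for dedicated tools such as purge_dups.

```python
import csv
from statistics import median

def flag_candidate_false_duplications(path, low=0.4, high=0.6):
    """Flag genes whose mean read depth is roughly half the genome-wide median.

    Half-depth copies are a classic signature of haplotypic (false)
    duplications in assemblies of heterozygous genomes.
    """
    depths = {}
    with open(path) as fh:
        for gene, depth in csv.reader(fh, delimiter="\t"):
            depths[gene] = float(depth)
    genome_median = median(depths.values())
    flagged = [g for g, d in depths.items()
               if low * genome_median <= d <= high * genome_median]
    return genome_median, flagged

if __name__ == "__main__":
    med, candidates = flag_candidate_false_duplications("gene_depth.tsv")  # hypothetical input
    print(f"Genome-wide median depth: {med:.1f}")
    print(f"Candidate false duplications: {len(candidates)}")
```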
User-Written Scripts and Indexing Errors
In everyday practice, many labs glue together custom scripts for tasks like quality trimming or variant filtering. A common bug type is an off-by-one indexing error or mishandling of genomic coordinate systems (e.g., mixing 0-based and 1-based formats). Because such scripts often lack formal testing, they can quietly drop or misplace important data, affecting allele calls, feature counts, and more.
Impact: These bugs often escape peer review because they happen in unpublished code. They can make results irreproducible if someone else repeats the workflow with slightly different input.
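As a concrete illustration of the coordinate trap: BED intervals are 0-based and half-open, while GFF and VCF positions are 1-based and inclusive. The sketch below is a minimal example, not taken from any particular published script, showing the conversion that off-by-one bugs usually get wrong.

```python
def bed_to_one_based(start0, end0):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive coordinates.

    Example: BED interval (0, 100) covers the first 100 bases, i.e. positions
    1..100 in GFF/VCF-style coordinates. Forgetting the +1 on the start, or
    applying it to the end as well, is the classic off-by-one bug.
    """
    return start0 + 1, end0

def one_based_to_bed(start1, end1):
    """Convert 1-based, inclusive coordinates back to a 0-based, half-open interval."""
    return start1 - 1, end1

if __name__ == "__main__":
    assert bed_to_one_based(0, 100) == (1, 100)
    assert one_based_to_bed(1, 100) == (0, 100)
    print("Coordinate conversions behave as expected.")
```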
What about software bugs in genome annotation? We do not single out any specific tools here, but anyone interested can contact us for concrete examples. During our time in academia, we tested most of the widely used annotation software and pipelines, and virtually all of them showed reproducible flaws. Some tools even produce different results depending on the operating system, which is almost never reported in publications and creates serious reproducibility problems when combined with changing software versions. Other issues include incorrect output formatting and errors in the parsing and summarization of raw results into final reports. How many people really go back to the raw data and check what those summaries actually say? Most papers simply take the summary numbers at face value, which is exactly how shaky or inflated conclusions slip through.
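Going back to the raw output is usually cheap. The sketch below is a minimal example, assuming a standard GFF3 annotation file with a hypothetical name annotation.gff3; it recounts basic feature types directly from the raw file so the numbers can be compared against whatever a pipeline's summary report claims.

```python
from collections import Counter

def count_feature_types(gff3_path):
    """Count feature types (column 3) in a GFF3 file, skipping comment lines."""
    counts = Counter()
    with open(gff3_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts

if __name__ == "__main__":
    counts = count_feature_types("annotation.gff3")  # hypothetical file name
    for feature in ("gene", "mRNA", "exon", "CDS"):
        print(feature, counts.get(feature, 0))
```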
Conclusion
The examples above show three ways bugs affect reproducibility and downstream analysis:
Silent Errors versus Obvious Crashes
Many bugs do not crash the program but produce plausible-looking yet incorrect results. That is the worst kind, because a dataset can go through analysis, get written into a paper, and be cited, all while being wrong.
Version Dependence and Environmental Differences
Small changes in software version, compiler behavior (e.g., floating-point handling), or database order can change outcomes. Even mathematically identical code may produce different results on different machines because of floating-point rounding and optimization differences. For complex bioinformatics workflows, this means the same workflow on different systems is not guaranteed to give the same results.
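A tiny, self-contained illustration of why "mathematically identical" is not enough: floating-point addition is not associative, so anything that reorders a summation can change the result. The example below merely demonstrates the effect in Python; the same issue appears in vectorized, multithreaded, or differently optimized numerical code.

```python
# Floating-point addition is not associative: grouping (or order) changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # -> 1.0 (a and b cancel first, then c survives)
right = a + (b + c)  # -> 0.0 (c is absorbed into b before the cancellation)

print(left, right, left == right)  # 1.0 0.0 False

# The same effect makes parallel or vectorized reductions, different compiler
# optimization levels, and different numerical backends produce slightly
# different sums, which can then flip thresholded decisions downstream.
```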
Cascade Effects Through Pipelines
Modern analyses chain many tools (e.g., quality control → alignment → variant calling → annotation). A simple independence model makes the risk concrete: if each component i has some probability of error Pᵢ, the chance that the pipeline produces at least one error is 1 − (1 − P₁)(1 − P₂)⋯(1 − Pₙ), which grows quickly as the number of components n increases. Early bugs can steer all subsequent steps down the wrong path.
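As a back-of-the-envelope illustration under that independence assumption: with ten components and a modest 1% error probability each, the chance of at least one error is already close to 10%. A minimal sketch:

```python
def pipeline_error_probability(error_rates):
    """Probability of at least one error, assuming independent pipeline components."""
    p_all_correct = 1.0
    for p in error_rates:
        p_all_correct *= (1.0 - p)
    return 1.0 - p_all_correct

if __name__ == "__main__":
    # Ten components, each with a 1% chance of producing an error.
    print(round(pipeline_error_probability([0.01] * 10), 4))  # 0.0956
    # Twenty components, each with a 2% chance of producing an error.
    print(round(pipeline_error_probability([0.02] * 20), 4))  # 0.3324
```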