BUSCO completeness myths debunked

5 May 2025 · 11 min read
Martin Kollmar

The BUSCO completeness check assesses the completeness and quality of a genome, transcriptome, or proteome assembly by searching for highly conserved single-copy orthologs that are expected to be present in a given lineage. Completeness is assessed by presence versus absence (‘missing’); quality is assessed by complete versus partial matches (‘fragmented’). By providing datasets for specific taxonomic lineages, BUSCO helps to assess evolutionary completeness. From an evolutionary perspective, you would expect a ‘single-copy gene’ in a more general taxon to also be a single-copy gene in a more specific taxon.

[Figure: overlap of BUSCO lineages]

Accordingly, the datasets of the more general taxa should have (and do have) fewer genes, and you would expect the datasets of the more specific taxa to contain all the genes of the more general taxa. However, this is not the case. The datasets overlap substantially, but those of the more general taxa also contain genes that are absent from the datasets of the more specific taxa.

[Figure: overlap of BUSCO lineages]

The example shows mammalian lineages, but the same is true for any comparison of more general and more specific lineages.
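
The overlap described above can be checked with plain set operations. The following is a minimal sketch, not the method used for the figures: the function name `compare_lineages` and the gene identifiers are invented for illustration, and it assumes that genes in the two datasets have already been mapped to a shared identifier (how that mapping is done is not covered here).

```python
def compare_lineages(general: set[str], specific: set[str]) -> dict[str, int]:
    """Count shared and dataset-specific genes for two lineage datasets."""
    return {
        "general_total": len(general),
        "specific_total": len(specific),
        "shared": len(general & specific),
        # Genes of the general dataset that the specific dataset lacks.
        "general_only": len(general - specific),
        "specific_only": len(specific - general),
    }


# Toy identifiers, invented for illustration only; not real BUSCO IDs.
general = {"g1", "g2", "g3", "g4"}
specific = {"g1", "g2", "g3", "g5", "g6", "g7"}

summary = compare_lineages(general, specific)
print(summary)
```

If the more specific dataset were a strict superset of the more general one, `general_only` would be 0. A non-zero value corresponds to the situation described above.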

Does this matter at all? Yes, because it is important not to compare apples with oranges, and that is exactly what happens when the dataset used and its version are not specified. A ‘duplicated’, ‘fragmented’ or ‘missing’ gene in an analysis with, e.g., the mammalian dataset is not the same ‘duplicated’, ‘fragmented’ or ‘missing’ gene in an analysis with a dataset of a sublineage. The completeness values of major lineages and of more specific lineages are therefore neither comparable nor consecutive.
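
One way to guard against such comparisons is to store the dataset name and version with every result and to refuse to compare results that differ in either. The sketch below is only an illustration under stated assumptions: `BuscoResult` and `completeness_difference` are hypothetical names, not part of BUSCO, and all percentages are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoResult:
    """A completeness result together with the dataset it was computed on."""
    assembly: str
    lineage: str          # e.g. "mammalia"
    dataset_version: str  # e.g. "odb10"
    complete_pct: float


def completeness_difference(a: BuscoResult, b: BuscoResult) -> float:
    """Return a.complete_pct - b.complete_pct, but only for the same dataset."""
    same_dataset = (a.lineage, a.dataset_version) == (b.lineage, b.dataset_version)
    if not same_dataset:
        raise ValueError(
            f"not comparable: {a.lineage}_{a.dataset_version} "
            f"vs {b.lineage}_{b.dataset_version}"
        )
    return a.complete_pct - b.complete_pct


# Invented values, for illustration only.
a = BuscoResult("assembly_A", "mammalia", "odb10", 95.0)
b = BuscoResult("assembly_B", "mammalia", "odb10", 92.5)
c = BuscoResult("assembly_A", "primates", "odb10", 93.0)

print(completeness_difference(a, b))  # same dataset, so the values are comparable

try:
    completeness_difference(a, c)     # different lineage dataset
except ValueError as err:
    print(err)
```

Raising an error rather than returning a number makes the apples-versus-oranges case visible instead of silently producing a difference that has no meaning.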
