Citations

If you use ggCaller, please cite our preprint:

Horsfield, S.T., Croucher, N.J., Lees, J.A. “Accurate and fast graph-based pangenome annotation and clustering with ggCaller” bioRxiv 2023.01.24.524926 (2023). doi: https://doi.org/10.1101/2023.01.24.524926

ggCaller relies on a number of other tools. In addition, please cite:

DBG building and querying

Bifrost: Holley, G., Melsted, P. “Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs.” Genome Biol 21(249) (2020). https://doi.org/10.1186/s13059-020-02135-8

FM-index generation and querying

Kseq (from seqtk): https://github.com/lh3/seqtk
SDSL v3: Succinct Data Structure Library 3.0

Gene scoring and overlap penalisation

Balrog: Sommer M.J., Salzberg S.L. “Balrog: A universal protein model for prokaryotic gene prediction.” PLoS Comput Biol 17(2): e1008727 (2021). https://doi.org/10.1371/journal.pcbi.1008727
Eigen v3: Guennebaud, G., Jacob, B. et al. “Eigen v3” (2010). http://eigen.tuxfamily.org
Boost graph library: Siek, J., Lee, L.Q. & Lumsdaine, A. “Boost graph library” (2002) https://www.boost.org/doc/libs/1_79_0/libs/graph/doc/index.html

Pairwise gene comparisons

Edlib: Šošić, M., Šikić, M. “Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.” Bioinformatics 33(9) (2017). https://doi.org/10.1093/bioinformatics/btw753

Gene annotation

DIAMOND: Buchfink B., Reuter K., Drost H.G. “Sensitive protein alignments at tree-of-life scale using DIAMOND”, Nature Methods 18:366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x
HMMER3: Eddy S.R. “A New Generation of Homology Search Tools Based on Probabilistic Inference.” Genome Inform., 23:205-211 (2009).

Alignment and variant calling

MAFFT: Katoh, K., Misawa, K., Kuma, K. & Miyata, T. “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.” Nucleic Acids Research. 30 (14), 3059–3066 (2002). https://doi.org/10.1093/nar/gkf436
SNP-sites: Page, A.J., Taylor, B., Delaney, A.J., Soares, J., Seemann, T., Keane, J.A. & Harris, S.R. “SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments.” Microbial Genomics 2(4), e000056 (2016). https://doi.org/10.1099/mgen.0.000056
RapidNJ: Simonsen, M., Pedersen, C. “Rapid computation of distance estimators from nucleotide and amino acid alignments” Proceedings of the ACM Symposium on Applied Computing (2011) https://doi.org/10.1145/1982185.1982208

Clustering and pangenome analysis

Panaroo: Tonkin-Hill, G., MacAlasdair, N., Ruis, C. et al. “Producing polished prokaryotic pangenomes with the Panaroo pipeline.” Genome Biol 21(180) (2020). https://doi.org/10.1186/s13059-020-02090-4