Citations

If you use ggCaller, please cite our preprint:

Horsfield, S.T., Croucher, N.J., Lees, J.A. “Accurate and fast graph-based pangenome annotation and clustering with ggCaller” bioRxiv 2023.01.24.524926 (2023). doi: https://doi.org/10.1101/2023.01.24.524926

ggCaller relies on a number of other tools. In addition, please cite:

DBG building and querying

FM-index generation and querying

Gene scoring and overlap penalisation

Pairwise gene comparisons

Gene annotation

  • DIAMOND: Buchfink B., Reuter K., Drost H.G. “Sensitive protein alignments at tree-of-life scale using DIAMOND”, Nature Methods 18:366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x

  • HMMER3: Eddy S.R. “A New Generation of Homology Search Tools Based on Probabilistic Inference.” Genome Inform., 23:205-211 (2009).

Alignment and variant calling

  • MAFFT: Katoh, K., Misawa, K., Kuma, K. & Miyata, T. “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.” Nucleic Acids Research. 30 (14), 3059–3066 (2002). https://doi.org/10.1093/nar/gkf436

  • SNP-sites: Page, A.J., Taylor, B., Delaney, A.J., Soares, J., Seemann, T., Keane, J.A. & Harris, S.R. “SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments.” Microbial Genomics. 2 (4), e000056 (2016). https://doi.org/10.1099/mgen.0.000056

  • RapidNJ: Simonsen, M., Pedersen, C. “Rapid computation of distance estimators from nucleotide and amino acid alignments.” Proceedings of the ACM Symposium on Applied Computing (2011). https://doi.org/10.1145/1982185.1982208

Clustering and pangenome analysis