Software
GTDB-Tk
GTDB-Tk is available at HPC2N. Users are encouraged to cite GTDB-Tk and its third-party dependencies as described in References.
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
GTDB-Tk is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the GNU General Public License (Version 3).
On HPC2N we have GTDB-Tk available as a module on Kebnekaise.
To use GTDB-Tk, add the module to your environment. To see which versions are available and how to load them, use the command:
module spider GTDB-Tk
Read the page about modules to see how to load the required module.
The corresponding database location is predefined when the module is loaded: the GTDBTK_DATA_PATH environment variable points to the default database for the loaded version of GTDB-Tk. The mash_db file is also pre-created and is most easily referred to by using:
--mash_db $GTDBTK_DATA_PATH-mash_db
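The argument above relies on ordinary shell expansion: a hyphen cannot be part of a variable name, so the shell expands GTDBTK_DATA_PATH and then appends the literal suffix "-mash_db". A minimal sketch of that behaviour follows. The fallback path is a hypothetical placeholder used only when the module is not loaded, and the variable names mash_db_path and mash_db_braced are introduced here for illustration.

```shell
# Hypothetical fallback for illustration only; on Kebnekaise the
# GTDB-Tk module sets GTDBTK_DATA_PATH when it is loaded.
GTDBTK_DATA_PATH="${GTDBTK_DATA_PATH:-/example/gtdbtk/release}"

# "-" is not a valid character in a variable name, so both forms
# expand to the value of GTDBTK_DATA_PATH followed by "-mash_db".
mash_db_path="$GTDBTK_DATA_PATH-mash_db"
mash_db_braced="${GTDBTK_DATA_PATH}-mash_db"

echo "Unbraced form: $mash_db_path"
echo "Braced form:   $mash_db_braced"
```

The braced form is equivalent and can be easier to read in longer command lines.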
Submit file example
To use GTDB-Tk in a submit file, we suggest using the following as a base:
#!/bin/bash
#SBATCH -A <your-project-id>
#SBATCH -J <your-job-name>
#SBATCH -t <hh:mm:ss>
#SBATCH -c <number-of-cores-to-use>

ml purge > /dev/null 2>&1  # Clean environment from outside interference
ml foss/2022a GTDB-Tk/2.3.2  # Change these as per instruction from "ml spider GTDB-Tk/required-version"

gtdbtk arguments --cpus $SLURM_CPUS_ON_NODE
The important part of the above submit file is the "--cpus $SLURM_CPUS_ON_NODE" argument, which makes sure gtdbtk runs with the allocated number of cores.
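As an illustration of what might replace the "arguments" placeholder, here is a hedged sketch that assembles a classify_wf command line and prints it instead of running it. The directory names genomes and gtdbtk_out are hypothetical, the fallback to one core when SLURM_CPUS_ON_NODE is unset is an assumption added for testing outside a job, and the right subcommand and options for your data should be checked with "gtdbtk classify_wf --help" for the loaded version.

```shell
#!/bin/bash
# Use the core count allocated by Slurm; fall back to 1 when the script
# is tried outside a job (this fallback is an assumption, not site policy).
cpus="${SLURM_CPUS_ON_NODE:-1}"

# Hypothetical input and output directories; replace with your own.
genome_dir="genomes"
out_dir="gtdbtk_out"

# GTDBTK_DATA_PATH is set by the GTDB-Tk module; it is empty here
# if the module has not been loaded.
mash_db="${GTDBTK_DATA_PATH:-}-mash_db"

# Dry run: build the command as text and print it for inspection.
cmd="gtdbtk classify_wf --genome_dir $genome_dir --out_dir $out_dir --mash_db $mash_db --cpus $cpus"
echo "$cmd"
```

Printing the command first makes it easy to confirm the expanded database path and core count before committing a long-running job to the queue.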