GTDB-Tk

Software name: GTDB-Tk

Policy

GTDB-Tk is available at HPC2N. Users are encouraged to cite GTDB-Tk and its third-party dependencies as described in References.

General 

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

Description 

GTDB-Tk is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. GTDB-Tk is open source and released under the GNU General Public License (Version 3).

Availability 

At HPC2N, GTDB-Tk is available as a module on Kebnekaise.

Usage at HPC2N 

To use GTDB-Tk, add the module to your environment. To list the available versions and see how to load each of them, use the command:

module spider GTDB-Tk

Read the page about modules to see how to load the required module.

 

The corresponding database location is predefined when the module is loaded: the GTDBTK_DATA_PATH environment variable points to the default database for the loaded version of GTDB-Tk. The mash_db file is also pre-created and is most easily referred to by using

--mash_db $GTDBTK_DATA_PATH-mash_db
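
As a minimal sketch of how the shell expands the argument above: "-" cannot be part of a variable name, so the value of GTDBTK_DATA_PATH is followed directly by the literal text "-mash_db". The fallback path below is purely hypothetical and only lets the snippet run outside Kebnekaise; with the module loaded, the real value is used.

```shell
# Assumption: the GTDB-Tk module normally sets GTDBTK_DATA_PATH.
# The fallback value is hypothetical and used only if the variable is unset.
GTDBTK_DATA_PATH="${GTDBTK_DATA_PATH:-/hypothetical/gtdbtk/release}"

# The shell expands $GTDBTK_DATA_PATH and appends the literal "-mash_db".
mash_db="$GTDBTK_DATA_PATH-mash_db"

echo "database: $GTDBTK_DATA_PATH"
echo "mash_db:  $mash_db"

# Report whether the resulting path exists on this system.
if [ -e "$mash_db" ]; then
    echo "mash_db found"
else
    echo "mash_db not found (is the GTDB-Tk module loaded?)"
fi
```

Running this after loading the module is a quick way to confirm the path before putting it in a submit file.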

Submit file example

To use GTDB-Tk in a submit file, we suggest using the following as a base:

#!/bin/bash
#SBATCH -A <your-project-id>
#SBATCH -J <your-job-name>
#SBATCH -t <hh:mm:ss>
#SBATCH -c <number-of-cores-to-use>

ml purge > /dev/null 2>&1  # Clean environment from outside interference
ml foss/2022a GTDB-Tk/2.3.2  # Change these as per instruction from "ml spider GTDB-Tk/required-version"

gtdbtk <arguments> --cpus $SLURM_CPUS_ON_NODE

The important part of the above submit file is the "--cpus $SLURM_CPUS_ON_NODE" argument, which makes sure that gtdbtk runs with the allocated number of cores.
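
As an illustration of what the gtdbtk line might look like in practice, the sketch below wraps a classify_wf call in a small helper. The function name gtdbtk_classify and the DRY_RUN switch are hypothetical additions for this example, not part of GTDB-Tk. The classify_wf workflow and its --genome_dir, --out_dir, --mash_db and --cpus options are assumed to match the loaded GTDB-Tk version; check "gtdbtk classify_wf -h" to confirm.

```shell
# Hypothetical helper: build, log, and (optionally) run a classify_wf call.
# $1: directory with input genomes, $2: output directory
gtdbtk_classify() {
    # Abort with a message if the module has not set GTDBTK_DATA_PATH.
    # Fall back to 1 core when SLURM_CPUS_ON_NODE is unset (outside a job).
    set -- gtdbtk classify_wf \
        --genome_dir "$1" \
        --out_dir "$2" \
        --mash_db "${GTDBTK_DATA_PATH:?load the GTDB-Tk module first}-mash_db" \
        --cpus "${SLURM_CPUS_ON_NODE:-1}"

    # Log the exact command to the job output.
    echo "+ $*"

    # With DRY_RUN=1 the command is only printed, not executed.
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}
```

In a submit file, a call such as "gtdbtk_classify genomes results" (both directory names are placeholders) would go after the "ml" lines. Setting DRY_RUN=1 first prints the command without running it, which is useful for checking the expanded paths and core count.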

Updated: 2024-10-10, 12:39