Important: The DGX Station A100 has 7.68 TB of internal data storage, so data flow should be managed to suit the needs of the user. The location and volume of the data, the network connection speed, and the basecalling model selected all affect the choice of data location and flow. Please refer to the "Data management" section of this guide.
Basecalling a single folder

Guppy ships with a "basecaller supervisor" that will launch many basecalling clients in parallel, to mitigate issues with reading .fast5 files. Each client has an ID, numbered upwards from zero, which allows their output files to be created in parallel.
To launch the supervisor, enter the following command:

guppy_basecaller_supervisor --input_path <input_folder> --save_path <output_folder> --config <config> --port ipc:///tmp/.guppy/5556 --num_clients <num_clients>
Optional action
Depending on your analysis pipeline, the following additional options may be useful:

--bam_out Output BAM files in addition to FASTQ.
--compress_fastq Compress output FASTQ files so that they become fastq.gz.
Choosing num_clients

If you are using the Fast basecall model, we recommend setting num_clients to 50. For all other models, set num_clients to 20.
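
The recommendation above can be sketched as a small shell helper. This is an illustration only: the function name pick_num_clients is hypothetical, and it assumes that Fast model configuration files contain "_fast" in their name, as dna_r9.4.1_450bps_fast_prom.cfg does. The final command is printed rather than run, so that it can be checked first.

```shell
# Hypothetical helper: choose a value for --num_clients from the config name.
# Assumption: Fast model configs contain "_fast" in their file name.
pick_num_clients() {
    case "$1" in
        *_fast*) echo 50 ;;   # Fast model: 50 clients recommended
        *)       echo 20 ;;   # all other models: 20 clients recommended
    esac
}

config="dna_r9.4.1_450bps_hac_prom.cfg"
num_clients="$(pick_num_clients "$config")"

# Print the supervisor command instead of running it.
# Replace the <...> placeholders with real paths before use.
echo guppy_basecaller_supervisor \
    --input_path "<input_folder>" --save_path "<output_folder>" \
    --config "$config" --port ipc:///tmp/.guppy/5556 \
    --num_clients "$num_clients"
```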
The output folder structure will look similar to this when basecalling is complete:

--- /save_folder/
| fastq_runid_6dce0a5_client0_0_0.fastq
| fastq_runid_6dce0a5_client1_0_0.fastq
| fastq_runid_6dce0a5_client2_0_0.fastq
| guppy_basecaller_0_log-2019-11-25_15-11-53.log
| guppy_basecaller_1_log-2019-11-25_15-11-53.log
| guppy_basecaller_2_log-2019-11-25_15-11-53.log
| guppy_basecaller_supervisor_log-2019-11-25_15-11-53.log
| sequencing_summary_0.txt
| sequencing_summary_1.txt
| sequencing_summary_2.txt
| sequencing_telemetry_0.js
| sequencing_telemetry_1.js
| sequencing_telemetry_2.js
Optional action
In some cases, downstream analysis tools require merging files together.

For example, to merge FASTQ files together, enter the following command (to merge gzipped FASTQ files instead, change .fastq to .fastq.gz in both the input pattern and the output filename):

cat save_folder/*.fastq > merged.fastq

To merge sequencing_summary files together, enter the following command:

awk 'NR == 1 { print }; FNR > 1 { print }' save_folder/sequencing_summary* > merged_sequencing_summary.txt
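
As a hedged sketch, the sequencing_summary merge can be wrapped in a function that also checks the result. The function name merge_summaries is hypothetical, and the check assumes that every sequencing_summary file starts with exactly one header line. The awk condition used here, NR == 1 || FNR > 1, is an equivalent shorthand for the command above: it keeps the header of the first file and the data rows of every file.

```shell
# merge_summaries SAVE_FOLDER MERGED_FILE   (hypothetical helper)
# Merges the per-client sequencing summaries, then confirms that no data
# rows were lost and that the header was written only once.
merge_summaries() {
    save_folder=$1
    merged=$2

    # Keep line 1 of the first file (the header) and every non-header line.
    # The merged file name must not match the sequencing_summary_*.txt glob.
    awk 'NR == 1 || FNR > 1' "$save_folder"/sequencing_summary_*.txt \
        > "$merged" || return 1

    # Data rows across all inputs versus data rows in the merged file.
    expected=$(awk 'FNR > 1' "$save_folder"/sequencing_summary_*.txt | wc -l | tr -d ' ')
    actual=$(awk 'NR > 1' "$merged" | wc -l | tr -d ' ')

    if [ "$expected" -ne "$actual" ]; then
        echo "row count mismatch: expected $expected, got $actual" >&2
        return 1
    fi
}
```

For example, merge_summaries save_folder merged_sequencing_summary.txt returns a non-zero status if the row counts disagree.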
Basecalling many data folders

The most scalable way to basecall many data folders is to use a single basecall client for each data folder.
To launch multiple basecaller clients, enter the following command once for each data folder, substituting that folder's input and output paths:

guppy_basecaller_supervisor --input_path <input_folder> --save_path <output_folder> --config <config> --port ipc:///tmp/.guppy/5556

Note: each output folder should be unique, unless using the --client_id argument.
Optional action
In addition to the options outlined above, you can use the following arguments:

--client_id <id> Append <id> as part of the output filename. For example, sequencing_summary.txt would become sequencing_summary_<id>.txt. Use this when multiple clients write their output to the same folder.
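
The per-folder approach can be sketched as a loop. This is a minimal illustration under stated assumptions: the function name basecall_folders and the DRY_RUN variable are hypothetical, each data folder is assumed to be a direct sub-folder of one parent folder, and the command is the one given in this section. Each run gets its own output folder, so --client_id is not needed. By default the commands are only printed; set DRY_RUN=0 to launch them in the background.

```shell
# basecall_folders DATA_ROOT OUT_ROOT CONFIG   (hypothetical helper)
# Builds one basecalling command per sub-folder of DATA_ROOT, each with a
# unique output folder under OUT_ROOT.
basecall_folders() {
    data_root=$1
    out_root=$2
    config=$3

    for input_folder in "$data_root"/*/; do
        [ -d "$input_folder" ] || continue      # glob matched nothing
        input_folder=${input_folder%/}          # drop the trailing slash
        name=$(basename "$input_folder")

        # Reuse the positional parameters to hold the command line.
        set -- guppy_basecaller_supervisor \
            --input_path "$input_folder" \
            --save_path "$out_root/$name" \
            --config "$config" \
            --port ipc:///tmp/.guppy/5556

        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$@"                           # print only (default)
        else
            "$@" &                              # launch in the background
        fi
    done
    wait                                        # wait for launched runs
}
```

For example, basecall_folders /data/runs /data/basecalled dna_r9.4.1_450bps_hac_prom.cfg prints one command per run folder; the paths here are placeholders.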

Approximate basecall speeds

Below are approximate basecall speeds, in Gbases per hour, that you should be able to attain using the Guppy setup outlined above. Actual speeds will vary depending on the type of data you have: for example, shorter reads will basecall more slowly as they are less efficient to move through the basecall server and process on the GPU. Basecall speeds will also decrease if operations are requested that require additional processing time, such as barcoding or alignment.

Guppy configuration	Approximate speed, Gbases/hour; basecall supervisor with 50 clients
dna_r9.4.1_450bps_fast_prom.cfg	150
dna_r9.4.1_450bps_hac_prom.cfg	55
dna_r9.4.1_450bps_sup_prom.cfg	15
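
To turn the table into a rough time estimate, divide the amount of data by the listed speed. The sketch below does this for the three configurations above; the function name estimate_hours is hypothetical, and the result is only as accurate as the approximate speeds, so treat it as a planning figure that will be optimistic for short reads or when barcoding or alignment is enabled.

```shell
# estimate_hours GBASES CONFIG   (hypothetical helper)
# Prints the approximate basecalling time in hours, using the speeds
# from the table above (Gbases per hour).
estimate_hours() {
    gbases=$1
    case "$2" in
        dna_r9.4.1_450bps_fast_prom.cfg) speed=150 ;;
        dna_r9.4.1_450bps_hac_prom.cfg)  speed=55 ;;
        dna_r9.4.1_450bps_sup_prom.cfg)  speed=15 ;;
        *) echo "no speed listed for config: $2" >&2; return 1 ;;
    esac
    # hours = Gbases / (Gbases per hour), rounded to one decimal place
    awk -v g="$gbases" -v s="$speed" 'BEGIN { printf "%.1f\n", g / s }'
}

estimate_hours 300 dna_r9.4.1_450bps_fast_prom.cfg   # prints 2.0
estimate_hours 100 dna_r9.4.1_450bps_hac_prom.cfg    # prints 1.8
```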
