-
Basecalling a single folder
Guppy ships with a "basecaller supervisor" that will launch many basecalling clients in parallel, to mitigate issues with reading .fast5 files. Each client has an ID, numbered upwards from zero, which allows their output files to be created in parallel.
-
To launch the supervisor, enter the following command:
guppy_basecaller_supervisor --input_path <input_folder> --save_path <output_folder> --config <config> --port ipc:///tmp/.guppy/5556 --num_clients <num_clients>
-
Optional action: Depending on your analysis pipeline, the following additional options may be useful:
--bam_out
Output BAM files in addition to FASTQ.
--compress_fastq
Compress output FASTQ files so that they become .fastq.gz.
-
Choosing num_clients
If using the Fast basecall model, we recommend setting num_clients to 50. For all other models, set num_clients to 20.
-
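The recommendation above can be captured in a small helper. This is only a sketch: the function name choose_num_clients is hypothetical, and it assumes that Fast-model config names contain "_fast" (as in dna_r9.4.1_450bps_fast_prom.cfg), which may not hold for every config file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Guppy): pick a --num_clients value
# from the config file name, following the recommendation above.
# Assumption: Fast-model config names contain "_fast".
choose_num_clients() {
    config="$1"
    case "$config" in
        *_fast*) echo 50 ;;   # Fast model
        *)       echo 20 ;;   # all other models
    esac
}

# Example: a non-Fast config gets the lower client count.
choose_num_clients dna_r9.4.1_450bps_hac_prom.cfg   # prints 20
```

The result could then be passed to the supervisor as --num_clients "$(choose_num_clients <config>)".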
The output folder structure will look similar to this when basecalling is complete:
--- /save_folder/
| fastq_runid_6dce0a5_client0_0_0.fastq
| fastq_runid_6dce0a5_client1_0_0.fastq
| fastq_runid_6dce0a5_client2_0_0.fastq
| guppy_basecaller_0_log-2019-11-25_15-11-53.log
| guppy_basecaller_1_log-2019-11-25_15-11-53.log
| guppy_basecaller_2_log-2019-11-25_15-11-53.log
| guppy_basecaller_supervisor_log-2019-11-25_15-11-53.log
| sequencing_summary_0.txt
| sequencing_summary_1.txt
| sequencing_summary_2.txt
| sequencing_telemetry_0.js
| sequencing_telemetry_1.js
| sequencing_telemetry_2.js
-
Optional action: In some cases, downstream analysis tools require the output files to be merged.
For example, to merge FASTQ files (replace .fastq with .fastq.gz to merge gzipped FASTQ files instead), enter the following command:
cat save_folder/*.fastq > merged.fastq
To merge sequencing_summary files, keeping a single header line, enter the following command:
awk 'NR == 1 { print }; FNR > 1 { print }' save_folder/sequencing_summary* > merged_sequencing_summary.txt
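To illustrate how the awk merge treats header lines, here is a minimal sketch that wraps the same awk program in a function and runs it on two toy files. The function name merge_summaries and the toy column names are hypothetical. Real sequencing_summary files have many more columns.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk merge shown above.
# $1: folder containing sequencing_summary* files, $2: merged output file.
merge_summaries() {
    # NR == 1 prints the header of the first file only;
    # FNR > 1 prints every non-header line of every file.
    awk 'NR == 1 { print }; FNR > 1 { print }' "$1"/sequencing_summary* > "$2"
}

# Toy demonstration in a temporary folder (illustrative data only).
demo_dir="$(mktemp -d)"
printf 'read_id\tlength\nr1\t100\n' > "$demo_dir/sequencing_summary_0.txt"
printf 'read_id\tlength\nr2\t200\n' > "$demo_dir/sequencing_summary_1.txt"
merge_summaries "$demo_dir" "$demo_dir/merged_sequencing_summary.txt"
cat "$demo_dir/merged_sequencing_summary.txt"   # one header line, then r1 and r2
rm -r "$demo_dir"
```

Because the merged file name starts with merged_, it does not match the sequencing_summary* pattern, so writing it into the same folder does not feed it back into the merge.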
-
Basecalling many data folders
The most scalable way to basecall many data folders is to use a single basecall client for each data folder.
-
To launch multiple basecaller clients, enter the following command once for each data folder:
guppy_basecaller_supervisor --input_path <input_folder> --save_path <output_folder> --config <config> --port ipc:///tmp/.guppy/5556
Note: each output folder should be unique, unless using the --client_id argument.
-
Optional action: In addition to the options outlined above, you can use the following argument:
--client_id <id>
Append <id> to the output filenames. For example, sequencing_summary.txt would become sequencing_summary_<id>.txt. Use this when multiple clients output to the same folder.
-
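When there are many data folders, the per-folder command can be generated in a loop. The sketch below only prints the commands so they can be inspected before running. It assumes a hypothetical layout in which each data folder is a direct subfolder of one parent folder, and it gives each run its own output folder so that --client_id is not needed. The function name print_basecall_commands and its arguments are illustrative, not part of Guppy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one basecalling command per data folder.
# $1: parent folder holding the data folders (assumed layout)
# $2: parent folder for the outputs (one unique subfolder per data folder)
# $3: config file name
print_basecall_commands() {
    data_root="$1"
    out_root="$2"
    config="$3"
    for dir in "$data_root"/*/; do
        [ -d "$dir" ] || continue          # skip if there are no subfolders
        name="$(basename "$dir")"
        printf 'guppy_basecaller_supervisor --input_path %s --save_path %s --config %s --port ipc:///tmp/.guppy/5556\n' \
            "${dir%/}" "$out_root/$name" "$config"
    done
}
```

For example, print_basecall_commands /data/runs /data/basecalled dna_r9.4.1_450bps_hac_prom.cfg would print one line per subfolder of /data/runs (both paths are placeholders). Paths containing spaces would need quoting before the printed commands are executed.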
Approximate basecall speeds
Below are approximate basecall speeds, in Gbases per hour, that you should be able to attain using the Guppy setup outlined above. Actual speeds will vary depending on the type of data you have: for example, shorter reads will basecall more slowly as they are less efficient to move through the basecall server and process on the GPU. Basecall speeds will also decrease if operations are requested that require additional processing time, such as barcoding or alignment.
Guppy configuration                 Approximate speed, Gbases/hour (basecall supervisor with 50 clients)
dna_r9.4.1_450bps_fast_prom.cfg     150
dna_r9.4.1_450bps_hac_prom.cfg      55
dna_r9.4.1_450bps_sup_prom.cfg      15
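The speeds above can be turned into a rough run-time estimate by dividing the amount of data by the speed. The following is a sketch only: the function name estimate_hours is hypothetical, the 300 Gbases dataset size is an invented example, and real run times will differ for the reasons given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated hours = Gbases of data / speed in Gbases per hour.
# $1: dataset size in Gbases, $2: approximate speed in Gbases/hour
estimate_hours() {
    LC_ALL=C awk -v g="$1" -v s="$2" 'BEGIN { printf "%.1f\n", g / s }'
}

# Example with an assumed 300 Gbases dataset and the table values above.
estimate_hours 300 150   # fast model: prints 2.0
estimate_hours 300 55    # hac model:  prints 5.5
estimate_hours 300 15    # sup model:  prints 20.0
```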