Dorado basecalling software
Dorado is a data processing toolkit that contains Oxford Nanopore Technologies' basecalling algorithms and several bioinformatic post-processing features. It is run from the command line on Windows, Mac OS X, and multiple Linux platforms. A selection of configuration files allows basecalling of DNA and RNA libraries made with Oxford Nanopore Technologies' current sequencing kits, across a range of flow cells.
The Dorado toolkit contains:
- The basecaller: The Dorado basecaller implements a neural network algorithm that transforms raw data into the canonical bases of DNA or RNA, and into several types of modified bases.
- Alignment: The user can provide a reference file in FASTA or minimap2 index format. If so, the reads are aligned against this reference via the integrated minimap2 aligner using the standard Oxford Nanopore Technologies preset parameters.
- Modified basecalling: It is possible to use Dorado to identify certain types of modified bases: currently 5mC, 5hmC, 4mC + 5mC and 6mA for DNA and m6A and pseudouridine for RNA. This requires the use of a specific basecalling model which is trained to identify both modified and unmodified bases.
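The three components above are typically combined in a single command-line call. The following is a minimal sketch in Python that only assembles such a call and does not run it. The helper name `build_dorado_command`, the model name `hac`, the modification code `5mCG_5hmCG`, and the file paths are illustrative assumptions. The `--reference` and `--modified-bases` options should be checked against `dorado basecaller --help` for the installed version.

```python
import shlex
from typing import List, Optional, Sequence


def build_dorado_command(
    model: str,
    reads: str,
    reference: Optional[str] = None,
    modified_bases: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argument list for a Dorado basecalling run.

    Hypothetical helper for illustration; it is not part of Dorado.
    """
    # Base invocation: subcommand, basecalling model, raw-data location.
    cmd = ["dorado", "basecaller", model, reads]

    # Optional alignment against a FASTA or minimap2 index reference.
    if reference:
        cmd += ["--reference", reference]

    # Optional modified basecalling (assumed flag; verify with --help).
    if modified_bases:
        cmd += ["--modified-bases", *modified_bases]

    return cmd


if __name__ == "__main__":
    cmd = build_dorado_command(
        "hac",                      # assumed model name
        "pod5_dir/",                # assumed input directory
        reference="ref.fasta",      # assumed reference file
        modified_bases=["5mCG_5hmCG"],  # assumed modification code
    )
    # Print a shell-ready command line instead of executing it.
    print(shlex.join(cmd))
```

Building the arguments as a list rather than as one string keeps paths with spaces intact if the list is later passed to `subprocess.run`.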
GPU basecalling
Dorado is heavily optimised for NVIDIA A100 and H100 GPUs and will deliver maximal performance on systems with these GPUs.
Dorado has been tested extensively and is supported on the following systems:
| Platform | GPU/CPU |
| --- | --- |
| Windows | (G)V100, A100, H100 |
| Apple | M1, M1 Pro, M1 Max, M1 Ultra |
| Linux | (G)V100, A100, H100 |

Systems not listed above but which have NVIDIA GPUs with ≥8 GB VRAM and architecture from Volta onwards have not been widely tested but are expected to work. AWS Benchmarks on NVIDIA GPUs are available here.
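The rule for unlisted NVIDIA systems can be written as a simple check. This is a minimal sketch, not an official compatibility test. It assumes "Volta onwards" corresponds to CUDA compute capability 7.0 or higher, and the function name and example values are hypothetical.

```python
from typing import Tuple

# Assumption: Volta corresponds to CUDA compute capability 7.0.
VOLTA_COMPUTE_CAPABILITY: Tuple[int, int] = (7, 0)
MIN_VRAM_GB = 8  # minimum VRAM stated in the text above


def expected_to_work(compute_capability: Tuple[int, int], vram_gb: float) -> bool:
    """Return True if an unlisted NVIDIA GPU meets both stated criteria."""
    new_enough = tuple(compute_capability) >= VOLTA_COMPUTE_CAPABILITY
    enough_memory = vram_gb >= MIN_VRAM_GB
    return new_enough and enough_memory


if __name__ == "__main__":
    # A post-Volta card with ample memory passes both checks.
    print(expected_to_work((8, 6), 24))
    # A pre-Volta (Pascal-generation) card fails regardless of memory.
    print(expected_to_work((6, 1), 11))
    # A post-Volta card with less than 8 GB fails the memory check.
    print(expected_to_work((7, 5), 6))
```

Passing this check only means a system falls in the "expected to work" group. It is not one of the extensively tested configurations in the table.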
Dorado availability
The Dorado basecalling software is available free of charge to the Nanopore Community and on GitHub. More details on installing and running the software are found in the Dorado GitHub repository.