Bioinformatics analysis
If basecalling is not performed during live sequencing, raw sequencing data (.POD5 format) can be processed post-sequencing.
This can be achieved using the tool Dorado, which enables basecalling and subsequent alignment to a reference genome.
Dorado can also detect modified bases when the --modified-bases option is supplied (e.g. --modified-bases 5mCG_5hmCG). This writes the methylation tags directly into the aligned BAM file. We also recommend applying a minimum QScore cutoff (--min-qscore <min_QScore>) as a quality control measure, so that only high-quality reads are used in downstream processes.
1. The command below demonstrates how to initiate basecalling with Dorado, followed by sorting and indexing the output using Samtools. Please see the Dorado documentation for further details.
dorado basecaller <model> <input_POD5> --reference <REF> --min-qscore <min_QScore> | samtools sort -o <OUTPUT_BAM> - && samtools index <OUTPUT_BAM>
For example, to SUP basecall with 5mC and 5hmC detected in CpG context (5mCG_5hmCG), and with a QScore filter of 10, we can use:
dorado basecaller sup --modified-bases 5mCG_5hmCG input.pod5 --reference ref.fasta --min-qscore 10 | samtools sort -o output.bam - && samtools index output.bam
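The one-line pipeline above reports only the exit status of samtools sort, so a failed basecalling run can still produce an indexed but truncated BAM. The following is a minimal sketch of one way to guard against that, assuming a Bash shell. The function name basecall_sort_index and its argument order are hypothetical and not part of Dorado or Samtools, and the modified-base model is fixed to the 5mCG_5hmCG example used above.

```shell
#!/usr/bin/env bash
# Sketch only: basecall, sort and index in one step, stopping if any stage fails.
# basecall_sort_index is an illustrative helper, not a Dorado or Samtools command.
basecall_sort_index() {
    local model="$1" pod5="$2" ref="$3" min_q="$4" out_bam="$5"
    (
        # With pipefail, a dorado failure fails the whole pipeline,
        # so samtools index is skipped instead of indexing a partial BAM.
        set -o pipefail
        dorado basecaller "$model" "$pod5" \
            --modified-bases 5mCG_5hmCG \
            --reference "$ref" \
            --min-qscore "$min_q" \
        | samtools sort -o "$out_bam" - \
        && samtools index "$out_bam"
    )
}
```

It could then be called as, for example: basecall_sort_index sup input.pod5 ref.fasta 10 output.bam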
2. It is also recommended to remove reads with a low mapping quality (MAPQ), e.g. below 10. This can be achieved as follows:
samtools view -q <min_map_q> -bh -o <OUTPUT_BAM> <INPUT_BAM> && samtools index -@ <threads> <OUTPUT_BAM>
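To see how many alignments the mapping-quality filter removes, the same commands can be wrapped with a record count before and after filtering. This is a minimal sketch, not a prescribed workflow: the function name filter_by_mapq and its argument order are hypothetical, and the counts come from samtools view -c, which counts alignment records rather than unique reads.

```shell
#!/usr/bin/env bash
# Sketch only: filter a BAM by minimum MAPQ, index it, and report record counts.
# filter_by_mapq is an illustrative helper, not a Samtools command.
filter_by_mapq() {
    local min_mapq="$1" in_bam="$2" out_bam="$3" threads="${4:-1}"
    # -q keeps alignments with MAPQ >= min_mapq; -b writes BAM; -h keeps the header
    samtools view -q "$min_mapq" -bh -o "$out_bam" "$in_bam" \
        && samtools index -@ "$threads" "$out_bam" \
        && echo "kept $(samtools view -c "$out_bam") of $(samtools view -c "$in_bam") records"
}
```

For example: filter_by_mapq 10 output.bam output.mapq10.bam 4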
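3. The output from Dorado basecaller can be demultiplexed into per-barcode BAMs using Dorado demux. The --no-classify option splits reads according to the barcode classification already stored in the BAM, so barcoding must have been enabled during basecalling. For example:
dorado demux --output-dir <output-dir> --no-classify <input-bam>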
4. You may optionally omit methylation information from read ends using modkit adjust-mods, or other modkit tools, with the --edge-filter option. This may help increase methylation call precision, as approximately the last 27 bases of a read may suffer from a loss of methylated bases due to the chemistry used to repair ends during library preparation (see our know-how document for further details). The two comma-separated values are the number of bases to filter from the start and from the end of each read:
modkit adjust-mods --edge-filter 0,27 <IN_BAM> <OUTPUT_BAM>
The resulting .bam file can be used as input to external tools for further data analysis and exploration.
Post-basecalling analysis
There are several options for further analysing your basecalled data:
1. EPI2ME workflows
For in-depth data analysis, Oxford Nanopore Technologies offers a range of bioinformatics tutorials and workflows available in EPI2ME. The platform provides a vehicle where workflows deposited in GitHub by our Research and Applications teams can be showcased with descriptive texts, functional bioinformatics code and example data.
2. Research analysis tools
Oxford Nanopore Technologies' Research division has created a number of analysis tools, which are available in the Oxford Nanopore GitHub repository. The tools are aimed at advanced users, and contain instructions for how to install and run the software. They are provided as-is, with minimal support.
3. Community-developed analysis tools
If a data analysis method for your research question is not provided in any of the resources above, please refer to the Bioinformatics section of the Resource centre. Numerous members of the Nanopore Community have developed their own tools and pipelines for analysing nanopore sequencing data, most of which are available on GitHub. Please be aware that these tools are not supported by Oxford Nanopore Technologies, and are not guaranteed to be compatible with the latest chemistry/software configuration.