-
Ensure you have installed the latest version of Dorado. To perform basecalling and methylation calling using Remora, open a terminal and enter the following commands:
dorado basecaller {models_path}/dna_r10.4.1_e8.2_400bps_hac@v4.2.0 \
--modified-bases-models {models_path}/dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2 \
--device cuda:all \
--reference {reference} {input_folder} | samtools view -e '[qs] >= {qscore_filter}' \
--output {out_pass_bam} \
--unoutput {out_fail_bam}
- We recommend using the high accuracy (hac) model for RRMS sequencing runs. If you use the super accurate (sup) model instead, make sure both model paths in the above command point to the corresponding sup models.
- We recommend setting the qscore filter to 10.
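As a minimal sketch of switching to the sup model, the snippet below derives the sup model names from the hac names used in the command above. It assumes the sup models follow the same naming scheme with "hac" replaced by "sup"; this is an assumption, so confirm the resulting names against the models you have downloaded (for example with dorado download --list) before basecalling.

```shell
# Model names copied from the basecalling command above.
hac_model="dna_r10.4.1_e8.2_400bps_hac@v4.2.0"
hac_mod_model="dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2"

# Assumption: sup models use the same naming scheme, with "hac" replaced by "sup".
sup_model=$(printf '%s' "$hac_model" | sed 's/_hac@/_sup@/')
sup_mod_model=$(printf '%s' "$hac_mod_model" | sed 's/_hac@/_sup@/')

echo "basecaller model:     $sup_model"
echo "modified-bases model: $sup_mod_model"
```

Both names must be changed together, because the modified-bases model is tied to the basecalling model it was trained with.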
-
Optional action
Alternatively, basecalling can be performed using the wf-human-variation Nextflow pipeline. Please note that this method should only be used by experienced users:
nextflow run https://github.com/epi2me-labs/wf-human-variation \
--fast5_dir {input_pod5s} \
--dorado_ext pod5 \
--basecaller_cfg dna_r10.4.1_e8.2_400bps_hac@v4.2.0 \
--remora_cfg dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2 \
--basecaller_basemod_threads 4 \
--cuda_device 'cuda:all' \
--out_dir {output_folder} \
--ref {reference} -profile singularity \
--sample_name {sample_name} \
--basecaller_chunk_size 150
Note: For non-local executions we recommend using the following setting for profile:
-profile singularity,discrete_gpus
Additional options can also be enabled to perform relevant downstream analysis:
--mod
--snp
--sv
To use these options we recommend providing the target .bed file using the option:
--bed
For more information on the wf-human-variation Nextflow pipeline, please visit the wf-human-variation GitHub page.
-
Index the merged BAM file:
samtools index -@ 8 {out_pass_bam}
After this step you have a single sorted and indexed BAM file ({out_pass_bam}) that contains canonical bases as well as per-read modifications and can be loaded into IGV. Note that samtools index requires coordinate-sorted input, so sort the file with samtools sort first if it is not already sorted. To visualise the per-read modification calls in IGV, load the BAM file and set "colour reads as" to "modifications".
This BAM file can also be used to check the on-target coverage achieved during the Reduced Representation Methylation Sequencing (RRMS) run:
mosdepth -x -t 8 -n -b {target_bed} {prefix} {out_pass_bam}
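To turn the mosdepth output into a single on-target coverage figure, the per-region means can be averaged with each region weighted by its length. The sketch below assumes mosdepth has written {prefix}.regions.bed.gz with the mean depth in the last column; a toy file stands in for the real output, and the file names are hypothetical.

```shell
workdir=$(mktemp -d)

# Toy stand-in for {prefix}.regions.bed.gz (assumed columns: chrom, start, end, mean depth).
printf 'chr1\t0\t100\t20\nchr1\t200\t500\t40\n' | gzip -c > "$workdir/toy.regions.bed.gz"

# Length-weighted mean depth over all target regions. $NF is the last column,
# so this also works when the target BED carries an extra name column.
mean_cov=$(gzip -dc "$workdir/toy.regions.bed.gz" |
  awk -F'\t' '{len = $3 - $2; bases += len; depth += len * $NF}
              END {if (bases > 0) printf "%.2f", depth / bases}')

echo "mean on-target coverage: ${mean_cov}x"
```

For the toy data this gives (100 x 20 + 300 x 40) / 400 = 35.00. On real data, replace the toy file with the regions file written by the mosdepth command above.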
-
To create strand-aggregated methylation frequencies for all genomic positions (CpGs), run:
modkit pileup --cpg --combine-strands --bedgraph \
--threads 10 --prefix {prefix} \
--ref {reference_fasta} \
{out_bam} {out_folder}
With these options, modkit reports CpG methylation frequencies by aggregating calls from the forward and reverse strands. A separate bedgraph file is created for each of the modifications present, in this case 5mC and 5hmC. The tool can be found in the following repository: https://github.com/nanoporetech/modkit.
-
Convert the BEDGRAPH file to a TSV file that is compatible with the DMR tool DSS:
awk -v OFS='\t' 'BEGIN{print "chr","pos","N","X"}{print $1,$2,$5,($4*$5)}' {out_folder}/{prefix}_m_CG0_combined.bedgraph > {out_mod_bed_agg_filt_DSS}
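The conversion can be checked on a toy input before running it on real data. The sketch below applies the same awk command to a two-line bedgraph, assuming the column layout is chrom, start, end, fraction modified, and valid coverage; the toy file names are hypothetical.

```shell
workdir=$(mktemp -d)

# Toy bedgraph (assumed columns: chrom, start, end, fraction modified, valid coverage).
printf 'chr1\t100\t101\t0.75\t8\nchr1\t200\t201\t0.5\t10\n' \
  > "$workdir/toy_m_CG0_combined.bedgraph"

# Same command as above: N is the coverage, X = fraction * coverage is the
# number of reads called as modified at that position.
awk -v OFS='\t' 'BEGIN{print "chr","pos","N","X"}{print $1,$2,$5,($4*$5)}' \
  "$workdir/toy_m_CG0_combined.bedgraph" > "$workdir/toy_DSS.tsv"

cat "$workdir/toy_DSS.tsv"
```

The first data row becomes chr1, 100, 8, 6: eight reads cover the site and 0.75 x 8 = 6 of them are called methylated. If the fractions in a real bedgraph are rounded, the product may not be a whole number; replacing ($4*$5) with int($4*$5+0.5) rounds it to the nearest count, though whether this is needed depends on the precision of the modkit output.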
-
Create a bigWig file for visualisation in IGV:
bedtools sort -i {out_folder}/{prefix}_m_CG0_combined.bedgraph | cut -f 1-4 > {out_folder}/{prefix}_m_CG0_combined_sort.bedgraph
bedGraphToBigWig {out_folder}/{prefix}_m_CG0_combined_sort.bedgraph {reference_chrSize} {out_mod_bed_agg_filt_bigwig}
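bedGraphToBigWig expects {reference_chrSize} to be a two-column, tab-separated file of sequence names and lengths. If you do not have one, a possible way to create it is from the FASTA index written by samtools faidx {reference}, whose first two columns hold exactly this information. The sketch below uses a toy .fai file with hypothetical file names.

```shell
workdir=$(mktemp -d)

# Toy stand-in for {reference}.fai as written by `samtools faidx {reference}`
# (columns: name, length, offset, bases per line, bytes per line).
printf 'chr1\t248956422\t112\t70\t71\nchr2\t242193529\t252513167\t70\t71\n' \
  > "$workdir/toy.fa.fai"

# Keep only the sequence name and length.
cut -f 1,2 "$workdir/toy.fa.fai" > "$workdir/toy.chrom.sizes"

cat "$workdir/toy.chrom.sizes"
```

The sequence names in this file must match the chromosome names used in the bedgraph, which is the case when both are derived from the same reference.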
-
Repeat the above steps for all your samples.
For detection of differentially methylated regions use DSS as described here: https://bioconductor.org/packages/release/bioc/vignettes/DSS/inst/doc/DSS.html
-
Benchmarking results
For information about benchmarking the performance of RRMS for human samples, please see our RRMS performance document.