-
Ensure you have installed the latest version of Guppy. To perform basecalling and methylation calling using Remora, open a terminal and enter the following commands:
guppy_basecaller \
-i {input_fast5s} -s {output_folder} \
-c dna_r9.4.1_450bps_modbases_5mc_cg_hac.cfg \
--align_ref {reference_fasta} \
--device auto \
--compress_fastq --bam_out --recursive \
--num_callers 5 --cpu_threads_per_caller 4
-
Concatenate all BAM files output by Guppy into one:
samtools cat {output_folder}/pass/*bam | \
samtools sort -@ 8 -o {out_bam} -
-
Index the merged BAM file:
samtools index -@ 8 {out_bam}
This will create a single sorted and indexed BAM file ({out_bam}) that contains canonical bases as well as per-read modifications and can be loaded into IGV. To visualise the per-read modification calls in IGV, load the BAM file and set "colour reads as" to "modifications".This BAM file can be used to check the on-target coverage achieved during the Reduced Representation Methylation Sequencing (RRMS) run:
mosdepth -x -t 8 -n -b {target_bed} {prefix} {out_bam}
-
To create strand-specific and strand-aggregated methylation frequencies for all genomic positions (CpGs), run:
modbam2bed -m 5mC -e --cpg -t 8 -a 0.2 -b 0.8 \
--aggregate --prefix {prefix} \
{reference_fasta} \
{out_bam} > {out_mod_bed}
This will create two BEDMETHYL files: one will report methylation frequencies per genomic position and per strand, the second file will include the prefix specified and will report methylation frequencies by aggregating calls from the forward and reverse strand. The tool can be found in the following repository: https://github.com/epi2me-labs/modbam2bed
-
Filter reference CpG positions without canonical or modified calls (e.g. deletions from the reference) and genomic positions without calls from both strands:
cat {prefix}_cpg.acc.bed | csvtk filter2 -H -t \
-f '$11 != "nan" && $6 != "+" && $6 != "-"' > {out_mod_bed_agg_filt}
-
Convert the BEDMETHYL file to a TSV file that is compatible with the DMR tool DSS:
awk -v OFS='\t' 'BEGIN{print "chr","pos","N","X"}{print $1,$2,($12+$13),$13}' {out_mod_bed_agg_filt} > {out_mod_bed_agg_filt_DSS}
-
Convert the BEDMETHYL file to a BEDGRAPH file that will be used for obtaining the BIGWIG format useful for IGV visualisation:
awk -v OFS='\t' '{print $1,$2,$3,$11}' {out_mod_bed_agg_filt} | \
sort -k1,1 -k2,2n > {out_mod_bed_agg_filt_bedgraph}
bedGraphToBigWig {out_mod_bed_agg_filt_bedgraph} {reference_chrSize} {out_mod_bed_agg_filt_bigwig}
-
Repeat the above steps for all your samples.
For detection of differentially methylated regions use DSS as described here: https://bioconductor.org/packages/release/bioc/vignettes/DSS/inst/doc/DSS.html
-
Benchmarking results
For information about benchmarking the performance of RRMS, please see our RRMS performance document.