-
Additional resources for analysing your ASFV data
Note: This information is subject to change. Please refer to "No part gets left behind: Tiled nanopore sequencing of whole ASFV genomes stitched together using Lilo" by Amanda Warr et al., 2021 for the most recent updates.
-
Bioinformatic processing of ASFV genomes sequenced with shotgun sequencing
The shotgun sequencing data were basecalled and demultiplexed in MinKNOW (v19.06.8) with the fast basecalling model.
Following basecalling, the reads were aligned to an ASFV genome using minimap2 to identify ASFV reads. The .fast5 files for these reads were extracted using fast5_subset from the ont_fast5_api, and these reads were basecalled again using high accuracy basecalling. Re-basecalling only the ASFV reads reduces basecalling time, which suits lower spec laptops/computers with low GPU capacity.
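The read-selection step above can be sketched in Python. This is only an illustration, assuming the minimap2 alignments were written in PAF format (where unmapped reads are absent by default); the function name and the `min_mapq` parameter are hypothetical and not part of the original workflow. The resulting IDs could be written one per line to a file and supplied to fast5_subset as a read ID list (check the ont_fast5_api documentation for the exact option).

```python
def mapped_read_ids(paf_lines, min_mapq=0):
    """Collect IDs of reads with at least one alignment in PAF output.

    PAF is tab-separated: column 1 is the read name and column 12 is
    the mapping quality. min_mapq is a hypothetical optional filter.
    """
    ids = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        read_id = fields[0]
        mapq = int(fields[11])
        if mapq >= min_mapq:
            ids.add(read_id)
    return ids


if __name__ == "__main__":
    example = [
        "read_a\t1000\t0\t900\t+\tASFV\t190000\t10\t910\t850\t900\t60",
        "read_b\t800\t0\t700\t-\tASFV\t190000\t500\t1200\t600\t700\t5",
    ]
    # Print one ID per line, as would be written to a read ID list file.
    for read_id in sorted(mapped_read_ids(example)):
        print(read_id)
```

A set is used so that reads with several alignments (for example supplementary alignments) are only listed once.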
The reads were assembled with Flye (v2.6) and polished three times with Medaka (v0.7.1). Additional Flye resources are available following the link: Assembly of long, error-prone reads using repeat graphs.
The quantity of data produced and the proportion of reads that were ASFV were compared using NanoComp (v1.28.1). Additional NanoPack resources are available following the link: NanoPack: visualizing and processing long-read sequencing data.
-
Bioinformatic processing of ASFV genomes from tiled amplicons with Lilo
The data were basecalled and demultiplexed using Guppy (v5.0.14) with the high accuracy or super accuracy model on a GPU.
The Snakemake pipeline Lilo was then used; it performs the following steps:
- Use Porechop (v0.2.3) to remove any sequencing adapters or barcodes that have made it through demultiplexing.
- Align to a reference with minimap2 (v2.22) and samtools (v1.12) and separate reads into amplicons by alignment position with bedtools (v2.30.0).
- Select reads of the expected amplicon length (+/-5%) and subset to 300X.
- Using bioawk (v1), select the read with the highest average base quality within +/-1% of the median read length for the amplicon to serve as the “reference”, and remove any amplicons with fewer than 40 reads (targeting the median length allows flexibility for large insertions or deletions).
- Pool amplicon reads and references back into their original non-overlapping pools.
- Polish the pools three times with Medaka (v1.4.4) and combine resulting polished amplicons.
- Align to the reference with minimap2 and remove soft clipped bases (these likely represent missed barcodes or adapters).
- Run Porechop to remove primers from the amplicons.
- Merge the amplicons with scaffold_builder (v2.3).
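The per-amplicon read selection described in the steps above (length filter, subsetting, and choice of a “reference” read) can be sketched in Python. This is not Lilo's code, which uses bioawk inside Snakemake rules; it is a minimal illustration under stated assumptions: reads are already in memory as (name, sequence, quality) tuples, subsetting is random, average base quality is the arithmetic mean of Phred scores (offset 33), and the lower median is used so that at least one read always matches the median length. All function and parameter names are hypothetical.

```python
import random
import statistics


def mean_quality(qual, offset=33):
    """Arithmetic mean of Phred scores in a FASTQ quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def select_amplicon_reads(reads, expected_len, max_reads=300,
                          min_reads=40, seed=0):
    """Filter the reads of one amplicon and pick a "reference" read.

    reads: list of (name, sequence, quality) tuples.
    Returns (reference_read, kept_reads), or None if the amplicon
    has fewer than min_reads reads after filtering.
    """
    # Keep reads within +/-5% of the expected amplicon length.
    lo, hi = expected_len * 0.95, expected_len * 1.05
    kept = [r for r in reads if lo <= len(r[1]) <= hi]

    # Subset to at most max_reads reads (random choice is an assumption).
    if len(kept) > max_reads:
        kept = random.Random(seed).sample(kept, max_reads)

    # Drop amplicons with too few reads.
    if len(kept) < min_reads:
        return None

    # Candidates: reads within +/-1% of the (lower) median length.
    median_len = statistics.median_low([len(r[1]) for r in kept])
    near_median = [r for r in kept
                   if abs(len(r[1]) - median_len) <= 0.01 * median_len]

    # The "reference" is the candidate with the highest mean quality.
    reference = max(near_median, key=lambda r: mean_quality(r[2]))
    return reference, kept
```

Using the median of the observed lengths, rather than the expected length, for the reference choice mirrors the reasoning in the steps above: a genuine large insertion or deletion shifts the median, and the chosen reference shifts with it.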
The required inputs to Lilo are: demultiplexed reads in FASTQ format in a directory named “raw/”; a reference FASTA; a .bed file of primer alignments (as output by primal scheme); a .csv of primer sequences (if there are ambiguous bases, it is advised to expand them first); and a config file, described on the GitHub page. Lilo is adaptable to any species (with a single genome fragment/chromosome) and any tiled primer scheme. The pipeline outputs a FASTA file containing the assembled genome.
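Expanding ambiguous bases in primer sequences, as advised above, can be done with a short Python helper. This is a sketch only: it uses the standard IUPAC nucleotide codes and returns every unambiguous sequence a primer could represent. The function name is hypothetical, and how the expanded primers should be laid out in the .csv is described on the Lilo GitHub page, not here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_primer(seq):
    """Return all unambiguous sequences encoded by an IUPAC primer."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # A primer with one ambiguous base (R = A or G) gives two sequences.
    for primer in expand_primer("ACRT"):
        print(primer)
```

The number of expanded sequences is the product of the options at each position, so primers with many ambiguous bases can expand into a large number of sequences.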
-
ARTIC assemblies
A subset of genomes were also assembled using the ARTIC pipeline (v1.2.1) following the bioinformatics SOP using the Medaka method.
-
Quality control of assembled genomes
QUAST (v5.0.2) was used to compare the assembled genomes to the most closely related publicly available ASFV assembly according to BLAST alignment (MN715134.1). Links and additional resources are available following the link: Short and Long-Read Sequencing Survey of the Dynamic Transcriptomes of African Swine Fever Virus and the Host Cells.
Samples sequenced with both WGS and tiled sequencing were compared for overall structure using nucmer (v4.0.0beta2). Links and additional resources are available following the link: MUMmer4: A fast and versatile genome alignment system.
-
Phylogeny
The phylogenetic analysis was limited to publicly available genomes and the tiled genomes, as the latter were the most accurate assemblies. These were aligned using MAFFT (v7.467), and maximum likelihood trees were constructed using IQ-TREE (v2.0.5).