-
Recommended pipeline analysis
wf-artic is a bioinformatics workflow for the analysis of ARTIC sequencing data prepared using the Midnight protocol. The workflow is orchestrated by Nextflow, a publicly available, open-source project that enables scientific workflows to be executed in a scalable and reproducible way. Nextflow has been integrated into the EPI2ME Labs software, which we recommend for running our downstream analysis methods.
Alternative methods for downstream analysis are available using your device terminal or command line; however, we suggest this only for experienced users.
Demultiplexed sequence reads are processed using the ARTIC Field Bioinformatics software, which has been modified for the analysis of FASTQ sequences prepared using Oxford Nanopore Rapid Sequencing kits. The other modification to the ARTIC workflow is the use of a primer scheme that defines the sequencing primers used by the Midnight protocol and their locations on the SARS-CoV-2 genome.
The wf-artic workflow also includes cladistic analysis using Nextclade and strain assignment using Pangolin. The data facets included in the report are parameterised, and additional information such as plots of depth of coverage across the reference genome is optional.
The complete source for wf-artic is linked, and the Nextflow software will download the scripts and logic flow from this location.
The wf-artic workflow needs to be started manually as outlined below in 'Running a Midnight analysis using EPI2ME Labs'.
-
Software set-up and installation
The EPI2ME Labs application provides a clean interface for accessing bioinformatics workflows and is our recommended method for performing your post-sequencing analysis.
Follow the instructions in the EPI2ME Labs Installation guide to install the application on your device.
For more information on how to use EPI2ME Labs, refer to the EPI2ME Labs Quick Start guide.
-
Installing and updating the wf-artic workflow in EPI2ME Labs:
Ensure you have installed the wf-artic workflow prior to the first analysis set-up.
In the EPI2ME Labs home page, scroll down to the "Install workflows" section and click on epi2me-labs/wf-artic:
If you have already installed the wf-artic workflow, ensure you are using the latest version.
Updating the workflow can be done directly through EPI2ME Labs by navigating to the wf-artic workflow page and clicking Update Workflow:
-
Demultiplexing of multiple barcoded samples
The wf-artic analysis requires FASTQ sequence data that has already been demultiplexed.
Reads will be demultiplexed during sequencing if you are following the recommended "Required settings in MinKNOW". However, demultiplexing can also be done post-sequencing using the MinKNOW software.
For more information and guides on demultiplexing using MinKNOW, refer to the "Post-run analysis" section in our MinKNOW Protocol.
The expected input for wf-artic is a folder of folders as shown below. Each of the barcode folders should contain the FASTQ sequence data; files may be either uncompressed or gzipped.
$ tree -d MidnightFastq/
MidnightFastq/
├── barcode01
├── barcode02
├── barcode03
├── barcode04
├── barcode05
├── barcode06
└── unclassified
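Before launching the workflow, it can be useful to confirm that the input folder matches this layout. The following is a minimal Python sketch and is not part of wf-artic or EPI2ME Labs; the function name summarise_input_folder and the list of accepted file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed FASTQ file suffixes (uncompressed or gzipped); adjust as needed.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def summarise_input_folder(root):
    """Return {subfolder name: number of FASTQ files} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # only the per-barcode folders are of interest
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
    return counts
```

Calling summarise_input_folder("MidnightFastq") on the example above would return one entry per barcode folder (plus unclassified); a count of zero suggests a folder that contains no FASTQ data.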
-
Running a Midnight analysis using EPI2ME Labs
-
Open the EPI2ME Labs application on your device.
-
Open the "Workflows" tab in the EPI2ME Labs application and click on the "wf-artic" workflow:
-
In the "wf-artic" workflow page, select "Run this workflow" to open analysis set-up:
-
Complete the wf-artic run set-up:
Select your data input file location. Please note, this folder must contain the demultiplexed FASTQ files of your sequencing run.
Expand the Primer Scheme Selection tab and set the Scheme version to Midnight-ONT/V3.
Expand the Advanced Options tab and set the Medaka model to the basecalling model used in your sequencing run.
Expand the Extra configuration tab and set the Run name for your wf-artic analysis.
Click Launch workflow at the bottom of the page to begin your analysis.
-
Completed analysis and result files
The wf-artic analysis outputs will be written to the Working Directory folder specified in the EPI2ME Labs Settings tab.
The location of this folder is specified in the wf-artic run Instance parameters, preceded by out_dir.
However, these files can also be accessed directly in the EPI2ME Labs application from the completed analysis page for your run:
These outputs include:
all_consensus.fasta
A multi-FASTA format sequence file containing the consensus sequence for each of the samples investigated. This consensus sequence has been prepared for the whole SARS-CoV-2 genome, not just the spike protein region. The consensus sequence masks the non-spike regions and regions of low sequence coverage with N residues.
all_variants.vcf.gz
A gzipped VCF file that describes all high-quality genetic variants called by medaka from the sequenced samples.
all_variants.vcf.gz.tbi
An index file for the gzipped VCF file.
consensus_status.txt
A tab-delimited file that reports whether or not a consensus sequence has been successfully prepared for a sample.
wf-artic-report.html
A report summarising these data. This HTML format report also includes the output of the Nextclade software that can be used for a visual inspection of, for example, primer drop out or other qualitative consensus sequence aspects.
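Because masked and low-coverage regions appear as N residues in all_consensus.fasta, the proportion of N in each consensus gives a quick indication of how complete each sample is. The sketch below is an illustrative Python helper, not part of the workflow; the function name n_fraction_per_sample is hypothetical, and it assumes a plain-text (uncompressed) multi-FASTA file.

```python
def n_fraction_per_sample(fasta_path):
    """Return {sequence name: fraction of N residues} for a multi-FASTA file."""
    totals = {}  # sequence name -> [N count, total length]
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                totals[name] = [0, 0]
            elif name is not None:
                totals[name][0] += line.upper().count("N")
                totals[name][1] += len(line)
    return {
        key: (n_count / length if length else 0.0)
        for key, (n_count, length) in totals.items()
    }
```

For example, n_fraction_per_sample("all_consensus.fasta") returns a dictionary keyed by the sequence names found in the file; what fraction counts as acceptable is left to the user.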
Other files are included in the work-directory. These include per-sample VCF files of all genetic variants prior to filtering and other sequences.
-
Housekeeping and disk usage
The "Working Directory" can be specified in the EPI2ME Labs "Settings" tab and defines where the workflow intermediate files and outputs are stored.
This folder will accumulate a significant number of files, including raw BAM files, other large intermediates and analysis result files. We recommend clearing this folder routinely.
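To decide when the folder needs clearing, its total size can be checked from time to time. The following Python sketch is one possible way to do this and is not an EPI2ME Labs feature; the function name directory_size_bytes is hypothetical, and the path to pass in is whatever Working Directory is configured in your Settings tab.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files beneath root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip symbolic links so linked files are not counted or followed.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Dividing the result by 1e9 gives an approximate size in gigabytes, for example directory_size_bytes(working_dir) / 1e9, where working_dir is your own folder path.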