-
POD5 output
POD5 is a file format developed by Oxford Nanopore that stores nanopore data in an accessible way and replaces the legacy .fast5 format. Compared with .fast5, POD5 reads and writes data faster, uses less compute and produces smaller raw data files.
For more information about the POD5 schema and contents, refer to POD5 file format.
-
.fast5 output
.fast5 is a legacy file format for nanopore sequencing data that can still be selected as an output type in MinKNOW. It is a type of HDF5 file, designed to contain all the information needed to analyse nanopore sequencing data and trace it back to its source. Read .fast5 files contain the raw sequencing data for each read, with a default of 4000 reads per file.
For more information about the .fast5 schema and contents, refer to the Oxford Nanopore Technologies .fast5 API.
-
Default read file location
- Windows: C:\data\
- Mac OS X: /Library/MinKNOW/
- Linux: /var/lib/minknow/
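As a rough illustration, the defaults above can be looked up from the value of Python's `sys.platform`. This is a minimal sketch: `DEFAULT_READ_LOCATIONS` and `default_read_location` are hypothetical names, and the helper only reports the defaults listed here, not an output location that has been changed in MinKNOW.

```python
import sys

# Default read file locations listed above, keyed by the value that
# sys.platform reports on each operating system (hypothetical mapping).
DEFAULT_READ_LOCATIONS = {
    "win32": "C:\\data\\",
    "darwin": "/Library/MinKNOW/",
    "linux": "/var/lib/minknow/",
}


def default_read_location(platform: str = sys.platform) -> str:
    """Return the default read file location for a sys.platform string.

    Raises KeyError for platforms not listed above.
    """
    # Python 3 reports "linux"; Python 2 reported "linux2". Normalise both.
    if platform.startswith("linux"):
        platform = "linux"
    return DEFAULT_READ_LOCATIONS[platform]
```

Called with no argument, the helper uses the platform of the machine it runs on.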
-
Intermediate folder
The files in the intermediate folder store unprocessed raw signal data. Once raw signal processing is complete, POD5 or .fast5 files are generated and stored in the tmp folder, where local basecalling can proceed. The intermediate files are removed as processing progresses, or at the end of the run.
If the system encounters an issue, such as running out of disk space, the unprocessed data is not cleared and remains in the intermediate folder. Because the system processes data as a real-time stream, this data cannot be processed after the run has stopped.
-
FASTQ output
FASTQ files are text files that contain the sequence of each read and its associated per-base quality scores. FASTQ files can be generated by MinKNOW, Dorado and Guppy. By default, 4000 reads are written per FASTQ file, although this number is configurable.
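The sketch below shows the arithmetic behind a fixed number of reads per file, assuming each file is filled to the limit before a new one is started (the same 4000-read default applies to read .fast5 files). `count_output_files` and `batch_reads` are hypothetical names; this is not how MinKNOW, Dorado or Guppy split their output internally.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Default stated above; the actual value is configurable.
DEFAULT_READS_PER_FILE = 4000


def count_output_files(total_reads: int,
                       reads_per_file: int = DEFAULT_READS_PER_FILE) -> int:
    """Number of files needed if each file holds at most reads_per_file reads."""
    if reads_per_file <= 0:
        raise ValueError("reads_per_file must be positive")
    # Integer ceiling division: a partially filled last file still counts.
    return (total_reads + reads_per_file - 1) // reads_per_file


def batch_reads(reads: Iterable[T],
                reads_per_file: int = DEFAULT_READS_PER_FILE) -> Iterator[List[T]]:
    """Group reads into consecutive batches, one batch per output file."""
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, reads_per_file))
        if not batch:
            return
        yield batch
```

For example, a run producing 10,000 reads at the default setting would give three files, the last holding 2000 reads.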
A single read sequence in a FASTQ file is described in four lines:
- Line 1 begins with '@', followed by the read ID and a header containing information about the read and the sequencing run.
- Line 2 is the basecalled sequence (using A, C, T, G and N).
- Line 3 contains a '+'.
- Line 4 encodes the per-base quality scores for the sequence in Line 2.
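The four-line layout can be read with a short parser. This sketch assumes the standard Phred+33 (Sanger) quality encoding, in which a quality character's ASCII code minus 33 gives the Phred score Q, and Q relates to the estimated error probability by P = 10^(-Q/10). `FastqRecord`, `decode_qualities`, `error_probability` and `read_fastq` are hypothetical names, not part of any Oxford Nanopore tool.

```python
from typing import Iterable, Iterator, List, NamedTuple

# Assumption: qualities use the standard Phred+33 (Sanger) encoding.
PHRED_OFFSET = 33


class FastqRecord(NamedTuple):
    header: str           # Line 1 without the leading '@'
    sequence: str         # Line 2: basecalled sequence
    qualities: List[int]  # Line 4 decoded to per-base Phred scores


def decode_qualities(quality_line: str) -> List[int]:
    """Convert each quality character to its Phred score."""
    return [ord(char) - PHRED_OFFSET for char in quality_line]


def error_probability(q: int) -> float:
    """Estimated probability that a base with Phred score q is wrong."""
    return 10 ** (-q / 10)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of FASTQ lines (e.g. an open file)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence = next(stripped)  # Line 2
            plus = next(stripped)      # Line 3
            quality = next(stripped)   # Line 4
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(quality) != len(sequence):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, decode_qualities(quality))
```

Because `read_fastq` accepts any iterable of lines, an open text file can be passed to it directly.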
An example of a single FASTQ record is shown below:
@75be78f7-bd62-4972-92d2-aba16f465b0d runid=ff83cfafb0cb3bfc28ac370b841f59798ab3d63a sampleid=RB02_lambda_ovn1 read=19343 ch=53 start_time=2019-12-23T13:44:31Z
CGGTATTACTTCGTTCAGTTTCGGACAGGTGTTTTAACC[...]TCGTACCTAT
+
'%+-($&&&&'(':+7)-%(&$$.%##))868;;87/9;[...]68(*(2)/%$
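The header on Line 1 of the example consists of a read ID followed by space-separated `key=value` fields. A minimal sketch for splitting it is shown below; `parse_header` is a hypothetical helper, all values are returned as strings, and the set of fields is assumed to vary between software versions, so code should not rely on a particular key being present.

```python
from typing import Dict, Tuple


def parse_header(header_line: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTQ header line into its read ID and key=value fields."""
    if not header_line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # The first whitespace-separated token after '@' is the read ID.
    read_id, *tokens = header_line[1:].split()
    fields: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return read_id, fields
```

Numeric fields such as `read` and `ch` can then be converted with `int()` where needed.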
-
BAM output
MinKNOW and the stand-alone Guppy software output BAM files if alignment has been performed on the basecalled data. BAM files are also output when modified base models are used in MinKNOW and Dorado.