-
Overview
The NVIDIA DGX Station A100 has 7.68 TB of internal data storage, so data flow should be managed to suit the needs of the end users. The location and volume of the data, the network connection speed and the basecalling model selected all affect the optimal data flow.
-
Data volumes
100 Gbases of sequencing data in .fast5 format typically occupies approximately 1 Tbyte of storage. Variables such as read length will alter this ratio.
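The ratio above can be turned into a rough capacity check. The sketch below is illustrative only: the function and constant names are hypothetical, the FASTQ figure of about 1 Gbyte per Gbase is taken from the table in this section, and the 7.68 TB capacity is treated as 7680 Gbytes with no allowance for the operating system or other files. Actual ratios vary with read length.

```python
# Assumed ratios, taken from the figures quoted in this section.
FAST5_GBYTES_PER_GBASE = 10.0      # 100 Gbases of .fast5 is roughly 1 Tbyte
FASTQ_GBYTES_PER_GBASE = 1.0       # 100 Gbases of FASTQ is roughly 100 Gbytes
DGX_LOCAL_STORAGE_GBYTES = 7680.0  # 7.68 TB, assuming decimal units


def estimate_storage_gbytes(gbases: float) -> dict:
    """Estimate .fast5 and FASTQ storage (in Gbytes) for a given yield."""
    return {
        "fast5": gbases * FAST5_GBYTES_PER_GBASE,
        "fastq": gbases * FASTQ_GBYTES_PER_GBASE,
    }


def fits_locally(gbases: float, already_used_gbytes: float = 0.0) -> bool:
    """Check whether .fast5 plus FASTQ output would fit on local storage."""
    needed = sum(estimate_storage_gbytes(gbases).values())
    return needed + already_used_gbytes <= DGX_LOCAL_STORAGE_GBYTES


if __name__ == "__main__":
    for yield_gbases in (50, 100, 200):
        print(yield_gbases, estimate_storage_gbytes(yield_gbases))
```

Under these assumptions, a 200 Gbase run needs about 2200 Gbytes for both file types together, which fits on the station, while a 700 Gbase run (about 7700 Gbytes) would not.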
Gbases   .fast5 storage (Gbytes)   FASTQ storage (Gbytes)
50       500                       50
100      1000                      100
200      2000                      200
-
Basecalling speeds based on data location
For the fastest basecalling speed, .fast5 data should be stored locally on the DGX Station A100. Depending on the basecalling model used and the networked storage available, users could instead consider basecalling data directly from networked storage. The table below shows the basecalling speed from each storage location as a percentage of the speed achieved from local SSD storage with the same model, when Guppy is run with the suggested parameters.
Note that these benchmarks are for the previous model of the DGX Station A100 (160 GB). Benchmarks for the 320 GB model will be released soon.
Storage location                                dna_r9.4.1_450bps_fast_prom   dna_r9.4.1_450bps_hac_prom   dna_r9.4.1_450bps_sup_prom
DGX Station A100 local SSD storage              100%                          100%                         100%
High performance enterprise storage             16.2%                         98.8%                        99.3%
Basic fibre networked storage (e.g. Synology)   23.7%                         99.6%                        97.3%
-
Data flow for basecalling post-run from the PromethION
If you choose to basecall from .fast5 files held locally on the DGX Station A100, we recommend keeping the time spent copying .fast5 data to a minimum: create only a temporary copy of the .fast5 data on the station and delete it once basecalling is complete. Move only the analysed (FASTQ) data off the station.
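The recommended flow (temporary local copy, basecall, move FASTQ off, delete the copy) can be sketched as below. This is a minimal illustration, not a supported script: the function name, its parameters and the directory layout are all hypothetical, and the basecalling step is passed in as a caller-supplied function so that no particular Guppy command line is assumed.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def basecall_with_temporary_copy(
    source_dir: Path,
    fastq_dest: Path,
    run_basecaller: Callable[[Path, Path], None],
    scratch_root: Optional[Path] = None,
) -> List[Path]:
    """Copy .fast5 data to local scratch, basecall it, keep only the FASTQ.

    run_basecaller(input_dir, output_dir) is a hypothetical hook that is
    expected to write FASTQ files somewhere under output_dir.
    """
    fastq_dest.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    scratch_parent = str(scratch_root) if scratch_root is not None else None

    # The temporary directory, including the .fast5 copy, is deleted when
    # the "with" block exits, even if basecalling raises an exception.
    with tempfile.TemporaryDirectory(dir=scratch_parent) as scratch:
        local_fast5 = Path(scratch) / "fast5"
        local_out = Path(scratch) / "basecalled"
        shutil.copytree(source_dir, local_fast5)  # temporary local copy
        local_out.mkdir()

        run_basecaller(local_fast5, local_out)

        # Move only the analysed (FASTQ) data off the local scratch area.
        # Files with the same name in different subfolders would collide;
        # this sketch does not handle that case.
        for fastq in sorted(local_out.rglob("*.fastq*")):
            target = fastq_dest / fastq.name
            shutil.move(str(fastq), str(target))
            moved.append(target)

    return moved
```

In practice, `run_basecaller` would wrap whatever Guppy invocation and parameters are in use at your site, and `scratch_root` would point at a directory on the station's local SSD storage. The original .fast5 files in `source_dir` are left untouched.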