Parameters for expert users
There are additional advanced options for expert users. Experimenting with these parameters may significantly impact the performance or accuracy of the basecaller:
Data features
- Calibration strand reference file (--calib_reference): Provide a FASTA file to override the reference calibration strand.
- Calibration strand candidate minimum sequence length (--calib_min_sequence_length): Minimum sequence length for reads to be considered candidate calibration strands.
- Calibration strand candidate maximum sequence length (--calib_max_sequence_length): Maximum sequence length for reads to be considered candidate calibration strands.
- Calibration strand minimum coverage (--calib_min_coverage): Minimum reference coverage of candidate strand required for a read to pass calibration strand detection.
- DNA Adapter trimming threshold (--trim_threshold): Threshold above which data will be trimmed (in standard deviations of current level distribution).
- DNA Adapter trimming minimum events (--trim_min_events): Adapter trimmer minimum stride intervals after stall that must be seen.
- DNA Adapter trimming maximum search length (--max_search_len): Maximum number of samples from the beginning of the read to search through for the stall.
- Override automatic read scaling (--override_scaling): Flag to manually provide scaling parameters rather than estimating them from each read. See the --scaling_med and --scaling_mad options below. Note that if --ignore_scaling_from_read_files is not set, scaling overrides will only apply to reads which did not have scaling information stored in the source file.
- Manual read scaling median (--scaling_med): Median current value to use for manual scaling.
- Manual read scaling median absolute deviation (--scaling_mad): Median absolute deviation to use for manual scaling.
- Adapter Trimming strategy (--trim_strategy): Trimming strategy to apply to the raw signal before basecalling (must be one of dna, rna or none). The adapter looks different in the signal depending on whether DNA or RNA is being basecalled, so the two cases require a different adapter trimming algorithm. This should be set automatically by the config file, and usually it is not required to set this at the command line.
- RNA Adapter Trimming Window size (--dmean_win_size): Window size for coarse stall event detection. This parameter, --dmean_threshold and --jump_threshold are used to override how the RNA adapter trimming code operates. Generally, users should not need to change these unless they are familiar with how RNA adapter trimming works.
- RNA Adapter Trimming threshold (--dmean_threshold): Threshold for coarse stall event detection.
- RNA Adapter Trimming jump threshold (--jump_threshold): Threshold level for RNA stall detection.
- Disable event table transmission (--disable_events): Flag to disable the transmission of event tables when receiving reads back from the basecall server. If the event tables are not required for downstream processing (e.g. for 1D2) then it is more efficient to disable them.
- Enable poly-T/non-sequence adapter-based read scaling (--pt_scaling): Flag to enable polyT/adapter max detection for read scaling. This will be used in preference to read median/median absolute deviation to perform read scaling if the poly-T to non-sequence adapter current level change can be detected.
- Poly-T scaling median offset (--pt_median_offset): Set polyT median offset for setting read scaling median (default 2.5).
- Poly-T scaling range scale (--adapter_pt_range_scale): Set polyT/adapter range scale for setting read scaling median absolute deviation (default 5.2).
- Poly-T scaling minimum adapter drop (--pt_required_adapter_drop): Set minimum required current drop from adapter max to polyT detection (default 30.0).
- Poly-T scaling minimum read start index (--pt_minimum_read_start_index): Set minimum index for read start sample required to attempt polyT scaling (default 30).
- Noisiest-section scaling maximum read size (--noisiest_section_scaling_max_size): Set the maximum size of a read (in samples) for which noisiest-section signal scaling is performed. For short reads, greater accuracy can be achieved by only using the noisiest section of the signal to calculate the signal median and median absolute deviation. These values are then used when scaling the read signal. Defaults to 0.
- Read ID whitelist (--read_id_list): A filename for a text file containing a whitelist of read IDs (one per line, no whitespace). If this option is specified, Guppy will only basecall reads from the input which have read IDs that are in the read whitelist.
- Barcoding configuration file (--barcoding_config_file): A filename from which to load the barcoding configuration, allowing users to override all barcoding parameters without specifying them at the command line. Defaults to 'configuration.cfg'.
- Sample sheet (--sample_sheet): A filename for a MinKNOW-compatible CSV format sample sheet, containing flow_cell_id, experiment_id and optionally barcode, or internal_barcode and external_barcode, or rapid_barcode and fip_barcode, used to identify a particular classification of read. The alias column will then be used by Guppy to rename the output files and folders based on the other classification values. Note that MinKNOW sample sheets can omit the flow_cell_id as long as they contain a position_id, but to be used with Guppy, the sample sheet must contain a flow_cell_id.
- Load scaling from read files (--load_scaling_info_from_read_files): Flag to enable loading scaling offset and scale information from source read files, if it exists. If this flag is set, Guppy will use the stored values in the input files, instead of computing scaling values for reads.
- Use quantile scaling (--use_quantile_scaling): When enabled, Guppy will calculate scaling values from the raw signal using quantile scaling instead of the default (med-mad).
- Beam cut (--beam_cut): Beam score cutoff for beam search decoding.
- Beam width (--beam_width): Beam width to use in beam search decode.
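The manual scaling options above (--scaling_med and --scaling_mad) stand in for values that are otherwise estimated from each read. As a rough illustration of what median/MAD scaling means, the Python sketch below estimates both values from a raw signal and normalises it. The function names, the sample values and the (x - med) / mad form are assumptions made for illustration; the text above does not state the exact formula Guppy applies.

```python
from statistics import median


def med_mad(signal):
    """Return the median and the median absolute deviation of a signal."""
    med = median(signal)
    mad = median(abs(x - med) for x in signal)
    return med, mad


def scale_signal(signal, med=None, mad=None):
    """Normalise a signal as (x - med) / mad.

    If med and mad are both given they act as manual overrides (the role
    --scaling_med and --scaling_mad play); otherwise they are estimated
    from the signal itself.
    """
    if med is None or mad is None:
        med, mad = med_mad(signal)
    if mad == 0:
        raise ValueError("MAD is zero; signal cannot be scaled")
    return [(x - med) / mad for x in signal]


# Hypothetical raw current samples, for illustration only.
raw = [80.0, 82.0, 79.0, 95.0, 81.0]
estimated = scale_signal(raw)                  # med and mad estimated from raw
manual = scale_signal(raw, med=81.0, mad=1.0)  # med and mad supplied by hand
```

Using the median and MAD rather than the mean and standard deviation keeps a single outlying sample (95.0 here) from dominating the scale estimate.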
Optimisation
- Model file (-m or --model_file): A path to a JSON RNN model file to use instead of the model specified in the configuration file.
- Adapter scaling model file (--as_model_file): Path to JSON model file for adapter scaling.
- Chunk size (--chunk_size): Set the size of the chunks of data which are sent to the basecaller for analysis. Chunk size is specified in signal blocks, so the total chunk size in samples will be chunk_size * event_stride.
- Chunk overlap (--overlap): The overlap between adjacent chunks, specified in signal blocks. An overlap is required for chunks to be stitched back into a continuous read.
- Max chunks per runner (--chunks_per_runner): The maximum number of chunks which can be submitted to a single neural network runner before it starts computation. Increasing this figure will increase GPU basecalling performance when it is enabled.
- Number of GPU runners per device (--gpu_runners_per_device): The number of neural network runners to create per CUDA device. Increasing this number may improve performance on GPUs with a large number of compute cores, but will increase GPU memory use. This option only affects GPU calling.
- CPU threads per caller (--cpu_threads_per_caller): The number of CPU threads to create for each caller to use. Increasing this number may improve performance on CPUs with a large number of cores, but will increase system load. This option only affects CPU calling.
- Stay penalty (--stay_penalty): Scaling factor to apply to stay probability calculation during transducer decode.
- Q-score offset (--qscore_offset): Override the q-score offset to apply when calibrating output q-scores for the read. There is an offset and scale (see --qscore_scale below) that are applied to the output base probabilities in the FASTQ for a basecall, to make the q-scores as close as possible to the Phred quality scores. Once a basecall model has been trained, these scores are calculated and added to the config files.
- Q-score scale (--qscore_scale): Override the q-score scale to apply when calibrating output q-scores for the read.
- Use built-in GPU kernels (--builtin_scripts): Set this flag to false to disable built-in GPU kernels, allowing custom kernels to be used (see --kernel_path).
- GPU Kernel source path (--kernel_path): Path to GPU kernel files, which will be used if --builtin_scripts is set to false.
- Number of adapter scalers (--as_num_scalers): Number of parallel scalers for adapter scaling.
- Reads per scaler (--as_reads_per_runner): Maximum reads per runner for adapter scaling.
- CPU threads per adapter scaler (--as_cpu_threads_per_scaler): Number of CPU worker threads per adapter scaler.
- GPU adapter scaling runners per device (--as_gpu_runners_per_device): Number of runners per GPU device for adapter scaling.
- Num alignment threads (--num_alignment_threads): Number of worker threads to use for alignment.
- Num barcoding threads (--num_barcoding_threads): Number of worker threads to use for barcoding.
- Num modified base basecaller threads (--num_base_mod_threads): The number of threads to use for Remora modified base detection in GPU basecalling mode.
- Num read splitting threads (--num_read_splitting_threads): Number of worker threads to use for read splitting.
- Num read splitting buffers (--num_read_splitting_buffers): Number of GPU memory buffers to allocate to perform read splitting. Controls level of parallelism on GPU for read splitting using mid adapter detection.
- Disable pings (--disable_pings): Flag to disable sending any telemetry information to Oxford Nanopore Technologies. See the "Ping information" section for a summary of what is included in the Guppy telemetry.
- Telemetry URL (--ping_url): Override the default URL for sending telemetry pings.
- Ping segment duration (--ping_segment_duration): Duration in minutes of each ping segment.
- Read batch size (--read_batch_size): The maximum batch size, in reads, for grouping input files. This controls the granularity at which resume can operate. Note that this value may be exceeded if individual input files contain more than this many reads. Output files for each batch will contain a maximum of --records_per_fastq entries.
- Int8 inference mode (--int8_mode): Enable quantised int8 mode for kernels which support it.
- Log speed frequency (--log_speed_frequency): How often to print out basecalling speed.
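To make the relationship between --chunk_size, --overlap and the stride concrete, here is a minimal Python sketch. It converts a chunk size given in signal blocks into samples (chunk_size * event_stride, as described above) and lists the start and end sample of each overlapping chunk. The stride of 5, the read length of 12000 samples and the function name are assumptions chosen for illustration, and the way Guppy itself handles the final partial chunk is not described here.

```python
def chunk_bounds(num_samples, chunk_size, overlap, event_stride):
    """Return (start, end) sample indices of overlapping chunks.

    chunk_size and overlap are given in signal blocks; multiplying by
    event_stride converts them to samples.
    """
    chunk_samples = chunk_size * event_stride
    overlap_samples = overlap * event_stride
    step = chunk_samples - overlap_samples
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    bounds = []
    start = 0
    while True:
        # The last chunk is truncated at the end of the read (an assumption).
        end = min(start + chunk_samples, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds


# Hypothetical values: a 12000-sample read and a stride of 5 samples per block.
bounds = chunk_bounds(num_samples=12000, chunk_size=1000, overlap=50, event_stride=5)
for start, end in bounds:
    print(start, end)
```

With these values each chunk covers 5000 samples and consecutive chunks share 250 samples, which is the region available for stitching the chunks back into one read.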
Overriding configuration parameters from the command-line
Guppy configuration files specify many of the optional parameters discussed previously. For example, the basecalling section of a configuration file could look like this:
# Basic configuration file for ONT Guppy basecaller software.
# Basecalling.
model_file = template_r9.5_450bps_5mer_raw.jsn
chunk_size = 1000
runners = 20
chunks_per_runner = 20
overlap = 50
qscore_offset = -0.06
qscore_scale = 1.16
builtin_scripts = 1
The parameters specified in the configuration file can be overridden from the command line by arguments of the form --parameter value, e.g.:
guppy_basecaller --config dna_r9.5_450bps.cfg --runners 40 [other options]
Command-line parameters always take priority over config file parameters, so running Guppy with these arguments overrides the runners setting from the config file, forcing it to 40. This makes it easy to change a few parameters without editing the configuration file. Please note that an argument value cannot contain unquoted spaces; a value that includes spaces must be wrapped in quotes. For example, to run Guppy with two GPU devices, you would set the devices like so:
guppy_basecaller --device "cuda:0 cuda:1" [other options]
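The precedence rule described in this section (command-line values win over configuration file values) can be sketched in a few lines of Python. This is only an illustration of the merge logic, not Guppy's actual parser: the function names are hypothetical, every option is assumed to take exactly one value, and valueless flags are not handled.

```python
def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def apply_overrides(params, argv):
    """Return a copy of params with '--name value' pairs from argv applied."""
    merged = dict(params)
    i = 0
    while i < len(argv):
        if argv[i].startswith("--") and i + 1 < len(argv):
            merged[argv[i][2:]] = argv[i + 1]  # command line takes priority
            i += 2
        else:
            i += 1
    return merged


# A shortened copy of the example configuration shown earlier.
config_text = """
# Basecalling.
chunk_size = 1000
runners = 20
overlap = 50
"""

settings = apply_overrides(parse_config(config_text), ["--runners", "40"])
print(settings["runners"])  # prints 40
```

Settings that are not named on the command line, such as chunk_size here, keep the value from the configuration file.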