Guppy basecall server
Guppy includes an additional executable called guppy_basecall_server, which provides basecalling as a network-enabled service. The basecall server is useful when a set of compute resources such as GPUs needs to be shared between several concurrently running basecalling clients. It enables client applications to perform basecalling by communicating with the server via the ZMQ socket interface. ONT products which support multiple flow cells typically use Guppy in a server configuration in order to share the embedded GPUs between all flow cells.
The server is launched as follows:
guppy_basecall_server --config <config file> --log_path <log file folder> --port 5555 [--allow_non_local] [--use_tcp]
The basecall server requires a basecalling config file, like the stand-alone basecaller. It also requires a --log_path to be specified, which will be used to output the server execution log. The final required parameter is --port, which specifies the path to a local Unix socket file (on supported systems) or the socket port number on which the server will listen for connections. The --port parameter may also be set to auto, in which case the server will generate a path in the system temporary folder for socket file connections or provide an available port number for TCP connections.
guppy_basecall_server --config <config file> --log_path <log file folder> --port auto
To force the use of a TCP connection, pass the optional flag --use_tcp on the command line (this flag has no effect on unsupported platforms, e.g. Windows). By default, the server only listens for TCP connections on the localhost interface. The optional flag --allow_non_local is used to permit connections to the server from addresses other than localhost; this flag also implies --use_tcp on supported systems.
On startup the server will output something similar to the following:
ONT Guppy basecall server software version 3.0.3+7e7b7d0
config file: /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file: /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
log path: /tmp
chunk size: 1000
chunks per runner: 48
max queued reads: 2000
num basecallers: 1
num socket threads: 1
gpu device: cuda:0
kernel path:
runners per device: 2
Starting server on port: 5555
or:
ONT Guppy basecall server software version 3.0.3+7e7b7d0
config file: /opt/ont/guppy/data/dna_r9.4.1_450bps_fast.cfg
model file: /opt/ont/guppy/data/template_r9.4.1_450bps_fast.jsn
log path: /tmp
chunk size: 1000
chunks per runner: 48
max queued reads: 2000
num basecallers: 1
num socket threads: 1
gpu device: cuda:0
kernel path:
runners per device: 2
Starting server on port: ipc:///tmp/ae89-314a-54a4-be67
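When the server is launched with --port auto, the generated socket path or port number is only known from the final line of the startup output. The following is a minimal sketch of how a wrapper script might extract it from captured output. The function name server_address_from_output is hypothetical, and the sketch assumes the line format matches the samples above.

```shell
# Read captured server startup output on stdin and print the address
# from the first "Starting server on port:" line (assumed format).
server_address_from_output() {
  sed -n 's/^[[:space:]]*Starting server on port:[[:space:]]*//p' | head -n 1
}
```

For example, if the startup output has been saved to a file, redirecting that file into server_address_from_output prints a value such as 5555 or ipc:///tmp/ae89-314a-54a4-be67, which can then be passed to the client via --port.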
The server may take several seconds to fully launch, but once the "Starting server" line is output, the server is ready for connections.
If the server fails to start due to being improperly configured, it will exit with exit code 2, and details about what went wrong will be output to the log file. Some examples of things that could go wrong include:
- Required command line parameters were not provided.
- The configuration file does not exist, or it specifies a model file that does not exist.
- The CUDA device specified does not exist, or is unavailable.
- The CUDA device does not have enough memory to support the requested configuration.
- The path that the log files should be written to cannot be accessed for writing.
In general, any automated software that is responsible for starting the server should check for a return code of 2. This code means that subsequent attempts to start the server with the same input parameters will also fail unless the problem is addressed.
If the server crashes due to an exception being thrown within the software, details of the error will appear in the logs. In this case, the return code will be 1. Any other return codes (other than 0, which indicates normal shutdown) will indicate that the server has crashed in a way that may have prevented any information about the nature of the error being logged properly.
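The return codes described above can be mapped to actions in a launcher script. The following is a minimal sketch; the function name describe_server_exit and the message wording are illustrative assumptions, and only the meanings of codes 0, 1 and 2 come from the text above.

```shell
# Map a guppy_basecall_server return code to a short description.
# Function name and messages are hypothetical; the code meanings follow
# the documentation above.
describe_server_exit() {
  case "$1" in
    0) echo "normal shutdown" ;;
    1) echo "crash with exception: see the server log" ;;
    2) echo "configuration error: fix the parameters before retrying" ;;
    *) echo "unexpected crash: the log may be incomplete" ;;
  esac
}
```

A launcher could run the server, pass "$?" to describe_server_exit, and avoid automatic restarts when the code is 2, since the same parameters would fail again.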
Once the server is running, it can be used to basecall by running the Guppy basecall client application. This is exactly the same as launching the Guppy basecaller application locally, except a socket file path or connection port is specified:
guppy_basecall_client --input_path reads --save_path output_folder/basecall --config dna_r9.4.1_450bps_fast.cfg --port ~/my_socket_files/socket1
Note that socket files only permit local connections.
To use a TCP socket connection, add the --use_tcp flag in the same way as when launching the server:
guppy_basecall_client --input_path reads --save_path output_folder/basecall --config dna_r9.4.1_450bps_fast.cfg --port 5555 --use_tcp
If only a port is specified to the Guppy basecall client as above, Guppy will assume the server is running on the local host. However, it is also possible to specify an address or hostname:
guppy_basecall_client --input_path reads --save_path output_folder/basecall --config dna_r9.4.1_450bps_fast.cfg --port 192.168.0.64:5555 --use_tcp
or
guppy_basecall_client --input_path reads --save_path output_folder/basecall --config dna_r9.4.1_450bps_fast.cfg --port my_basecall_server:5555 --use_tcp
In this case, the connection can be made to a remote server. Note that to allow connections from clients specified in this way, the server must be launched with the --allow_non_local command-line flag. If the server was launched with --allow_non_local, the client must use --use_tcp, even if this flag was not passed to the server.
Note: Basecalling performance may be compromised by network bandwidth when using a remote server.
It is possible for multiple clients to connect to a basecall server simultaneously, and the server will distribute processing resources between them using a fair queuing system.
Note: Read trimming and file output will be performed on the client, so any parameters to control those steps must be specified when launching the client, not the server.
Basecall server-specific parameters
To start the basecall server you will need to specify the path for logging.
- Logging path (--log_path): The path to the folder in which to save a basecall log. The logs contain all the messages that are output to the terminal, plus additional informational messages. For example, the log will contain a record for each input file which is loaded and each file which is written out. Any error or warning messages generated during the run will also go in the log, which can be used for diagnosing problems. If the user specifies the --verbose flag, an additional verbose log file is written out. The log_path is only set for the server (as it has no other output files), but the guppy_basecaller app also emits logs, which go into the save_path.
- Maximum queue size (--max_queued_reads): Maximum number of reads to queue per client. When running in client/server mode, the client will load files from disk and send them immediately to the server for basecalling. If the client can load and send reads faster than the basecaller can process them, queued reads will pile up on the basecall server, increasing memory consumption. To avoid this problem, --max_queued_reads specifies a maximum number of reads that an individual client can have in flight on the server at once. This has a default value of 2000, which is sufficient for MinION Mk1B and GridION setups with a single client attached. When running multiple clients, the number should be reduced to prevent excessive memory usage.
- Allow non-local connections (--allow_non_local): By default the server will only accept connections from clients on localhost. Pass this flag to allow incoming connections on other interfaces.
- High-priority read threshold (--high_priority_threshold): Number of high-priority chunks to process for each medium-priority chunk. The default is 10.
- Medium-priority read threshold (--medium_priority_threshold): Number of medium-priority chunks to process for each low-priority chunk. The default is 4.
- Maximum IPC message block size (--max_block_size): Maximum block size (in samples) of messages. Reads over the maximum size will be sent in multiple parts. The default is 256,000.
- Number of threads for IPC message handling (--ipc_threads): Number of threads to use for inter-process communication. The default is 2.
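As a rough illustration of the --max_block_size behaviour, the arithmetic below estimates how many message parts a read would need. It is a sketch only: the function name parts_for_read is hypothetical, and it assumes every part except the last is filled to the maximum block size, which the text above does not state explicitly.

```shell
# Estimate the number of message parts for a read of a given length.
#   $1: read length in samples
#   $2: maximum block size in samples (defaults to the documented 256000)
# Uses integer ceiling division.
parts_for_read() {
  samples=$1
  block=${2:-256000}
  echo $(( (samples + block - 1) / block ))
}
```

Under this assumption, a read of 1,000,000 samples would be sent in four parts with the default block size, while any read of up to 256,000 samples fits in a single message.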
Basecall client-specific parameters
- Server connection hostname and port (-p or --port): Specify a hostname and port for connecting to the basecall service (e.g. 'myserver:5555'), or a port only (e.g. '5555'), in which case localhost is assumed. This is the port used to communicate between the basecall client and server. The client and server both need to use the same port number or they will not be able to connect to each other.
- Client ID (--client_id): An identifier for the Guppy client instance. If supplied, this identifier will be included in any files output by the client. When multiple Guppy client processes run at the same time and write to the same output folder, giving each one a unique client ID lets it label its output files with that ID, which guarantees unique filenames and prevents the clients from overwriting each other's files.
- Client connection timeout (--conn_timeout_ms): Connection timeout in milliseconds before the server considers the client as disconnected. Set to zero to disable the server auto-disconnecting the client.
- Max server read failure count (--max_server_read_failures): Maximum number of times to try resending in-flight reads when the server repeatedly crashes.
- Server file loading timeout (--server_file_load_timeout): Timeout in seconds to wait for the server to load a requested data file (e.g. a basecalling model or alignment index). This may need increasing if very large alignment references are being requested. The default is 180s.
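To illustrate how --client_id keeps output files apart, the sketch below prints one client command line per instance, each with its own ID, without running anything. The function name print_client_commands and the input folder names reads_0, reads_1, and so on are hypothetical; the other values are taken from the examples earlier in this document, and integer IDs are an assumption.

```shell
# Print (do not run) one guppy_basecall_client command per client, each
# with a unique --client_id. Input folder names reads_<i> are placeholders.
print_client_commands() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "guppy_basecall_client --input_path reads_$i --save_path output_folder/basecall --config dna_r9.4.1_450bps_fast.cfg --port 5555 --use_tcp --client_id $i"
    i=$((i + 1))
  done
}
```

Calling print_client_commands 2 prints two command lines that share a save path but differ in --client_id, so their output files would not collide.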
Guppy basecall supervisor
Guppy includes an executable called guppy_basecaller_supervisor.
A single guppy_basecall_client will struggle to read files fast enough to supply a guppy_basecall_server, especially in a multiple-GPU system. To improve GPU utilisation, it is necessary to have multiple clients connecting to the basecall server. The guppy_basecaller_supervisor application is provided to simplify the process of connecting multiple clients to a server while all reading from the same input location and writing to the same save path.
This supervisor application ensures that:
- All files from the input location are distributed amongst the child basecaller clients, and
- Each client is launched with a unique client_id guaranteeing all files written to the save folder will be uniquely named.
Once all basecaller clients have completed, the supervisor exits; a return code of zero indicates success.
Usage
The basecaller supervisor is launched with exactly the same parameters as the Guppy basecaller running in client mode, but with the addition of a --num_clients parameter.
For example, to launch three Guppy basecallers running in client mode, all processing the same input location and writing to the same save location (this assumes that the basecall server has already been launched and is listening on TCP port 5555):
guppy_basecaller_supervisor --num_clients 3 --input_path reads --save_path ./save_folder/ --config dna_r9.4.1_450bps_fast.cfg --port 5555 --use_tcp
Note: the output will be written by each client individually and is not merged. In particular, it is worth noting that there will be one sequencing summary per client.
Depending on your requirements some further processing may be necessary in order to merge the sequencing summary files.
Example output files:
--- /save_folder/
| fastq_runid_6dce0a5_client0_0_0.fastq
| fastq_runid_6dce0a5_client1_0_0.fastq
| fastq_runid_6dce0a5_client2_0_0.fastq
| guppy_basecaller_0_log-2019-11-25_15-11-53.log
| guppy_basecaller_1_log-2019-11-25_15-11-53.log
| guppy_basecaller_2_log-2019-11-25_15-11-53.log
| guppy_basecaller_supervisor_log-2019-11-25_15-11-53.log
| sequencing_summary_0.txt
| sequencing_summary_1.txt
| sequencing_summary_2.txt
| sequencing_telemetry_0.js
| sequencing_telemetry_1.js
| sequencing_telemetry_2.js
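Because each client writes its own sequencing summary, a post-processing step may be needed to combine them. The following is a minimal sketch, assuming each sequencing_summary_N.txt is a text table whose first line is an identical header. The function name merge_summaries and the output filename are hypothetical.

```shell
# Merge per-client summary files into one file, keeping a single header.
#   $1: output file (must not be one of the inputs)
#   remaining arguments: input summary files
merge_summaries() {
  out=$1
  shift
  first=1
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      cat "$f" > "$out"            # first file: keep header and rows
      first=0
    else
      tail -n +2 "$f" >> "$out"    # later files: skip the header line
    fi
  done
}
```

For the listing above, this could be called with an output name such as merged_summary.txt followed by the three sequencing_summary files. Choosing an output name that does not match a sequencing_summary_*.txt wildcard avoids accidentally feeding the merged file back in as an input.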
Command-line configuration arguments
Any configuration parameters currently passed to the guppy_basecaller, e.g. --num_callers, --ipc_threads, --gpu_runners_per_device, --chunks_per_runner, etc., should also be suitable for the guppy_basecaller_supervisor, as these will be directly forwarded to the clients.
To choose an optimum value for the --num_clients parameter, some trial and error is necessary: for example, start with --num_clients 1 and increase until no further benefit is noticed. The output from the supervisor is useful in determining this, as it reports the samples/second, for example:
Caller time: 5405 ms, Samples called: 186589921, samples/s: 3.45217e+07
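When comparing several trial runs, the samples/s figure can be pulled out of the saved supervisor output automatically. The following is a minimal sketch, assuming the line format shown above. The function names samples_per_sec and is_faster are hypothetical.

```shell
# Print the samples/s value from a "Caller time: ..." line read on stdin.
samples_per_sec() {
  sed -n 's/.*samples\/s:[[:space:]]*\([0-9.eE+-]*\).*/\1/p'
}

# Succeed (exit status 0) if throughput $1 is greater than throughput $2.
# awk is used because the values may be in scientific notation.
is_faster() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}
```

A tuning script could record samples_per_sec for each --num_clients value and stop increasing the count once is_faster no longer succeeds for the newer run.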
For more detailed metrics, the --progress_stats_frequency argument can be used, although this reports bases called/second as opposed to samples. Below is some sample output with --progress_stats_frequency 5:
Found 38 input read files to process.
Processing ...
[PROG_STAT_HDR] time elapsed(secs), time remaining (estimate), total reads processed, total reads (estimate), interval(secs), interval reads processed, interval bases processed, bases/sec
[PROG_STAT] 5.00439, 10.8428, 12, 38, 5.00439, 12, 66073, 13203.0
[PROG_STAT] 10.0091, 8.10263, 21, 38, 5.00466, 9, 61161, 12220.8
[PROG_STAT] 15.0133, 1.76627, 34, 38, 5.0041, 13, 71410, 14270.3
[PROG_STAT] 17.1152, 0, 38, 38, 2.10173, 4, 35785, 17026.5
Caller time: 17530 ms, Samples called: 2157249, samples/s: 123060
All instances of guppy_basecaller completed successfully.
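The per-interval figures can be reduced to a single overall bases/second value for comparing runs. The following is a minimal sketch, assuming the column order given in the [PROG_STAT_HDR] line above (elapsed time first, interval bases processed seventh). The function name overall_bases_per_sec is hypothetical.

```shell
# Read progress output on stdin, sum the "interval bases processed"
# column over all [PROG_STAT] lines, and divide by the last elapsed time.
overall_bases_per_sec() {
  awk -F',' '
    /^\[PROG_STAT\] / {
      split($1, head, " ")     # head[2] is the elapsed time in seconds
      elapsed = head[2] + 0
      bases += $7
    }
    END { if (elapsed > 0) printf "%.1f\n", bases / elapsed }
  '
}
```

The header line is skipped because it starts with [PROG_STAT_HDR] rather than [PROG_STAT] followed by a space.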
Notes
- The intended usage is that the supervisor will be running clients that connect to a server, therefore it is necessary to supply the --port argument.
- If the Guppy basecall server was launched with the --use_tcp and/or --allow_non_local options, then --use_tcp should also be supplied when launching the supervisor.
- Since the child Guppy basecall clients are using a server for the actual basecalling, the --device argument should NOT be supplied.