Recommendations for assembly
We recommend Flye (https://github.com/fenderglass/Flye) for assembling Kit 12 genomes.
We have observed that assembling haplotypes separately significantly improves genome contiguity, although each assembly only uses half the data.
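Contiguity in this section is reported as NG50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size. The following is a minimal sketch of that calculation; the function name and the toy contig lengths are illustrative and are not Kit 12 data.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for a list of contig lengths and an estimated genome size.

    Contigs are visited from longest to shortest; the result is the length of
    the contig at which the running total first reaches half the genome size.
    Returns 0 if the assembly covers less than half of the genome.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered * 2 >= genome_size:
            return length
    return 0


# Toy example (arbitrary units): half of 300 is 150, reached at the second contig.
print(ng50([80, 70, 50, 30, 20], genome_size=300))  # prints 70
```

Because NG50 is computed against the genome size rather than the assembly size, values from different assemblies of the same genome can be compared directly.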
There are three Flye parameters that we recommend tuning for the best performance with Kit 12 sequence data:
- Setting the command line parameter --min-overlap 10000 should deliver a modest improvement in assembly contiguity when using libraries optimised for read length.
- We recommend setting the --nano-corr parameter (to specify that the sequences are "corrected"). This provides a significant improvement in assembly NG50 compared with the --nano-raw (uncorrected sequence) setting. We have observed NG50 increases from 58 Mb to 67 Mb for collapsed assemblies, when assembling both haplotypes at once.
- We typically adjust the asm_corrected_reads.cfg file in the flye/config/bin_cfg/ folder to increase haplotype-specific assembly NG50s and to remove any major misjoins:
  a. enable homopolymer compressed scoring (hpc_scoring_on = 1)
  b. increase the minimizer_window to 10
  c. decrease the repeat_graph_ovlp_divergence to 0.005; this increases haplotype-specific assembly NG50s to 84 Mb/84 Mb and removes all major misjoins
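The three recommendations above can be sketched as a small helper script. This is an illustration under stated assumptions, not a supported tool. It assumes the config file consists of `key = value` lines (as the hpc_scoring_on = 1 example suggests). The function names, the sample config excerpt with its starting values, the read and output paths, and the thread count are all hypothetical. The script only builds the Flye command; it does not run it.

```python
import re

# Config overrides taken from the recommendations above.
OVERRIDES = {
    "hpc_scoring_on": "1",
    "minimizer_window": "10",
    "repeat_graph_ovlp_divergence": "0.005",
}


def apply_overrides(cfg_text, overrides):
    """Return cfg_text with `key = value` lines rewritten for each override.

    Assumes one `key = value` setting per line. Comment lines and unrelated
    keys are left untouched; keys not found are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in cfg_text.splitlines():
        match = re.match(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            out.append(f"{key} = {remaining.pop(key)}")
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


def flye_command(reads, out_dir, threads=8):
    """Build a Flye argument list: corrected-read mode, 10 kb minimum overlap."""
    return [
        "flye",
        "--nano-corr", reads,
        "--min-overlap", "10000",
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical excerpt; the real file contains more settings.
    sample = "# example excerpt\nminimizer_window = 5\nhpc_scoring_on = 0\n"
    print(apply_overrides(sample, OVERRIDES), end="")
    print(" ".join(flye_command("hap1.fastq", "asm_hap1")))
```

In practice the patched text would be written back to asm_corrected_reads.cfg (after keeping a copy of the original) and the command run once per haplotype read set.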