Horticulture Research

Letter to the Editor|28 Feb 2023|OPEN

The high-quality genome of lotus reveals tandem duplicate genes involved in stress response and secondary metabolites biosynthesis

Huanhuan Qi¹, Feng Yu¹, Jiao Deng¹,², Liangsheng Zhang³,* and Pingfang Yang¹,*

¹State Key Laboratory of Biocatalysis and Enzyme Engineering, School of Life Sciences, Hubei University, Wuhan 430026, China
²Research Center of Buckwheat Industry Technology, Guizhou Normal University, Guiyang 550001, China
³College of Agriculture & Biotechnology, Zhejiang University, Hangzhou 310058, China
*Corresponding authors. E-mail: zls83@zju.edu.cn, yangpf@hubu.edu.cn

Horticulture Research 10, Article number: uhad040 (2023)
doi: https://doi.org/10.1093/hr/uhad040

Received: 29 Dec 2022
Accepted: 23 Feb 2023
Published online: 28 Feb 2023

Abstract

Dear Editor,

Lotus (Nelumbo nucifera Gaertn.), which has been domesticated and cultivated for several thousand years and is endowed with religious and cultural symbolism [1], belongs to the genus Nelumbo in the family Nelumbonaceae. As an early eudicot, it is not only essential for studies of plant phylogeny but also widely used as a vegetable, a medicinal herb, and an ornamental plant. It contains abundant functional compounds, such as flavonoids and benzylisoquinoline alkaloids (BIAs), which are used to treat diverse diseases. A high-quality genome assembly is needed to facilitate its breeding and efficient utilization. To date, two versions of the genome of ‘China Antique’ (CA, a wild lotus germplasm) have been released (CA v1 and CA v2); both were assembled mainly from Illumina short reads and annotated using short-read transcriptome data [2, 3]. Recently, the genome of the cultivar ‘Taikonglian No.3’ (TK) was also assembled [4]. Here, based on 163.29 Gb of PacBio long-read data (N50 = 31.66 kb), the CA genome was re-assembled using FALCON (v1.8.1) (https://github.com/PacificBiosciences/FALCON), yielding a total size of 817.9 Mb and a contig N50 of 44.31 Mb (Fig. 1A). In addition, 90.23 Gb of clean Hi-C data were used to anchor the genome sequence with ALLHiC (https://github.com/tangerzhang/ALLHiC), and 807.37 Mb (98.7%) was anchored onto eight chromosomes (Table S1, see online supplementary material). The whole genome consisted of 70 scaffolds with an N50 of 110.63 Mb (Fig. 1A), of which eight were assembled as chromosomes, while the other 62 scaffolds could not be unambiguously placed on any of the eight chromosomes. Hereafter, this assembly is named CA v3. About 99.29% of the Illumina reads were successfully aligned to CA v3, with a coverage of 99.83%. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that 98.82% of the 1614 expected embryophyte genes could be aligned to the CA v3 genome.
Furthermore, the Long Terminal Repeat (LTR) Assembly Index (LAI) value [5] and the mapping rate of transcriptome data from seven tissues were higher in CA v3 (Fig. 1A; Table S2, see online supplementary material). Compared with the CA v2 and TK genomes, CA v3 had the best continuity and completeness (Fig. 1A).
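
For readers less familiar with the continuity statistics quoted above, the following is a minimal Python sketch of how an N50 value and an anchoring rate are conventionally computed. It is an illustration only and not the pipeline used for CA v3: the function names `n50` and `anchored_percent` and the toy contig lengths are hypothetical, and only the figures 807.37 Mb and 817.9 Mb come from the text.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def anchored_percent(anchored_mb: float, total_mb: float) -> float:
    """Percentage of the assembly placed on chromosomes."""
    return 100.0 * anchored_mb / total_mb


# Toy contig lengths (hypothetical, not lotus data).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))

# Anchoring rate from the sizes reported in the text (Mb).
print(round(anchored_percent(807.37, 817.9), 1))
```

With the reported sizes, the second calculation reproduces the 98.7% anchoring rate given above; the toy N50 is shown only to make the definition concrete.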