
Article|14 Sep 2023|OPEN
A telomere-to-telomere reference genome provides genetic insight into the pentacyclic triterpenoid biosynthesis in Chaenomeles speciosa
Shaofang He1,2,†, Duanyang Weng3,†, Yipeng Zhang1,†, Qiusheng Kong4, Keyue Wang1, Naliang Jing1, Fengfeng Li1, Yuebin Ge5, Hui Xiong5, Lei Wu2, De-Yu Xie6, Shengqiu Feng1, Xiaqing Yu7, Xuekui Wang1, Shaohua Shu1, and Zhinan Mei1
1College of Plant Science & Technology, Huazhong Agricultural University, Wuhan 430070, China
2Wuhan Carboncode Biotechnologies Co., Ltd., Wuhan 430070, China
3Sinopharm Zhonglian Pharmaceutical Co., Ltd., Wuhan 430070, China
4College of Horticulture & Forestry, Huazhong Agricultural University, Wuhan 430070, China
5School of Pharmaceutical Science, South-Central Minzu University, Wuhan 430074, China
6Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
7College of Horticulture, Nanjing Agricultural University, Nanjing 210095, China
*Corresponding author.
†Shaofang He, Duanyang Weng, and Yipeng Zhang contributed equally to the study.

Horticulture Research 10,
Article number: uhad183 (2023)

Received: 30 Dec 2022
Accepted: 03 Sep 2023
Published online: 14 Sep 2023


Chaenomeles speciosa (2n = 34), a medicinal and edible plant in the Rosaceae, is commonly used in traditional Chinese medicine. To date, the lack of a genome sequence and of genetic studies has impeded efforts to improve its medicinal value. Herein, we report an integrative approach combining PacBio HiFi (third-generation) sequencing and Hi-C scaffolding to assemble a high-quality telomere-to-telomere genome of C. speciosa. The assembly comprised 650.4 Mb with a contig N50 of 35.5 Mb. Of this, 632.3 Mb was anchored to 17 pseudo-chromosomes, of which 12, 4, and 1 were represented by a single contig, two contigs, and four contigs, respectively. Eleven pseudo-chromosomes had telomere repeats at both ends, and four had telomere repeats at one end only. Repetitive sequences accounted for 49.5% of the genome, and a total of 45 515 protein-coding genes were annotated. The genome size of C. speciosa was similar to that of Malus domestica. Expanded and contracted gene families were identified and examined for their association with plant metabolic pathways and biological processes. In particular, functional annotation characterized gene families associated with the biosynthetic pathway of oleanolic and ursolic acids, two abundant pentacyclic triterpenoids in the fruits of C. speciosa. Taken together, this telomere-to-telomere, chromosome-level genome of C. speciosa not only provides a valuable resource for understanding the biosynthesis of medicinal compounds in its tissues, but also advances understanding of the evolution of the Rosaceae.
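For readers unfamiliar with the assembly statistics quoted above, the following minimal sketch shows how a contig N50 is conventionally computed and what share of the assembly the anchored sequence represents. The function name `contig_n50` and the toy contig lengths are illustrative assumptions and are not taken from the study; only the 650.4 Mb and 632.3 Mb figures come from the abstract.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches at
    least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb (NOT the C. speciosa contigs).
toy_lengths = [40, 30, 20, 10]
toy_n50 = contig_n50(toy_lengths)  # running totals 40, 70; half of 100 is 50

# Figures reported in the abstract, in Mb.
assembly_size_mb = 650.4
anchored_mb = 632.3
anchored_fraction = anchored_mb / assembly_size_mb  # roughly 0.972

print(f"toy N50: {toy_n50} Mb")
print(f"anchored to pseudo-chromosomes: {anchored_fraction:.1%}")
```

Under this definition, a contig N50 of 35.5 Mb means that contigs of at least 35.5 Mb together account for at least half of the 650.4 Mb assembly, and the anchored 632.3 Mb corresponds to about 97.2% of the assembled sequence.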