Article|01 Aug 2021|OPEN
MFCIS: an automatic leaf-based identification pipeline for plant cultivars using deep learning and persistent homology
Yanping Zhang1, Jing Peng1, Xiaohui Yuan1,2, Lisi Zhang3, Dongzi Zhu3, Po Hong3, Jiawei Wang3, Qingzhong Liu3 & Weizhen Liu1,2
1School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei, China
2Chongqing Research Institute, Wuhan University of Technology, Chongqing, China
3Shandong Key Laboratory of Fruit Biotechnology Breeding, Shandong Institute of Pomology, Taian, Shandong, China

Horticulture Research 8, Article number: 172 (2021)
doi: 10.1038/hortres.2021.172

Received: 12 Oct 2020
Revised: 05 May 2021
Accepted: 21 May 2021
Published online: 01 Aug 2021


Recognizing plant cultivars reliably and efficiently can benefit plant breeders in terms of property rights protection and innovation of germplasm resources. Although leaf image-based methods have been widely adopted in plant species identification, they have seldom been applied to cultivar identification because leaves are highly similar among cultivars. Here, we propose an automatic leaf image-based cultivar identification pipeline called MFCIS (Multi-feature Combined Cultivar Identification System), which combines multiple leaf morphological features collected by persistent homology and a convolutional neural network (CNN). Persistent homology, a multiscale and robust method, was employed to extract the topological signatures of leaf shape, texture, and venation details. A CNN-based algorithm, the Xception network, was fine-tuned to extract high-level leaf image features. For fruit species, we benchmarked the MFCIS pipeline on a sweet cherry (Prunus avium L.) leaf dataset with >5000 leaf images from 88 varieties or unreleased selections and achieved a mean accuracy of 83.52%. For annual crop species, we applied the MFCIS pipeline to a soybean (Glycine max (L.) Merr.) leaf dataset with 5000 leaf images of 100 cultivars or elite breeding lines collected at five growth periods. The identification models for each growth period were trained independently, and their results were combined using a score-level fusion strategy. The classification accuracy after score-level fusion was 91.4%, which is much higher than the accuracy obtained from any single growth period or from a single model trained on images pooled across all growth periods. To facilitate the adoption of the proposed pipelines, we constructed a user-friendly web service, which is freely available at