Article|11 Feb 2022|OPEN
Deep-learning-based in-field citrus fruit detection and tracking
Wenli Zhang1, Jiaqi Wang1, Yuxin Liu1, Kaizhen Chen1, Huibin Li2, Yulin Duan2, Wenbin Wu2, Yun Shi2 and Wei Guo3
1Information Department, Beijing University of Technology, Beijing, 100022, China
2Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
3International Field Phenomics Research Laboratory, Institute for Sustainable Agro-ecosystem Services, The University of Tokyo, Tokyo 188-0002, Japan
*Corresponding author. E-mail:,

Horticulture Research 9,
Article number: uhac003 (2022)

Received: 20 Mar 2021
Accepted: 12 Dec 2021
Published online: 11 Feb 2022


Fruit yield estimation is crucial for establishing fruit harvest and marketing strategies. Recently, computer vision and deep learning techniques have been used to estimate citrus fruit yield and have exhibited notable fruit detection ability. However, computer-vision-based citrus fruit counting has two key limitations: inconsistent fruit detection accuracy and double-counting of the same fruit. Using oranges as the experimental material, this paper proposes a deep-learning-based orange counting algorithm that operates on video sequences to help overcome these problems. The algorithm consists of two sub-algorithms: OrangeYolo for fruit detection and OrangeSort for fruit tracking. The OrangeYolo backbone network is partially based on the YOLOv3 algorithm, with the network structure adjusted to detect small-scale targets (fruits) while still enabling multiscale target detection. A channel attention and spatial attention multiscale fusion module was introduced to fuse the semantic features of the deep network with the shallow textural detail features. OrangeYolo achieved a mean Average Precision (mAP) of 0.957 on the citrus dataset, higher than the 0.905, 0.911, and 0.917 achieved with the YOLOv3, YOLOv4, and YOLOv5 algorithms, respectively. OrangeSort was designed to alleviate the double-counting problem associated with occluded fruits. A specific tracking region counting strategy and a tracking algorithm based on motion displacement estimation were established. Six video sequences taken from two fields containing 22 trees were used as the validation dataset. The proposed method showed better performance (Mean Absolute Error (MAE) = 0.081, Standard Deviation (SD) = 0.08) than video-based manual counting and produced more accurate results than the existing standards Sort and DeepSort (MAE = 0.45 and 1.212, respectively; SD = 0.4741 and 1.3975, respectively).
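
The idea behind counting with tracked identities inside a region can be illustrated with a minimal sketch. This is not the OrangeSort implementation: the abstract does not specify the shape of the tracking region or the counting rule, so the rectangular region, the `count_unique_fruits` function, and the `(track_id, x_center, y_center)` input format are all assumptions made for illustration. The sketch only shows why a per-track count avoids the double-counting that a per-frame count would produce.

```python
from typing import Iterable, List, Set, Tuple

# One tracked detection: (track_id, x_center, y_center) in pixels.
# This format is hypothetical; a real tracker would also output box sizes.
TrackedBox = Tuple[int, float, float]


def count_unique_fruits(
    frames: Iterable[List[TrackedBox]],
    region: Tuple[float, float, float, float],
) -> int:
    """Count each track ID once, when its centre falls inside the region.

    `region` is an assumed rectangular counting area (x_min, y_min, x_max, y_max).
    """
    x_min, y_min, x_max, y_max = region
    counted: Set[int] = set()
    for boxes in frames:
        for track_id, cx, cy in boxes:
            inside = x_min <= cx <= x_max and y_min <= cy <= y_max
            if inside:
                # A set ignores repeats, so a fruit seen in many frames
                # contributes to the total only once.
                counted.add(track_id)
    return len(counted)


# Toy video of three frames with made-up coordinates.
frames = [
    [(1, 50.0, 40.0), (2, 300.0, 60.0)],  # fruit 2 is outside the region
    [(1, 60.0, 40.0), (2, 180.0, 60.0)],  # fruit 1 seen again, fruit 2 enters
    [(3, 120.0, 90.0)],                   # a new fruit appears
]
total = count_unique_fruits(frames, region=(0.0, 0.0, 200.0, 200.0))
print(total)
```

In this toy input, summing the in-region detections frame by frame would give four, whereas counting distinct track IDs gives three. The count is only as good as the tracker: if an occluded fruit receives a new ID when it reappears, it is counted twice, which is the failure mode the abstract says OrangeSort was designed to alleviate.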