High-Quality Construction of Agricultural Pest and Disease Image Datasets: A Review
Journal Title: Smart Agriculture - Year 2023, Vol 5, Issue 3
Abstract
[Significance] The scientific dataset of agricultural pests and diseases is the foundation for monitoring and warning of agricultural pests and diseases. It is of great significance for the development of agricultural pest control, and is an important component of developing smart agriculture. The quality of the dataset affecting the effectiveness of image recognition algorithms, with the discovery of the importance of deep learning technology in intelligent monitoring of agricultural pests and diseases. The construction of high-quality agricultural pest and disease datasets is gradually attracting attention from scholars in this field. In the task of image recognition, on one hand, the recognition effect depends on the improvement strategy of the algorithm, and on the other hand, it depends on the quality of the dataset. The same recognition algorithm learns different features in different quality datasets, so its recognition performance also varies. In order to propose a dataset evaluation index to measure the quality of agricultural pest and disease datasets, this article analyzes the existing datasets and takes the challenges faced in constructing agricultural pest and disease image datasets as the starting point to review the construction of agricultural pest and disease datasets. [Progress] Firstly, disease and pest datasets are divided into two categories: private datasets and public datasets. Private datasets have the characteristics of high annotation quality, high image quality, and a large number of inter class samples that are not publicly available. Public datasets have the characteristics of multiple types, low image quality, and poor annotation quality. Secondly, the problems faced in the construction process of datasets are summarized, including imbalanced categories at the dataset level, difficulty in feature extraction at the dataset sample level, and difficulty in measuring the dataset size at the usage level. These include imbalanced inter class and intra class samples, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and dataset availability. The main reasons for the problem are analyzed by two key aspects of image acquisition and annotation methods in dataset construction, and the improvement strategies and suggestions for the algorithm to address the above issues are summarized. The collection devices of the dataset can be divided into handheld devices, drone platforms, and fixed collection devices. The collection method of handheld devices is flexible and convenient, but it is inefficient and requires high photography skills. The drone platform acquisition method is suitable for data collection in contiguous areas, but the detailed features captured are not clear enough. The fixed device acquisition method has higher efficiency, but the shooting scene is often relatively fixed. The annotation of image data is divided into rectangular annotation and polygonal annotation. In image recognition and detection, rectangular annotation is generally used more frequently. It is difficult to label images that are difficult to separate the target and background. Improper annotation can lead to the introduction of more noise or incomplete algorithm feature extraction. In response to the problems in the above three aspects, the evaluation methods are summarized for data distribution consistency, dataset size, and image annotation quality at the end of the article. [Conclusions and Prospects] The future research and development suggestions for constructing high-quality agricultural pest and disease image datasets based are proposed on the actual needs of agricultural pest and disease image recognition:(1) Construct agricultural pest and disease datasets combined with practical usage scenarios. In order to enable the algorithm to extract richer target features, image data can be collected from multiple perspectives and environments to construct a dataset. According to actual needs, data categories can be scientifically and reasonably divided from the perspective of algorithm feature extraction, avoiding unreasonable inter class and intra class distances, and thus constructing a dataset that meets task requirements for classification and balanced feature distribution. (2) Balancing the relationship between datasets and algorithms. When improving algorithms, consider the more sufficient distribution of categories and features in the dataset, as well as the size of the dataset that matches the model, to improve algorithm accuracy, robustness, and practicality. It ensures that comparative experiments are conducted on algorithm improvement under the same evaluation standard dataset, and improved the pest and disease image recognition algorithm. Research the correlation between the scale of agricultural pest and disease image data and algorithm performance, study the relationship between data annotation methods and algorithms that are difficult to annotate pest and disease images, integrate recognition algorithms for fuzzy, dense, occluded targets, and propose evaluation indicators for agricultural pest and disease datasets. (3) Enhancing the use value of datasets. Datasets can not only be used for research on image recognition, but also for research on other business needs. The identification, collection, and annotation of target images is a challenging task in the construction process of pest and disease datasets. In the process of collecting image data, in addition to collecting images, attention can be paid to the collection of surrounding environmental information and host information. This method is used to construct a multimodal agricultural pest and disease dataset, fully leveraging the value of the dataset. In order to focus researchers on business innovation research, it is necessary to innovate the organizational form of data collection, develop a big data platform for agricultural diseases and pests, explore the correlation between multimodal data, improve the accessibility and convenience of data, and provide efficient services for application implementation and business innovation.
Authors and Affiliations
GUAN Bolun,ZHANG Liping,ZHU Jingbo,LI Runmei,KONG Juanjuan,WANG Yan,DONG Wei
Rice Disease and Pest Recognition Method Integrating ECA Mechanism and DenseNet201
ObjectiveTo address the problems of low efficiency and high cost of traditional manual identification of pests and diseases, improve the automatic recognition of pests and diseases by introducing advanced technical means...
Identification Method of Wheat Field Lodging Area Based on Deep Learning Semantic Segmentation and Transfer Learning
[Objective] Lodging constitutes a severe crop-related catastrophe, resulting in a reduction in photosynthesis intensity, diminished nutrient absorption efficiency, diminished crop yield, and compromised crop quality. The...
Traceability Model of Plantation Agricultural Products Based on Blockchain and InterPlanetary File System
[Objective]The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system, aiming to establish a global, open, and decentralized network for storage and sharing. Combining the IPFS and blockchain technol...
Visible/NIR Spectral Inversion of Malondialdehyde Content in JUNCAO Based on Deep Convolutional Gengrative Adversarial Network
[Objective] JUNCAO, a perennial herbaceous plant that can be used as medium for cultivating edible and medicinal fungi. It has important value for promotion, but the problem of overwintering needs to be overcome when pla...
An Rapeseed Unmanned Seeding System Based on Cloud-Terminal High Precision Maps
[Objective]Unmanned seeding of rapeseed is an important link to construct unmanned rapeseed farm. Aiming at solving the problems of cumbersome manual collection of small and medium-sized field boundary information in the...