Computational botany is a fast-growing field in which technologies are increasingly researched and developed for the benefit of experts such as taxonomists, as well as ordinary plant enthusiasts. Automated plant identification from images is at the forefront of this interest and demand. However, current state-of-the-art systems have a significant drawback: they require large datasets to train a reliable model. This work investigates the potential of synthetic, fine-grained leaf images generated with StyleGAN2-ADA as training data for deep learning classification models, with the aim of addressing the data insufficiency issue. The synthetic images were used to train the InceptionV3 and Xception models for plant species classification on real leaf images. The project examined the impact of synthesized image quality, quantity, and diversity, along with GAN training length and class-awareness, on producing reliable classifiers from limited real image data. The results show that synthetic leaf images with fine-grained details and large variations can be used to train classifiers whose results are comparable to those of models trained on limited real data alone. Moreover, an 8% accuracy improvement is recorded for InceptionV3 when synthetic images are used to expand the small dataset. This indicates that synthetic images can not only improve classification models when real data is limited, but also provide a convenient way to generate virtually unlimited amounts of reliable training data from a single image generation model. These findings have important implications for computational botany in overcoming the challenge of data insufficiency in automated systems. The ability to generate realistic and diverse synthetic images also opens the door to applications in automated plant identification, environmental monitoring, and agriculture.
Yap et al. studied this question.
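The idea of expanding a small real dataset with synthetic images can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `expand_with_synthetic`, the `per_class_target` parameter, and the file paths are all hypothetical, and the sketch assumes the synthetic images have already been generated and labeled by class. It only shows how each class's real samples might be topped up with synthetic ones before training, with real images kept intact.

```python
import random
from collections import defaultdict


def expand_with_synthetic(real, synthetic, per_class_target, seed=0):
    """Top up each class with synthetic samples until it has per_class_target items.

    real, synthetic: lists of (path, label) pairs (hypothetical layout).
    Returns a shuffled list of (path, label, source) triples.
    """
    rng = random.Random(seed)

    # Group the synthetic image paths by class label.
    synth_by_label = defaultdict(list)
    for path, label in synthetic:
        synth_by_label[label].append(path)

    # Count how many real images each class already has.
    real_counts = defaultdict(int)
    for _, label in real:
        real_counts[label] += 1

    # Every real image is kept; synthetic images only fill the shortfall.
    train = [(path, label, "real") for path, label in real]
    for label, count in sorted(real_counts.items()):
        needed = max(0, per_class_target - count)
        pool = synth_by_label[label]
        chosen = rng.sample(pool, min(needed, len(pool)))
        train.extend((path, label, "synthetic") for path in chosen)

    rng.shuffle(train)
    return train


# Toy example with made-up paths: 3 real images, 10 synthetic per class.
real = [
    ("real/acer_0.png", "acer"),
    ("real/acer_1.png", "acer"),
    ("real/quercus_0.png", "quercus"),
]
synthetic = [
    (f"synth/{name}_{i}.png", name)
    for name in ("acer", "quercus")
    for i in range(10)
]
train = expand_with_synthetic(real, synthetic, per_class_target=5)
```

In a setup like this, evaluation would still be carried out on held-out real images only, in line with the abstract's description of classification on real leaf images.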