Accurate crop yield prediction is a cornerstone of food security, agricultural planning, and evidence-based policy design. In this work, we develop a network-enhanced machine learning framework that combines district similarity structures and crop co-occurrence patterns with rich temporal features to forecast yields for multiple crops across India. The empirical analysis relies on 52 years of district-level agricultural data (1966–2017) from 311 districts and focuses on six key crops: rice, wheat, maize, groundnut, cotton, and sugarcane. We construct two complementary network representations: a district similarity network derived from long-term yield trajectories (311 nodes, 2,996 edges, 6.2% density) and a crop co-occurrence network spanning 23 crops (253 edges). From these networks, we compute several centrality indicators and integrate them with temporal covariates, including lagged yields, rolling statistics, volatility measures, and diversification indices. We use a strict time-series cross-validation setup to compare simple baselines (Naive, Rolling Mean) with more advanced models (Ridge Regression, Random Forest, Gradient Boosting), both with and without network-based features. Among all evaluated models, Random Forest achieved the strongest performance for every crop, with R² values above 0.94 (rice: 0.988, wheat: 0.976, maize: 0.971, groundnut: 0.946, cotton: 0.969, sugarcane: 0.986). Statistical tests showed that the advanced models significantly outperformed the baselines for five of the six crops (p < 0.05). However, network features contributed less than 1% to overall feature importance, indicating that temporal patterns are the main drivers of prediction. Together with temporal stability checks and residual diagnostics, this evaluation setup provides a solid foundation for agricultural forecasting and for building practical crop yield prediction and decision-support systems.
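The temporal covariates and network statistics summarized above can be illustrated with a minimal Python sketch. It assumes a single district-crop yield series stored as a plain list ordered by year. It uses the population standard deviation of a trailing window as a stand-in volatility measure and a Gini-Simpson index (1 minus the sum of squared area shares) as a stand-in diversification index; the function names, the window length, and both index choices are hypothetical illustrations, not the study's exact definitions. The density helper uses the standard formula for an undirected simple graph.

```python
from statistics import mean, pstdev


def network_density(n_nodes: int, n_edges: int) -> float:
    """Edges divided by possible edges in an undirected simple graph."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def temporal_features(yields: list[float], t: int, window: int = 3) -> dict[str, float]:
    """Hypothetical features for predicting yields[t] from years strictly before t.

    Only past values are used, so no information from year t leaks in.
    """
    if t < window:
        raise ValueError("need at least `window` past years before t")
    past = yields[t - window:t]
    return {
        "lag1": yields[t - 1],          # previous year's yield
        "rolling_mean": mean(past),     # trailing-window mean
        "rolling_std": pstdev(past),    # assumed volatility proxy
    }


def diversification_index(area_shares: list[float]) -> float:
    """Assumed Gini-Simpson index: 1 - sum of squared (normalized) crop area shares."""
    total = sum(area_shares)
    return 1.0 - sum((s / total) ** 2 for s in area_shares)


if __name__ == "__main__":
    print(round(network_density(311, 2996), 3))   # district similarity network
    print(network_density(23, 253))                # crop co-occurrence network
    print(temporal_features([2.0, 2.2, 2.4, 2.6], t=3))
    print(diversification_index([0.5, 0.3, 0.2]))
```

With the reported counts, the density helper gives about 0.062 for the district network, matching the stated 6.2% density, and exactly 1.0 for the crop network; if the 253 edges are unique undirected pairs, every pair of the 23 crops is therefore connected.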
This study is primarily positioned as a rigorous benchmarking and methodological validation framework rather than a performance breakthrough, providing empirical evidence on the relative value of different feature-engineering strategies and establishing best practices for time-series cross-validation in agricultural machine learning. The finding that static network features provide negligible incremental value beyond temporal covariates is itself a significant contribution, guiding practitioners toward investments in data quality rather than complex network constructions.
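A strict time-series cross-validation scheme of the kind described above can be sketched with an expanding window: each fold trains only on years that precede its test years, so no future information reaches the model. The following Python sketch is an illustration under stated assumptions: the split sizes (`min_train`, `test_size`), the helper names, and the Naive last-value forecast used as the example model are hypothetical choices, and R² is computed with the standard definition 1 - SS_res/SS_tot.

```python
from statistics import mean
from typing import Iterator, Sequence


def expanding_window_splits(
    years: Sequence[int], min_train: int = 10, test_size: int = 1
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_years, test_years) folds; training years always precede test years."""
    ordered = sorted(years)
    for start in range(min_train, len(ordered), test_size):
        yield ordered[:start], ordered[start:start + test_size]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mu) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def naive_forecast(train_values: Sequence[float], horizon: int) -> list[float]:
    """Naive baseline: repeat the last observed training value."""
    return [train_values[-1]] * horizon


if __name__ == "__main__":
    years = list(range(1966, 2018))  # 52 seasons, matching the study period
    folds = list(expanding_window_splits(years, min_train=40, test_size=4))
    for train, test in folds:
        print(f"train {train[0]}-{train[-1]} -> test {test[0]}-{test[-1]}")
```

Scoring a fitted model and `naive_forecast` with `r_squared` on the same folds mirrors the baseline-versus-advanced comparison reported in the abstract; scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window splitter if a library implementation is preferred.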