March 3, 2026 · Open Access

Penalized weighted generalized estimation equations for high-dimensional longitudinal data with informative cluster size

Key Points

The penalized weighted GEE approach achieves consistent model selection and estimation even when cluster size is informative.
The proposed estimator is asymptotically equivalent to the Oracle estimator, which is the estimator obtained when the true model is known, under the conditions established in the paper.
A new weighted GEE method mitigates the impact of informative cluster size and is extended with a penalty to handle high-dimensional longitudinal data.
Simulations and a real data application show that the penalized weighted GEE outperforms existing alternative methods, supporting its use in more complex settings.
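
To make the informative-cluster-size problem concrete, the sketch below simulates a hypothetical setting in which a cluster-level random effect both raises the outcome and enlarges the cluster, so the outcome is associated with cluster size. It then compares an unweighted pooled mean with the classical inverse-cluster-size (1/n_i) weighted mean, taking as the target the mean outcome of a typical member of a randomly chosen cluster. The data-generating model, the function names, and the 1/n_i weighting are illustrative assumptions; they are not the new weighting scheme proposed in the paper.

```python
import numpy as np


def simulate_informative_clusters(num_clusters=2000, seed=0):
    """Hypothetical data-generating model with informative cluster size."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, 1.0, size=num_clusters)   # cluster-level random effect
    sizes = 1 + rng.poisson(np.exp(1.0 + b))       # larger b -> larger clusters
    cluster_ids = np.repeat(np.arange(num_clusters), sizes)
    # The outcome shares the same random effect, so y is associated with cluster size.
    y = 2.0 + b[cluster_ids] + rng.normal(0.0, 1.0, size=cluster_ids.size)
    return y, cluster_ids, sizes


def pooled_mean(y):
    """Every observation weighted equally, so large clusters dominate."""
    return float(np.mean(y))


def cluster_weighted_mean(y, cluster_ids, sizes):
    """Each observation weighted by 1 / n_i, so every cluster counts once."""
    w = 1.0 / sizes[cluster_ids]
    return float(np.sum(w * y) / np.sum(w))


if __name__ == "__main__":
    y, cluster_ids, sizes = simulate_informative_clusters()
    print("pooled mean:          ", round(pooled_mean(y), 3))
    print("cluster-weighted mean:", round(cluster_weighted_mean(y, cluster_ids, sizes), 3))
```

Under this assumed model the cluster-level target is 2.0. The pooled mean drifts well above it because clusters with large random effects also contribute more observations, while the 1/n_i-weighted mean stays close to 2.0.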

Abstract

High-dimensional longitudinal data have become increasingly prevalent in recent studies, and penalized generalized estimating equations (GEEs) are often used to model such data. However, the desirable properties of the GEE method can break down when the outcome of interest is associated with cluster size, a phenomenon known as informative cluster size. In this article, we address this issue by formulating the effect of informative cluster size, proposing a novel weighted GEE approach to mitigate its impact, and extending it to a penalized version for high-dimensional settings. We show that the penalized weighted GEE approach achieves consistency in both model selection and estimation. Theoretically, we establish that the proposed penalized weighted GEE estimator is asymptotically equivalent to the Oracle estimator, that is, the estimator obtained when the true model is known. This result indicates that the penalized weighted GEE approach retains the desirable properties of the GEE method and is robust to informative cluster size, thereby extending its applicability to more complex situations. Simulations and a real data application further demonstrate that the penalized weighted GEE outperforms existing alternative methods.
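
As a rough illustration of how weighting and penalization can combine, the sketch below treats the simplest special case: identity link, independence working correlation, inverse-cluster-size weights 1/n_i, and an L1 (lasso) penalty fitted by cyclic coordinate descent. Under those assumptions the weighted estimating equation Σ_i (1/n_i) Σ_j x_ij (y_ij − x_ijᵀβ) = 0 is the stationarity condition of a weighted least-squares problem, so adding the penalty gives a weighted lasso. The paper's weights, working correlation, and penalty may differ; all function names and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def penalized_weighted_ls(X, y, cluster_ids, lam, n_iter=500, tol=1e-8):
    """Hypothetical weighted lasso by cyclic coordinate descent.

    Minimizes (1 / (2K)) * sum_i (1/n_i) * sum_j (y_ij - x_ij' beta)^2 + lam * ||beta||_1.
    With lam = 0 its stationarity condition is the identity-link, independence-working-
    correlation weighted estimating equation sum_i (1/n_i) sum_j x_ij (y_ij - x_ij' beta) = 0.
    """
    # Inverse-cluster-size weight attached to every observation.
    _, inverse, counts = np.unique(cluster_ids, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    K = counts.size                      # number of clusters
    p = X.shape[1]
    beta = np.zeros(p)
    resid = y.astype(float).copy()       # residual y - X @ beta, with beta = 0
    col_scale = (w[:, None] * X**2).sum(axis=0) / K
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p):
            old = beta[k]
            # Weighted inner product of x_k with the partial residual that excludes x_k.
            z = np.sum(w * X[:, k] * (resid + X[:, k] * old)) / K
            new = soft_threshold(z, lam) / col_scale[k]
            if new != old:
                resid -= X[:, k] * (new - old)   # keep resid = y - X @ beta
                beta[k] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def simulate_sparse_clusters(num_clusters=400, p=20, seed=1):
    """Hypothetical sparse model with informative cluster size."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=num_clusters)                # cluster random effect
    sizes = 1 + rng.poisson(np.exp(0.5 + 0.5 * b))   # cluster size grows with b
    cluster_ids = np.repeat(np.arange(num_clusters), sizes)
    X = rng.normal(size=(cluster_ids.size, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]                 # only three active covariates
    y = X @ beta_true + b[cluster_ids] + rng.normal(size=cluster_ids.size)
    return X, y, cluster_ids, beta_true


if __name__ == "__main__":
    X, y, cluster_ids, beta_true = simulate_sparse_clusters()
    beta_hat = penalized_weighted_ls(X, y, cluster_ids, lam=0.15)
    print(np.round(beta_hat, 2))
```

Coordinate descent is used here only because each weighted-lasso coordinate update has a closed-form soft-thresholding solution; it is not meant to reproduce the paper's fitting algorithm.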
