What does this research mean for the field?

Logistic regression was the most effective of the predictive models evaluated for assessing the prevalence of domestic violence in India, contradicting the belief that more advanced machine learning algorithms are always superior. Novelty: contradictory. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The study aims to develop a machine learning model to predict domestic violence incidence in India.

March 13, 2026 | Open Access

Predictive Modeling of Domestic Violence in India: A Machine Learning Perspective

Key Points

The study aims to develop a machine learning model to predict domestic violence incidence in India.
The study used nationally representative data from the National Family Health Survey (NFHS-5).
The authors conducted exploratory data analysis and feature selection.
Seven supervised classification algorithms were developed and evaluated.
Key predictors contributing to domestic violence were identified.
Logistic regression was identified as the most effective predictive model.
Factors such as partner's control, alcohol consumption, and women's socio-demographic characteristics were significant predictors.
Findings suggest advanced algorithms do not always outperform traditional models.

Abstract

Domestic violence is a deep-rooted societal issue that affects all aspects of women’s lives. The WHO has reported that nearly 30% of women globally experience some form of violence. Early identification through a predictive model can lead to timely interventions. With this focus, the study aims to build a predictive machine learning model to assess the incidence of domestic violence in India. Using nationally representative data from the latest National Family Health Survey (NFHS-5), the study seeks to identify and understand the patterns and key determinants of domestic violence in India. The statistical methodology involves exploratory data analysis, feature selection, model development and evaluation, and identification of feature importance. Seven supervised classification algorithms are used: Logistic Regression, Support Vector Machine, Artificial Neural Networks, Random Forest, XGBoost, Naive Bayes, and K-Nearest Neighbor. The results show that logistic regression is the most effective predictive model among those evaluated for assessing the prevalence of domestic violence in India. This finding contradicts the misconception that advanced machine learning algorithms consistently outperform logistic regression. The major contributors to the incidence of domestic violence are the partner’s control over the woman; alcohol consumption by the partner; the woman’s characteristics, such as age, education, family history of violence, and number of family members living together; and socio-demographic predictors such as region, caste, and wealth index. We expect that insights from a machine learning perspective will provide more impactful, technology-driven information to enhance interventions and policy initiatives aimed at eliminating domestic violence against women.
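To make the abstract's workflow concrete, the following is a minimal sketch of fitting a logistic regression, checking it on held-out data, and reading feature importance from the coefficients as odds ratios. It is not the authors' code and does not use NFHS-5 data: the two binary predictors, their effect sizes, the helper names (`make_synthetic`, `fit_logistic`, `accuracy`), and the training settings are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic(n, seed=0):
    # Synthetic stand-in data, NOT NFHS-5: two hypothetical binary predictors
    # whose true log-odds effects (1.2 and 0.8) are arbitrary assumptions.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [float(rng.random() < 0.3), float(rng.random() < 0.4)]
        p = sigmoid(-1.5 + 1.2 * x[0] + 0.8 * x[1])
        data.append((x, 1 if rng.random() < p else 0))
    return data


def fit_logistic(data, lr=1.0, epochs=500):
    # Batch gradient descent on the mean log-loss; returns (intercept, weights).
    k = len(data[0][0])
    n = len(data)
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for x, y in data:
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def accuracy(data, b, w):
    # Share of rows where the 0.5-threshold prediction matches the label.
    hits = 0
    for x, y in data:
        predicted = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5
        hits += int(predicted == (y == 1))
    return hits / len(data)


rows = make_synthetic(2000)
train, test = rows[:1500], rows[1500:]          # simple hold-out split
intercept, weights = fit_logistic(train)
odds_ratios = [math.exp(wj) for wj in weights]  # exp(coefficient) = odds ratio
test_accuracy = accuracy(test, intercept, weights)

print("odds ratios:", [round(r, 2) for r in odds_ratios])
print("held-out accuracy:", round(test_accuracy, 3))
```

An odds ratio above 1 indicates that the predictor is associated with higher odds of the outcome, which is one reason logistic regression stays attractive when interpretability matters. A comparison like the study's would repeat the evaluation step for each candidate algorithm on the same held-out data.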

Cite This Study

Verma et al. studied this question.

synapsesocial.com/papers/69b3ace502a1e69014ccf024 https://doi.org/10.1007/s44199-026-00165-y