Abstract Domestic violence is a deep-rooted societal issue that affects all aspects of women’s lives. The WHO has reported that nearly 30% of women globally experience some form of violence. Early identification through a predictive model can enable timely interventions. With this focus, the study builds a predictive machine learning model to assess the incidence of domestic violence in India. Using nationally representative data from the latest National Family Health Survey (NFHS-5), it also seeks to identify and understand the patterns and key determinants of domestic violence in India. The statistical methodology comprises exploratory data analysis, feature selection, model development and evaluation, and analysis of feature importance. Seven supervised classification algorithms are used: Logistic Regression, Support Vector Machine, Artificial Neural Network, Random Forest, XGBoost, Naive Bayes, and K-Nearest Neighbors. The results show that logistic regression is the most effective of these models for assessing the prevalence of domestic violence in India. This finding contradicts the misconception that advanced machine learning algorithms consistently outperform logistic regression. The major contributors to the incidence of domestic violence are the partner’s control over the woman; alcohol consumption by the partner; the woman’s own characteristics, such as her age, education, family history of violence, and the number of family members living together; and socio-demographic predictors such as region, caste, and wealth index. We expect that insights from a machine learning perspective will provide more impactful, technology-driven information to strengthen interventions and policy initiatives aimed at eliminating domestic violence against women.
Verma et al. studied this question.
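To make the abstract's modelling idea concrete, the following is a minimal sketch of a logistic regression classifier fitted to synthetic binary predictors, with its coefficients read as odds ratios. Everything in it is an assumption made for illustration: the data are randomly generated rather than drawn from NFHS-5, the feature names (`partner_control`, `partner_alcohol`, `noise_feature`) and the assumed effect sizes are hypothetical, and the plain gradient-descent fit stands in for whatever software the authors actually used. It does not reproduce the study's seven-model comparison or its results.

```python
import math
import random

# Hypothetical predictor names loosely echoing the abstract; NOT NFHS-5 variables.
FEATURES = ["partner_control", "partner_alcohol", "noise_feature"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic(n, rng):
    """Generate toy binary predictors and outcomes (assumed effects, not survey data)."""
    true_bias, true_w = -1.5, [1.2, 0.8, 0.0]  # third feature has no effect
    X, y = [], []
    for _ in range(n):
        row = [float(rng.random() < 0.4),
               float(rng.random() < 0.3),
               float(rng.random() < 0.5)]
        p = sigmoid(true_bias + sum(w * x for w, x in zip(true_w, row)))
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


def predict_proba(row, w, b):
    """Predicted probability of the positive class for one observation."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict_proba(row, w, b) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


rng = random.Random(0)
X, y = make_synthetic(1300, rng)
X_train, y_train = X[:1000], y[:1000]   # simple hold-out split
X_test, y_test = X[1000:], y[1000:]

weights, bias = fit_logistic(X_train, y_train)

correct = sum((predict_proba(r, weights, bias) >= 0.5) == bool(t)
              for r, t in zip(X_test, y_test))
accuracy = correct / len(y_test)

# exp(coefficient) is the odds ratio associated with a binary predictor.
odds_ratios = {name: math.exp(w) for name, w in zip(FEATURES, weights)}

print(f"held-out accuracy: {accuracy:.3f}")
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

One reason a model of this form can be attractive for the kind of determinant analysis the abstract describes is that each fitted coefficient maps directly to an odds ratio, so the contribution of a predictor can be read off without a separate feature-importance procedure.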