This work presents an interdisciplinary methodological framework for studying language that integrates Natural Language Processing (NLP), Digital Humanities, and social theory. Drawing on philosophical and anthropological perspectives, the proposed approach conceptualizes language not only as data but also as a cultural, cognitive, and social phenomenon embedded in real-world contexts. The methodology balances computational techniques with qualitative interpretation, enabling the analysis of textual corpora while preserving the semantic, contextual, and ethical dimensions of language use. It outlines principles for corpus construction, annotation, model selection, and interpretability, with particular attention to transparency, reproducibility, and the responsible use of language technologies. The contribution also introduces a set of applied use cases illustrating how the framework can be employed across diverse domains, including social discourse analysis, cultural narratives, and interdisciplinary research settings. In alignment with open science practices, the project is designed as a modular and evolving research package comprising methodological documentation, datasets, and software components. The framework aims to support collaborative research among computer science, the humanities, and the social sciences, offering a flexible and extensible foundation for developing NLP applications grounded in interdisciplinary theory and empirical analysis.
Eraña-Diaz et al. (Tue,) studied this question.