January 1, 2018 | Open Access

Discovering Sociolinguistic Associations with Structured Sparsity

Abstract

We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
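
To make the row-sparsity effect described in the abstract concrete, the following is a minimal sketch of multi-output regression with an ℓ1,∞ penalty (the sum over coefficient rows of each row's largest absolute value). It is not the authors' optimization procedure: it assumes a squared-error loss and plain proximal gradient descent, and the names `project_l1_ball`, `prox_linf`, `l1_inf_norm`, and `fit_l1_inf` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of v onto the set {x : ||x||_1 <= radius}."""
    if radius <= 0:
        return np.zeros_like(v)
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()
    # Sort magnitudes in decreasing order and locate the soft threshold theta.
    u = np.sort(abs_v)[::-1]
    cumsum = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > cumsum - radius)[0][-1]
    theta = (cumsum[rho] - radius) / (rho + 1)
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)


def prox_linf(v, lam):
    """Proximal operator of lam * ||v||_inf, via the Moreau decomposition.

    If ||v||_1 <= lam, the result is exactly zero; this is what removes
    whole rows of the coefficient matrix.
    """
    return v - project_l1_ball(v, lam)


def l1_inf_norm(B):
    """l1,inf norm: sum over rows of the maximum absolute entry."""
    return np.abs(B).max(axis=1).sum()


def fit_l1_inf(X, Y, lam, n_iter=500):
    """Proximal gradient for 0.5 * ||Y - X B||_F^2 + lam * l1_inf_norm(B).

    X: (n_samples, n_features), Y: (n_samples, n_outputs).
    Returns B with shape (n_features, n_outputs); one row per feature.
    """
    B = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        Z = B - step * grad
        # The penalty separates over rows, so the prox is applied row by row.
        B = np.vstack([prox_linf(row, step * lam) for row in Z])
    return B


# Toy illustration (assumed data): three features, two outputs.
# Feature 1 has only a weak relationship with both outputs.
X_demo = np.eye(3)
Y_demo = np.array([[5.0, 5.0], [0.2, -0.3], [4.0, -4.0]])
B_demo = fit_l1_inf(X_demo, Y_demo, lam=1.0)
kept_rows = np.nonzero(np.abs(B_demo).max(axis=1) > 0)[0]  # features that survive
```

In this toy setup, the row for the weakly associated feature is set to exactly zero for every output at once, while the two strongly associated rows are only shrunk. This is the sense in which the regularizer selects a compact set of features (words in the first study, conjoined demographic attributes in the second) rather than individual coefficients.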

Authors

Jacob Eisenstein

Twitter (United States)

Noah A. Smith

University of North Carolina at Chapel Hill

Eric P. Xing

Mohamed bin Zayed University of Artificial Intelligence

Institutions

Carnegie Mellon University
