January 1, 2013 · Open Access

A Log-Linear Model for Unsupervised Text Normalization

Abstract

We present a unified unsupervised statistical model for text normalization. The relationship between standard and non-standard tokens is characterized by a log-linear model, permitting arbitrary features. The weights of these features are trained in a maximum-likelihood framework, employing a novel sequential Monte Carlo training algorithm to overcome the large label space, which would be impractical for traditional dynamic programming solutions. This model is implemented in a normalization system called UNLOL, which achieves the best known results on two normalization datasets, outperforming more complex systems. We use the output of UNLOL to automatically normalize a large corpus of social media text, revealing a set of coherent orthographic styles that underlie online language variation.
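
The abstract combines two ideas: a log-linear distribution over standard words for each observed token, and a sequential Monte Carlo procedure that avoids enumerating the full label space. The Python sketch below is a minimal illustration under stated assumptions, not UNLOL itself. The features, weights, vocabulary, bigram table, and all function names (`features`, `potential`, `proposal`, `smc_normalize`, `bigram`) are hypothetical, and the target distribution (a bigram language model times the log-linear potential) is an assumption chosen for illustration. It shows generic sequential importance resampling with the log-linear model as the proposal; it does not implement the paper's maximum-likelihood weight training.

```python
import math
import random


# Hypothetical features: the abstract says arbitrary features are permitted,
# but these three are illustrative, not UNLOL's actual feature set.
def features(standard, token):
    return [
        1.0 if standard == token else 0.0,           # identical strings
        1.0 if standard[:1] == token[:1] else 0.0,   # same first letter
        -float(abs(len(standard) - len(token))),     # length mismatch penalty
    ]


def potential(standard, token, weights):
    """Unnormalized log-linear score exp(w . f(s, t))."""
    return math.exp(sum(w * f for w, f in zip(weights, features(standard, token))))


def proposal(token, vocab, weights):
    """Normalize the potentials into q(s | t) and return the normalizer Z(t)."""
    pots = {s: potential(s, token, weights) for s in vocab}
    z = sum(pots.values())
    return {s: p / z for s, p in pots.items()}, z


def smc_normalize(tokens, vocab, weights, bigram, n_particles=200, seed=0):
    """Sequential importance resampling over candidate normalizations.

    Assumed target: prod_i bigram(s_{i-1}, s_i) * potential(s_i, t_i)
    Proposal:       q(s_i | t_i), so each incremental importance weight
                    is bigram(s_{i-1}, s_i) * Z(t_i).
    Returns the most frequent particle path after the last token.
    """
    rng = random.Random(seed)
    particles = [["<s>"] for _ in range(n_particles)]
    for token in tokens:
        q, z = proposal(token, vocab, weights)
        cands, probs = list(q), list(q.values())
        step_weights = []
        for path in particles:
            s = rng.choices(cands, weights=probs)[0]
            # Z(t_i) is shared by every particle at this step, so it cancels
            # during resampling; it is kept so the weight matches the target.
            step_weights.append(bigram(path[-1], s) * z)
            path.append(s)
        # Multinomial resampling; copy paths so duplicates do not share state.
        particles = [list(p) for p in rng.choices(particles, weights=step_weights, k=n_particles)]
    counts = {}
    for path in particles:
        key = tuple(path[1:])  # drop the "<s>" start symbol
        counts[key] = counts.get(key, 0) + 1
    return list(max(counts, key=counts.get))


# Toy usage: hand-set weights and a smoothed bigram table (made-up numbers).
vocab = ["you", "are", "cool", "your"]
BIGRAMS = {("<s>", "you"): 0.6, ("you", "are"): 0.7, ("are", "cool"): 0.5}


def bigram(prev, word):
    return BIGRAMS.get((prev, word), 0.05)


result = smc_normalize(["u", "r", "cool"], vocab, [2.0, 1.0, 0.5], bigram)
print(result)
```

In this sketch each step costs roughly (number of particles) × (vocabulary size), whereas an exact bigram dynamic program over all standard words costs roughly vocabulary size squared per token; that gap is the kind of saving the abstract points to when it calls dynamic programming impractical for a large label space.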

Cite This Study

Yang et al. (2013). A Log-Linear Model for Unsupervised Text Normalization.

synapsesocial.com/papers/6a1f062b9bbe36aec96b6a59
https://doi.org/10.18653/v1/d13-1007

Also Consider

Synapse lists five closely related papers on similar topics. Consider them for comparative context:

1. Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision (2011 · 108 citations)
2. What to do about bad language on the internet (2013 · 336 citations)
3. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs (2011 · 213 citations)
4. A Latent Variable Model for Geographic Lexical Variation (2018 · 607 citations)
5. Monte Carlo Smoothing for Nonlinear Time Series (2004 · 545 citations)