January 1, 2013 · Open Access

Discriminative Improvements to Distributional Sentence Similarity

Abstract

Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
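
The abstract names TF-KLD but does not define it. The sketch below illustrates one plausible reading of a discriminative, KL-divergence-based term weight, and should not be taken as the authors' exact formulation. It assumes that a term's weight is the KL divergence between two Bernoulli distributions: the probability that a term occurring in a sentence pair is shared by both sentences, estimated separately over paraphrase pairs and non-paraphrase pairs. The function names `bernoulli_kl` and `kld_weights`, the `smoothing` parameter, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def bernoulli_kl(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), with p and q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kld_weights(pairs, labels, smoothing=0.05):
    """Assumed KL-based term weights (illustrative, not the paper's code).

    pairs  : list of (tokens_a, tokens_b) sentence pairs
    labels : list of 1 (paraphrase) or 0 (not paraphrase)
    Returns a dict mapping each term to a non-negative weight.
    """
    # counts[label][term] = [times shared by both sentences, times in only one]
    counts = {0: defaultdict(lambda: [0, 0]), 1: defaultdict(lambda: [0, 0])}
    for (a, b), label in zip(pairs, labels):
        set_a, set_b = set(a), set(b)
        for term in set_a | set_b:
            shared = term in set_a and term in set_b
            counts[label][term][0 if shared else 1] += 1

    def shared_prob(label, term):
        shared, unshared = counts[label][term]
        # Additive smoothing keeps the estimate strictly inside (0, 1).
        return (shared + smoothing) / (shared + unshared + 2 * smoothing)

    vocab = set(counts[0]) | set(counts[1])
    return {t: bernoulli_kl(shared_prob(1, t), shared_prob(0, t)) for t in vocab}


# Toy data: "the" is shared in both classes, "cat" only in the paraphrase pair.
toy_pairs = [
    (["the", "cat", "sleeps"], ["the", "cat", "naps"]),
    (["the", "dog", "barks"], ["the", "cat", "barks"]),
]
toy_labels = [1, 0]
toy_weights = kld_weights(toy_pairs, toy_labels)
```

Under this assumption, a term that behaves the same way in both classes (here "the") receives a weight of zero, while a term whose sharing pattern differs between paraphrase and non-paraphrase pairs (here "cat") receives a positive weight. Such weights could then rescale term counts before factorization, in the same role that IDF plays in TF-IDF.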

Authors

Yangfeng Ji

Karlsruhe Institute of Technology

Jacob Eisenstein

Twitter (United States)

Institutions

Georgia Institute of Technology

Active Technologies (Italy)
