Key points are not available for this paper at this time.
Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification.The key idea is that similarity in the latent space implies semantic relatedness.We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification.First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF.Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy.Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
Building similarity graph...
Analyzing shared references across papers
Loading...
Yangfeng Ji
Karlsruhe Institute of Technology
Jacob Eisenstein
Twitter (United States)
Georgia Institute of Technology
Active Technologies (Italy)
Building similarity graph...
Analyzing shared references across papers
Loading...
Ji et al. (Tue,) studied this question.
synapsesocial.com/papers/6a10b6fb8102eb4b66ee3b38 — DOI: https://doi.org/10.18653/v1/d13-1090