Los puntos clave no están disponibles para este artículo en este momento.
Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector of word counts in the document. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this paper, we propose an algorithm to learn text document representations based on semi-supervised autoencoders that are stacked to form a deep network. The model can be trained efficiently on partially labeled corpora, producing very compact representations of documents, while retaining as much class information and joint word statistics as possible. We show that it is advantageous to exploit even a few labeled samples during training.
Building similarity graph...
Analyzing shared references across papers
Loading...
Marc ' Aurelio Ranzato
Supélec
Martin Szummer
University of Cambridge
New York University
Microsoft Research (United Kingdom)
Building similarity graph...
Analyzing shared references across papers
Loading...
Ranzato et al. (Tue,) studied this question.
synapsesocial.com/papers/6a0f22dd14089a5783bdbe73 — DOI: https://doi.org/10.1145/1390156.1390256