January 1, 2008

Semi-supervised learning of compact document representations with deep networks

Abstract

Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector of word counts in the document. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this paper, we propose an algorithm to learn text document representations based on semi-supervised autoencoders that are stacked to form a deep network. The model can be trained efficiently on partially labeled corpora, producing very compact representations of documents, while retaining as much class information and joint word statistics as possible. We show that it is advantageous to exploit even a few labeled samples during training.
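
The abstract does not spell out the training objective, so the following is only a minimal sketch of what one semi-supervised autoencoder layer could look like: a reconstruction term computed on every document plus a classification term computed only on the labeled ones. The squared-error reconstruction, the tanh encoder, the weight `alpha`, the `-1` convention for unlabeled documents, and all names (`semi_supervised_loss`, `W_enc`, `W_dec`, `W_cls`) are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def semi_supervised_loss(x, labels, params, alpha=1.0):
    """Reconstruction loss on all documents plus cross-entropy on labeled ones.

    labels uses -1 for unlabeled documents (a convention chosen for this sketch).
    Returns the scalar loss and the compact codes.
    """
    W_enc, W_dec, W_cls = params
    z = np.tanh(x @ W_enc)        # compact code, shape (n_docs, code_dim)
    x_hat = z @ W_dec             # linear reconstruction of the word counts
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))

    labeled = labels >= 0
    if not labeled.any():
        return recon, z           # purely unsupervised when no labels exist

    logits = z[labeled] @ W_cls
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    xent = -np.mean(log_probs[np.arange(labeled.sum()), labels[labeled]])
    return recon + alpha * xent, z

# Toy data: 6 documents over a 20-word vocabulary, only 2 of them labeled.
rng = np.random.default_rng(0)
x = rng.poisson(1.0, size=(6, 20)).astype(float)
labels = np.array([0, 1, -1, -1, -1, -1])
params = (
    rng.normal(0, 0.1, (20, 3)),  # encoder: 20 word counts -> 3-dim code
    rng.normal(0, 0.1, (3, 20)),  # decoder: code -> reconstructed counts
    rng.normal(0, 0.1, (3, 2)),   # classifier: code -> 2 class scores
)
loss, codes = semi_supervised_loss(x, labels, params)
```

Under these assumptions, stacking would mean feeding `codes` as the input `x` of a further layer of the same form. The sketch only evaluates the loss; it does not train the weights.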

Authors

Marc'Aurelio Ranzato

Supélec

Martin Szummer

University of Cambridge

Institutions

New York University

Microsoft Research (United Kingdom)
