A major challenge in supervised sentence compression is making use of rich feature representations because of very scarce parallel data. We address this problem and present a method to automatically build a compression corpus with hundreds of thousands of instances on which deletion-based algorithms can be trained. In our corpus, the syntactic trees of the compressions are subtrees of their uncompressed counterparts, and hence supervised systems which require a structural alignment between the input and output can be successfully trained. We also extend an existing unsupervised compression method with a learning module. The new system uses structured prediction to learn from lexical, syntactic and other features. An evaluation with human raters shows that the presented data harvesting method indeed produces a parallel corpus of high quality. Also, the supervised system trained on this corpus gets high scores both from human raters and in an automatic evaluation setting, significantly outperforming a strong baseline.
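As a rough illustration of what "deletion-based" means here, the following minimal sketch checks whether a compression can be obtained from a source sentence by deleting tokens only, and derives per-token keep/delete labels of the kind a supervised system could be trained on. It works on flat token sequences, which is a simplification: the abstract describes an alignment over syntactic trees, not tokens. The function name `deletion_labels` and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def deletion_labels(source: List[str], compression: List[str]) -> Optional[List[int]]:
    """Return 1 (keep) or 0 (delete) for each source token.

    Returns None when the compression is not a subsequence of the source,
    i.e. it cannot be produced by deletions alone.
    Assumption: token-level matching only; no syntactic trees are used.
    """
    labels = [0] * len(source)
    j = 0  # index of the next compression token still to be matched
    for i, token in enumerate(source):
        if j < len(compression) and token == compression[j]:
            labels[i] = 1
            j += 1
    return labels if j == len(compression) else None


# Made-up example pair, for illustration only.
source = "The senator , who arrived late , voted against the bill".split()
compression = "The senator voted against the bill".split()
labels = deletion_labels(source, compression)
print(list(zip(source, labels)))
```

A pair for which the function returns `None` involves reordering or paraphrase, so it would not fit a purely deletion-based training setup.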
Filippova et al. studied this question.