November 23, 2015

CQADupStack

Abstract

This paper presents a benchmark dataset, CQADupStack, for use in community question-answering (cQA) research. It contains threads from twelve StackExchange subforums, annotated with duplicate question information. We provide predefined training and test splits, for both retrieval and classification experiments, to ensure maximum comparability between different studies using the set. The dataset also comes with a script to manipulate the data in various ways. We give an analysis of the data in the set and report benchmark results on a duplicate question retrieval task using well-established retrieval models.
