June 1, 2008

Interpreting TF-IDF term weights as making relevance decisions

Abstract

A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these “local” relevance decisions as the “document-wide” relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity −log p(r̄ | t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by r̄) of term t. Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
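As a rough illustration of the kind of ranking formula the abstract refers to, the sketch below scores a document by summing term frequency times IDF over the query terms. It is not the authors' model: it assumes raw counts for the term-frequency factor and the common textbook form IDF = log(N / df), and the names `idf`, `tf_idf_score`, and the toy `docs` collection are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook inverse document frequency, log(N / df).

    Assumption: this common form stands in for the IDF the abstract
    mentions; the paper's own estimate may differ.
    """
    n_docs = len(docs)
    df = sum(1 for doc in docs if term in doc)  # documents containing the term
    return math.log(n_docs / df) if df else 0.0


def tf_idf_score(query, doc, docs):
    """Sum of (raw term frequency * IDF) over the query terms."""
    counts = Counter(doc)
    return sum(counts[term] * idf(term, docs) for term in query)


# Toy collection of tokenized documents (illustrative only).
docs = [
    ["probabilistic", "retrieval", "model"],
    ["retrieval", "of", "documents"],
    ["term", "weights", "in", "retrieval"],
]

query = ["model", "retrieval"]
scores = [tf_idf_score(query, doc, docs) for doc in docs]
```

In this toy collection "retrieval" occurs in every document, so its IDF is zero and only "model" contributes to the score of the first document.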

Cite This Study

Wu et al. (2008) studied this question.

synapsesocial.com/papers/6a0eb2a306ecbe833447b588 https://doi.org/10.1145/1361684.1361686
