January 1, 1998

Clumping properties of content‐bearing words

Abstract

Information retrieval systems identify content-bearing words, and possibly also assign them weights, as part of the process of formulating requests. For optimal retrieval efficiency, this should be done automatically. This paper defines the notion of serial clustering of words in text and explores the value of such clustering as an indicator of whether a word bears content. The approach is flexible in that it is sensitive to context: a term may be assessed as content-bearing within one collection but not within another. Because our approach is numerical, it may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural-text databases in three different languages.

1. Introduction and Background

Automatic information retrieval (IR) has in the past been based on global word counts, the only indicators previously available for assessing the content-bearing strength of words. But the advent of full-text databases has created new possibi...
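The abstract does not state the clumping statistic itself. Below is a minimal sketch that makes a few assumptions. A document is split into consecutive text units, such as paragraphs. "Serial clustering" is read as a term's occurrences landing in adjacent units more often than chance predicts. Under those assumptions, we can count the runs of consecutive units that contain the term and compare the count with its expectation under random placement. Suppose k of the N units contain the term and those k units were placed uniformly at random. Then the expected number of runs is k(N - k + 1)/N, and observing fewer runs than that means the occurrences are clumped. The function names, the naive tokenizer, and the expected-to-observed ratio are all hypothetical illustrations, not the measure defined in the paper.

```python
from __future__ import annotations

from typing import Sequence


def occupancy(units: Sequence[str], term: str) -> list[bool]:
    """Mark each text unit (e.g. a paragraph) that contains the term at least once.

    Assumption: naive lowercase whitespace tokenization, for illustration only.
    """
    term = term.lower()
    return [term in unit.lower().split() for unit in units]


def observed_runs(occupied: Sequence[bool]) -> int:
    """Count maximal runs of consecutive units that contain the term."""
    runs = 0
    previous = False
    for flag in occupied:
        if flag and not previous:  # an occupied unit after a gap starts a new run
            runs += 1
        previous = flag
    return runs


def expected_runs(n_units: int, n_occupied: int) -> float:
    """Expected run count if the k occupied units were placed uniformly at random.

    A unit starts a run if it is occupied and its predecessor is not, giving
    E[runs] = k/N + (N - 1) * (k/N) * (N - k)/(N - 1) = k * (N - k + 1) / N.
    """
    if n_units == 0:
        return 0.0
    return n_occupied * (n_units - n_occupied + 1) / n_units


def clumping_score(units: Sequence[str], term: str) -> float:
    """Hypothetical score: expected runs divided by observed runs.

    Values above 1 suggest serial clustering; returns 0.0 if the term never occurs.
    """
    occ = occupancy(units, term)
    runs = observed_runs(occ)
    if runs == 0:
        return 0.0
    return expected_runs(len(occ), sum(occ)) / runs
```

Usage: take `units = ["the cat sat", "a cat ran", "the cat slept", "the dog barked", "it rained", "the sun set", "the end came", "birds sang"]`. Then `clumping_score(units, "cat")` is 2.25 (one run where 2.25 are expected), while `clumping_score(units, "the")` is about 0.83 (three runs where 2.5 are expected). Only "cat" looks serially clustered. Because the score is numerical, it could in principle inform a term weight, as the abstract suggests. How the paper itself derives weights is not stated here.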
