There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Röscheisen (to appear), Simard et al. (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), these methods tend to break down, because the noise can make it difficult to find paragraph boundaries, let alone sentence boundaries. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.
Kenneth Church studied this question.
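To make the idea of character-level cognate evidence concrete, the following is a minimal sketch and not the char_align implementation itself. It assumes, purely for illustration, that cognate evidence is gathered as character n-grams shared by the two texts. The n-gram length of 4, the frequency cutoff, and the names `ngram_positions` and `cognate_points` are hypothetical choices that do not come from the text above.

```python
from collections import defaultdict


def ngram_positions(text, n=4):
    """Map each character n-gram to the list of offsets where it starts."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return positions


def cognate_points(source, target, n=4, max_freq=5):
    """Return sorted (source_offset, target_offset) pairs for shared n-grams.

    Assumption for this sketch: n-grams that occur more than max_freq times
    in either text are skipped, since very frequent n-grams say little about
    where the two texts correspond.
    """
    src = ngram_positions(source, n)
    tgt = ngram_positions(target, n)
    points = []
    for gram, src_offsets in src.items():
        tgt_offsets = tgt.get(gram)
        if not tgt_offsets:
            continue  # n-gram does not occur in the target text
        if len(src_offsets) > max_freq or len(tgt_offsets) > max_freq:
            continue  # too frequent to be informative
        points.extend((i, j) for i in src_offsets for j in tgt_offsets)
    return sorted(points)
```

For example, `cognate_points("The government of Canada", "Le gouvernement du Canada")` pairs the offsets of shared strings such as "vern", "ment" and "Cana". Because the points are character offsets, nothing in this step depends on finding sentence or paragraph boundaries, which is the property that matters for noisy input. Turning such points into an actual alignment path is a separate step that this sketch does not attempt.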