Key points are not available for this paper at this time.
Maximum entropy language modeling techniques combine different sources of statistical dependence, such as syntactic relationships, topic cohesiveness and collocation frequency, in a unified and effective language model. These techniques however are also computationally very intensive, particularly during model estimation, compared to the more prevalent alternative of interpolating several simple models, each capturing one type of dependency. In this paper we present ways which significantly reduce this complexity by reorganizing the required computations. We show that in case of a model with N-gram constraints, each iteration of the parameter estimation algorithm requires the same amount of computation as estimating a comparable back-off N-gram model. In general, the computational cost of each iteration in model estimation is linear in the number of distinct histories seen in the training corpus, times a model-class dependent factor. The reorganization focuses mainly on reducing this...
Jun et al. (Mon,) studied this question.