We propose a framework for modeling sequence motifs based on the Maximum Entropy principle (MEP). We recommend approximating short sequence motif distributions with the Maximum Entropy Distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between non-adjacent as well as adjacent positions. Many Maximum Entropy models (MEMs) can be specified simply by changing the set of constraints, and are used to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models outperform previous probabilistic models in the discrimination of human 5′ (donor) and 3′ (acceptor) splice sites from decoys. Finally, we suggest mechanistically motivated ways of comparing models.
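To make the idea of an MED under low-order marginal constraints concrete, the following is a minimal sketch, not the authors' implementation. It assumes iterative proportional fitting started from the uniform distribution as the fitting procedure, and it uses a hypothetical toy set of length-3 sequences. The names `empirical_marginal` and `fit_maxent` are illustrative only.

```python
import itertools

ALPHABET = "ACGT"

def empirical_marginal(seqs, positions):
    """Estimate the joint frequency of the letters at the given positions."""
    counts = {}
    for s in seqs:
        key = tuple(s[i] for i in positions)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / len(seqs) for k, v in counts.items()}

def fit_maxent(length, constraints, n_iter=50):
    """Iterative proportional fitting over all sequences of a fixed length.

    constraints maps a tuple of positions to the target marginal for those
    positions. Starting from the uniform distribution, each update rescales
    the current distribution so that one marginal matches its target.
    """
    states = ["".join(t) for t in itertools.product(ALPHABET, repeat=length)]
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(n_iter):
        for positions, target in constraints.items():
            # Marginal of the current distribution on these positions.
            current = {}
            for s, ps in p.items():
                key = tuple(s[i] for i in positions)
                current[key] = current.get(key, 0.0) + ps
            # Rescale every sequence so this marginal matches its target.
            for s in states:
                key = tuple(s[i] for i in positions)
                if current[key] > 0:
                    p[s] *= target.get(key, 0.0) / current[key]
    return p

# Hypothetical toy "signal" sequences (assumption, not data from the paper).
signals = ["ACG", "ACT", "GCG", "ATG"]

# Model 1: single-position constraints only (independent positions).
indep = {(i,): empirical_marginal(signals, (i,)) for i in range(3)}
p_indep = fit_maxent(3, indep)

# Model 2: constraints on adjacent pairs of positions.
pairs = {(0, 1): empirical_marginal(signals, (0, 1)),
         (1, 2): empirical_marginal(signals, (1, 2))}
p_pair = fit_maxent(3, pairs)
```

Changing the `constraints` dictionary, for example by adding a non-adjacent pair such as `(0, 2)`, specifies a different model without changing the fitting code. Under these assumptions, a sequence could be scored as signal versus decoy by the log ratio of its probability under a model fitted to signals and a model fitted to decoys.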
Yeo et al. studied this question.