January 1, 2008

Learning Mixtures of Product Distributions Using Correlations and Independence.

Abstract

We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D_1, ..., D_T} and mixing weights w_1, ..., w_T such that Σ_i w_i = 1. A sample from a mixture is generated by choosing i with probability w_i and then choosing a sample from distribution D_i. The problem of learning the mixture is that of finding the parameters of the distributions comprising D, given only the ability to sample from the mixture. In this paper, we restrict ourselves to learning mixtures of product distributions. The key to learning the mixtures is to find a few vectors, such that points from different distributions are sharply separated upon projection onto these vectors. Previous techniques use the vectors corresponding to the top few directions of highest variance of the mixture. Unfortunately, these directions may be directions of high noise and not directions along which the distributions are separated. Further, skewed mixing weights amplify the effects of noise, and as a result, previous techniques only work when the separation between the input distributions is large relative to the imbalance in the mixing weights. In this paper, we show an algorithm which successfully learns mixtures of distributions with a separation condition that depends only logarithmically on the skewed mixing weights. In particular, it succeeds for a separation between the centers that is Θ(σ √ T log Λ), where σ is the maximum directional standard deviation of any distribution in the mixture, T is the number of distributions, and Λ is polynomial in T, σ, log n and the imbalance in the mixing
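
The sampling process the abstract describes (choose a component i with probability w_i, then draw from D_i) can be illustrated with a short sketch. This is not the paper's algorithm, only the generative model it learns from. The function name `sample_mixture`, the choice of independent Gaussian coordinates as the product distribution, and all numeric values are assumptions made for illustration.

```python
import random

def sample_mixture(weights, components, rng):
    """Draw one point from a mixture of product distributions.

    weights[i] is the mixing weight of component i (weights sum to 1).
    components[i] is a (means, sigmas) pair; each coordinate is drawn
    independently, which is what makes the component a product distribution.
    Gaussian coordinates are an illustrative assumption.
    """
    # Step 1: choose component i with probability weights[i].
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    means, sigmas = components[i]
    # Step 2: draw every coordinate independently from component i.
    point = [rng.gauss(m, s) for m, s in zip(means, sigmas)]
    return i, point

# Hypothetical mixture: T = 2 components in n = 3 dimensions with
# skewed mixing weights; the centers differ only in the first coordinate.
weights = [0.9, 0.1]
components = [
    ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    ([5.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
]

rng = random.Random(0)
samples = [sample_mixture(weights, components, rng) for _ in range(1000)]
```

A learning algorithm sees only the points, not the component labels returned here; the labels are kept in this sketch so that a recovered clustering could be checked against them.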
