API Reference

Core Methods

Simple Expectation-Maximization fitting of mixtures of probability densities.

mixem.em(data, distributions, initial_weights=None, max_iterations=100, tol=1e-15, tol_iters=10, progress_callback=<function simple_progress>)[source]

Fit a mixture of probability distributions using the Expectation-Maximization (EM) algorithm.

Parameters:
  • data (numpy.ndarray) – The data to fit the distributions to. Can be an array-like or a numpy.ndarray.
  • distributions (list of mixem.distribution.Distribution) – The list of distributions to fit to the data.
  • initial_weights (numpy.ndarray) – Initial weights for the distributions. Must have the same length as distributions. If None, uniform initial weights are used for all distributions.
  • max_iterations (int) – The maximum number of iterations to compute for.
  • tol (float) – The minimum relative increase in log-likelihood over tol_iters iterations required to continue; below this threshold, the fit is considered converged.
  • tol_iters (int) – The number of iterations to look back when computing the relative change in log-likelihood.
  • progress_callback (function or None) – A function to call to report progress after every iteration.
Return type:

tuple (weights, distributions, log_likelihood)

mixem.probability(data, weights, distributions)[source]

Compute the probability of the data under the mixture density model given by weights and a list of distributions.

Specifying Distributions

class mixem.distribution.Distribution[source]

Base class for a mixEM probability distribution.

To define your own distribution, all methods of this class must be implemented.

estimate_parameters(data, weights)[source]

Estimate the distribution’s parameters using weighted maximum-likelihood estimation and update the parameters in-place.

Parameters:
  • data (numpy.ndarray) – The data \(x\) to estimate parameters for. A \((N \times D)\) numpy.ndarray where N is the number of examples and D is the dimensionality of the data
  • weights – The weights \(\gamma\) for individual data points. An N-element numpy.ndarray where N is the number of examples.

Choose those parameters \(\phi\) that maximize the weighted log-likelihood function:

\[ll_\gamma(x|\phi) = \sum_{n=1}^N \gamma_{n} \log [P(x_n|\phi)]\]

Generally, this will involve differentiating the log-likelihood function with respect to each parameter. You can set the gradient to 0 and solve for the parameters to find a closed-form solution for the maximum-likelihood estimate, or use a numerical optimizer to find the maximum-likelihood parameters.

Once parameter estimates are found, update the attributes in place.
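
For example, for a univariate normal distribution, setting the gradient of the weighted log-likelihood to zero yields the closed-form weighted estimates:

\[\hat{\mu} = \frac{\sum_{n=1}^N \gamma_n x_n}{\sum_{n=1}^N \gamma_n}, \qquad \hat{\sigma}^2 = \frac{\sum_{n=1}^N \gamma_n (x_n - \hat{\mu})^2}{\sum_{n=1}^N \gamma_n}\]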

log_density(data)[source]

Compute the log-probability density \(\log P(x|\phi)\)

Parameters:data (numpy.ndarray) – The data \(x\) to compute a probability density for. A \((N \times D)\) numpy.ndarray where N is the number of examples and D is the dimensionality of the data
Returns:The log-probability for observing the data, given the probability distribution’s parameters
Return type:float
class mixem.distribution.ExponentialDistribution(lmbda)[source]

Exponential distribution with parameter (lambda).

class mixem.distribution.GeometricDistribution(p)[source]

Geometric distribution with parameter (p).

class mixem.distribution.NormalDistribution(mu, sigma)[source]

Univariate normal distribution with parameters (mu, sigma).

class mixem.distribution.MultivariateNormalDistribution(mu, sigma)[source]

Multivariate normal distribution with parameters (mu, Sigma).

class mixem.distribution.LogNormalDistribution(mu, sigma)[source]

Univariate log-normal distribution with parameters (mu, sigma).