API Reference

Core Methods
Simple Expectation-Maximization fitting of mixtures of probability densities.
mixem.em(data, distributions, initial_weights=None, max_iterations=100, tol=1e-15, tol_iters=10, progress_callback=<function simple_progress>)

Fit a mixture of probability distributions using the Expectation-Maximization (EM) algorithm.

Parameters:

- data (numpy.ndarray) – The data to fit the distributions to. Can be array-like or a numpy.ndarray.
- distributions (list of mixem.distribution.Distribution) – The list of distributions to fit to the data.
- initial_weights (numpy.ndarray) – Initial weights for the distributions. Must have the same length as distributions. If None, uniform initial weights are used for all distributions.
- max_iterations (int) – The maximum number of iterations to compute.
- tol (float) – The minimum relative increase in log-likelihood over tol_iters iterations; fitting stops once the increase falls below this threshold.
- tol_iters (int) – The number of iterations to look back when computing the relative change in log-likelihood.
- progress_callback (function or None) – A function called to report progress after every iteration.

Return type: tuple (weights, distributions, log_likelihood)
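The fitting loop behind em can be sketched in plain numpy for a two-component univariate normal mixture. This is an illustrative sketch of the E- and M-steps only, not mixem's actual implementation; the function name and the fixed iteration count are assumptions made for brevity:

```python
import numpy as np

def em_normal_mixture(data, mu, sigma, weights, max_iterations=100):
    """Minimal EM for a univariate normal mixture (illustrative sketch)."""
    for _ in range(max_iterations):
        # E-step: responsibilities gamma[n, k] proportional to
        # weight_k * N(x_n | mu_k, sigma_k)
        dens = np.stack([
            w * np.exp(-0.5 * ((data - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
            for w, m, s in zip(weights, mu, sigma)
        ], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: weighted maximum-likelihood parameter updates
        nk = gamma.sum(axis=0)
        weights = nk / len(data)
        mu = (gamma * data[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    return weights, mu, sigma

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
weights, mu, sigma = em_normal_mixture(
    data, np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5]))
```

With two well-separated modes, the estimated component means converge close to the generating means (-2 and 3). mixem.em follows the same E/M alternation but works against the Distribution interface described below and adds the tol/tol_iters stopping rule.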
Specifying Distributions

class mixem.distribution.Distribution

Base class for a mixEM probability distribution.

To define your own distribution, implement all methods of this class.
estimate_parameters(data, weights)

Estimate the distribution's parameters using weighted maximum-likelihood estimation and update the parameters in-place.

Parameters:

- data (numpy.ndarray) – The data \(x\) to estimate parameters for. A \((N \times D)\) numpy.ndarray where N is the number of examples and D is the dimensionality of the data.
- weights (numpy.ndarray) – The weights \(\gamma\) for the individual data points. An N-element numpy.ndarray where N is the number of examples.

Choose the parameters \(\phi\) that maximize the weighted log-likelihood function:

\[ll_\gamma(x|\phi) = \sum_{n=1}^N \gamma_{n} \log P(x_n|\phi)\]

Generally, this involves differentiating the log-likelihood function with respect to each parameter. You can set the derivative to zero and solve for the parameter to obtain a closed-form maximum-likelihood estimate, or use a numerical optimizer to find the maximum-likelihood parameters.

Once parameter estimates are found, update the attributes in place.
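For a univariate normal, for example, setting the derivatives of \(ll_\gamma\) to zero yields closed-form weighted estimates. A minimal sketch of what such an estimate_parameters step computes (the function name is hypothetical; mixem's own implementation may differ):

```python
import numpy as np

def weighted_normal_mle(data, weights):
    """Closed-form weighted MLE for a univariate normal distribution."""
    total = weights.sum()
    mu = (weights * data).sum() / total                           # weighted mean
    sigma = np.sqrt((weights * (data - mu) ** 2).sum() / total)   # weighted std
    return mu, sigma

data = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([1.0, 1.0, 1.0, 1.0])
mu, sigma = weighted_normal_mle(data, weights)  # mu = 1.5
```

With uniform weights this reduces to the ordinary maximum-likelihood mean and standard deviation; inside EM, the weights are the responsibilities \(\gamma_n\) computed in the E-step.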
log_density(data)

Compute the log-probability density \(\log P(x|\phi)\).

Parameters: data (numpy.ndarray) – The data \(x\) to compute a probability density for. A \((N \times D)\) numpy.ndarray where N is the number of examples and D is the dimensionality of the data.

Returns: The log-probability of observing the data, given the probability distribution's parameters.

Return type: float
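Putting both methods together, a custom distribution can be sketched as a plain class exposing this two-method interface. The class below is a hypothetical illustration for an exponential distribution and is not mixem's built-in ExponentialDistribution:

```python
import numpy as np

class ExponentialSketch:
    """Hypothetical distribution with the log_density / estimate_parameters
    interface described above; for illustration only."""

    def __init__(self, lmbda):
        self.lmbda = lmbda

    def log_density(self, data):
        # log p(x | lambda) = log(lambda) - lambda * x, computed per data point
        return np.log(self.lmbda) - self.lmbda * data

    def estimate_parameters(self, data, weights):
        # Weighted MLE has the closed form lambda = sum(gamma) / sum(gamma * x);
        # update the attribute in place, as the interface requires.
        self.lmbda = weights.sum() / (weights * data).sum()

dist = ExponentialSketch(1.0)
dist.estimate_parameters(np.array([1.0, 3.0]), np.array([1.0, 1.0]))
```

After the update, lmbda equals 2 / 4 = 0.5: the reciprocal of the weighted sample mean, as expected for an exponential distribution.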
class mixem.distribution.ExponentialDistribution(lmbda)

Exponential distribution with parameter (lambda).

class mixem.distribution.GeometricDistribution(p)

Geometric distribution with parameter (p).

class mixem.distribution.NormalDistribution(mu, sigma)

Univariate normal distribution with parameters (mu, sigma).