

Online variational Bayes for latent Dirichlet allocation

References: Hoffman, Matthew D., Blei, David M. and Bach, Francis R. "Online Learning for Latent Dirichlet Allocation." Paper presented at the meeting of the NIPS, 2010.

Authors: Ilya Yaroshenko
struct LdaHoffman(F) if (isFloatingPoint!F);
Batch variational Bayes for LDA with mini-batches.
this(size_t K, size_t W, size_t D, F alpha, F eta, F tau0, F kappa, F eps = 1e-05, TaskPool tp = taskPool());
size_t K topic count
size_t W dictionary size
size_t D approximate total number of documents in a collection.
F alpha Dirichlet document-topic prior (0.1)
F eta Dirichlet word-topic prior (0.1)
F tau0 tau0 ≧ 0 slows down the early iterations of the algorithm.
F kappa kappa belongs to (0.5, 1] and controls the rate at which old values of lambda are forgotten: lambda = (1 - rho(tau)) lambda + rho(tau) lambda', where rho(tau) = (tau0 + tau)^(-kappa). Use kappa = 0 for batch variational Bayes LDA.
F eps Stop iterations if ||lambda - lambda'||_l1 < s * eps, where s is the number of documents in a batch.
TaskPool tp task pool
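The update rule quoted in the kappa description can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the helper names rho and blend are hypothetical, and lambda is shown as a flat array rather than a K x W slice. The sketch only shows why kappa = 0 reduces to the batch case (rho becomes 1, so the old lambda is fully replaced) and how a larger tau0 shrinks the early steps.

```d
import std.math : isClose, pow;
import std.stdio : writefln;

/// Hypothetical helper: learning rate rho(tau) = (tau0 + tau)^(-kappa).
double rho(double tau0, double tau, double kappa)
{
    return pow(tau0 + tau, -kappa);
}

/// Hypothetical helper: lambda = (1 - r) * lambda + r * lambdaNew, in place.
void blend(double[] lambda, const(double)[] lambdaNew, double r)
{
    assert(lambda.length == lambdaNew.length);
    foreach (i, ref x; lambda)
        x = (1 - r) * x + r * lambdaNew[i];
}

void main()
{
    // kappa = 0 gives rho = 1: the new estimate replaces the old one (batch case).
    auto lambda = [1.0, 2.0];
    blend(lambda, [3.0, 4.0], rho(1.0, 0.0, 0.0));
    assert(lambda == [3.0, 4.0]);

    // With tau0 = 64 and kappa = 0.5 the first step has weight 64^(-0.5) = 0.125.
    assert(isClose(rho(64.0, 0.0, 0.5), 0.125));

    // The rate decays as more documents are seen (tau grows).
    foreach (tau; [0.0, 100.0, 10_000.0])
        writefln("tau = %s, rho = %s", tau, rho(64.0, tau, 0.7));
}
```

With kappa in (0.5, 1], rho decreases as tau grows, so later mini-batches move lambda less than earlier ones.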
void updateBeta();
@property Slice!(F*, 2) beta();
Posterior over the topics.
@property Slice!(F*, 2) lambda();
Parameterized posterior over the topics.
const @property F tau();

@property void tau(F v);
Count of already seen documents. Slows down the iterations of the algorithm.
size_t putBatch(SliceKind kind, C, I, J)(Slice!(ChopIterator!(J*, Series!(I*, C*)), 1, kind) n, size_t maxIterations);
Accepts a mini-batch and performs multiple E-step iterations for each document, followed by a single M-step.
This implementation is optimized for sparse documents, which contain far fewer unique words than the dictionary.
Slice!(ChopIterator!(J*, Series!(I*, C*)), 1, kind) n mini-batch, a collection of compressed documents.
size_t maxIterations maximal number of iterations for a single document in a batch for the E-step.