word2vec comes in two flavours:

  • the Continuous Bag-of-Words (CBOW) model: predicts target words (e.g. ‘mat’) from source context words (‘the cat sits on the’); it smooths over much of the distributional information by treating an entire context as one observation, which tends to help on smaller datasets
  • the Skip-gram model: predicts source context words from target words; it treats each context-target pair as a new observation, which tends to do better on larger datasets (a pair-generation sketch for both follows this list)
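To make the contrast concrete, here is a minimal sketch in plain Python (the sentence, window size, and helper name context_window are illustrative assumptions, not part of word2vec itself) of how each flavour slices the same sentence into training examples:

    # Illustrative only: how CBOW and Skip-gram turn one sentence into examples.
    sentence = "the cat sits on the mat".split()
    window = 2  # number of context words taken on each side of the target

    def context_window(tokens, i, window):
        # Context words surrounding position i, excluding the target itself.
        return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

    # CBOW: one example per position -- the whole context predicts the target.
    cbow_examples = [(context_window(sentence, i, window), target)
                     for i, target in enumerate(sentence)]

    # Skip-gram: one example per (target, context word) pair.
    skipgram_examples = [(target, ctx)
                         for i, target in enumerate(sentence)
                         for ctx in context_window(sentence, i, window)]

    print(cbow_examples[-1])       # (['on', 'the'], 'mat')
    print(skipgram_examples[-2:])  # [('mat', 'on'), ('mat', 'the')]

For the same window, CBOW produces one observation per target position, while Skip-gram produces up to 2 × window observations per position, which is exactly the sense in which it treats each context-target pair as a separate observation.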

Neural probabilistic language models

  • use the maximum likelihood principle to maximize the probability of the next target word given the previous words (the history), as sketched below
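
In the usual softmax formulation (a sketch; score(w_t, h) denotes the model's compatibility score for a candidate word w_t with the history h, and V is the vocabulary):

    P(w_t \mid h) = \mathrm{softmax}\bigl(\mathrm{score}(w_t, h)\bigr)
                  = \frac{\exp\{\mathrm{score}(w_t, h)\}}{\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}}

so training maximizes the log-likelihood on the corpus:

    J_{\mathrm{ML}} = \log P(w_t \mid h)
                    = \mathrm{score}(w_t, h) - \log \sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}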