Embedding
This document introduces the concept of embeddings, gives a simple example of how to train an embedding in TensorFlow, and explains how to view embeddings with the TensorBoard Embedding Projector (live example). The first two parts target newcomers to machine learning or TensorFlow, and the Embedding Projector how-to is for users at all levels.
An alternative tutorial on these concepts is available in the Embeddings section of Machine Learning Crash Course.
An embedding is a mapping from discrete objects, such as words, to vectors of real numbers. For example, a 300-dimensional embedding for English words represents each word as a point in 300-dimensional space.
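For illustration only (these numbers are invented, not learned values), such an embedding can be pictured as a table mapping each word to its vector:

```python
# Hypothetical embeddings, truncated to the first 3 of 300 dimensions
# purely for illustration; real values would be learned from data.
embedding = {
    "apples":  [0.43, -1.29, 0.65],   # ... 297 more dimensions
    "oranges": [0.38, -1.17, 0.71],   # ... 297 more dimensions
    "lemons":  [-0.92, 0.24, 1.05],   # ... 297 more dimensions
}
```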
The individual dimensions in these vectors typically have no inherent meaning. Instead, it's the overall patterns of location and distance between vectors that machine learning takes advantage of.
Embeddings are important for input to machine learning. Classifiers, and neural networks more generally, work on vectors of real numbers. They train best on dense vectors, where all values contribute to define an object. However, many important inputs to machine learning, such as words of text, do not have a natural vector representation. Embedding functions are the standard and effective way to transform such discrete input objects into useful continuous vectors.
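To make this concrete, here is a minimal sketch (assuming TensorFlow 2; the vocabulary size, dimensionality, and word IDs are illustrative) of a trainable embedding matrix and a lookup that maps discrete word IDs to dense vectors:

```python
import tensorflow as tf

vocabulary_size = 10000  # illustrative; the number of distinct words
embedding_dim = 300      # dimensionality of each embedding vector

# Trainable matrix with one row per word ID; training nudges these
# rows so that similar words end up close together.
word_embeddings = tf.Variable(
    tf.random.uniform([vocabulary_size, embedding_dim], -1.0, 1.0))

# Integer IDs for a batch of words (e.g. from a vocabulary table).
word_ids = tf.constant([17, 5, 42])

# Each ID selects the corresponding row of the embedding matrix,
# turning discrete words into dense, continuous vectors.
embedded_word_ids = tf.nn.embedding_lookup(word_embeddings, word_ids)
print(embedded_word_ids.shape)  # (3, 300)
```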
Embeddings are also valuable as outputs of machine learning. Because embeddings map objects to vectors, applications can use similarity in vector space (for instance, Euclidean distance or the angle between vectors) as a robust and flexible measure of object similarity. One common use is to find nearest neighbors: using the same word embeddings as above, for instance, an application can look up the three nearest neighbors of each word together with the corresponding angles.
This would tell an application that apples and oranges are in some way more similar (45.3° apart) than lemons and oranges (48.3° apart).
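The angle between two embedding vectors follows directly from their cosine similarity. The following NumPy helper (the function is a sketch of mine, not part of this document) illustrates the computation:

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two embedding vectors, in degrees.

    Smaller angles mean the vectors point in more similar
    directions, i.e. the objects are more similar.
    """
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against tiny floating-point overshoot.
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# With real embeddings, angle_degrees(apples, oranges) would come out
# smaller than angle_degrees(lemons, oranges), as in the example above.
```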
Concretely, the simplest form of word embedding is constructed in two steps.

First, represent words using one-hot vectors. Suppose the dictionary contains $N$ unique words (vocabulary size $= N$); the one-hot vectors are then $N$-dimensional.

Second, map the one-hot vectors to low-dimensional vectors by

$$v_i = W x_i,$$

where $W \in \mathbb{R}^{d \times N}$ is a parameter matrix which can be learned from training data ($d$ is the embedding dimension, for example 300), and $x_i \in \{0, 1\}^N$ is the one-hot vector of the $i$-th word in the dictionary.
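A small NumPy sketch (toy sizes, chosen only for illustration) makes the mechanics concrete: multiplying $W$ by a one-hot vector selects a single column of $W$, which is exactly the lookup an embedding layer performs. Note that this convention stores embeddings as columns of $W$, whereas the TensorFlow matrix above stores them as rows.

```python
import numpy as np

N, d = 5, 3                  # toy vocabulary size and embedding dimension
W = np.random.randn(d, N)    # parameter matrix; learned from data in practice

i = 2
x_i = np.zeros(N)
x_i[i] = 1.0                 # one-hot vector for the i-th word

v_i = W @ x_i                # the low-dimensional embedding of word i
assert np.allclose(v_i, W[:, i])  # identical to selecting column i of W
```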