Embedding

March 13, 2024 | seedling, permanent

What is Word Embedding? #

raw text data -> [embedding model] -> numerical representation (a vector) of the original text. This representation can capture both the meaning of a word and the context in which it appears.

ref: In very simple terms, word embeddings are texts converted into numbers, and there may be different numerical representations of the same text. Before diving into the details, a few key points:

  • Word embedding is a technique used to represent words or phrases as dense vectors in a continuous vector space.
  • Unlike sparse vectorization (e.g. one-hot or bag-of-words), where each dimension corresponds to a specific term, word embeddings capture semantic relationships between words.
  • In a word embedding space, similar words are closer to each other. The vectors are learned based on the context in which words appear in a large corpus of text.
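The "similar words are closer" idea can be made concrete with cosine similarity between vectors. A minimal sketch with made-up 3-dimensional vectors (the numbers are illustrative, not learned):

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two vectors: 1.0 = same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy, hand-picked vectors just to illustrate the geometry
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.8, 0.9, 0.2])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # high: related words
print(cosine_similarity(king, apple))  # low: unrelated words
```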

Word2Vec #

A popular word embedding technique that learns vector representations by predicting the surrounding words of a target word (skip-gram) or predicting the target word from its surrounding context (CBOW). It creates vectors in such a way that words appearing in similar contexts have similar embeddings.
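A toy skip-gram training loop in pure NumPy can make the idea concrete. This sketch uses full softmax over a tiny made-up corpus (real Word2Vec uses negative sampling, subsampling, and far more data; the corpus, dimensions, and learning rate here are arbitrary):

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # target-word embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # context-word embeddings

# (target, context) training pairs within a window of 2
pairs = [(w2i[corpus[i]], w2i[corpus[j]])
         for i in range(len(corpus))
         for j in range(max(0, i - 2), min(len(corpus), i + 3))
         if j != i]

lr = 0.05
for _ in range(100):
    for t, c in pairs:
        scores = W_out @ W_in[t]         # (V,) logits for every context word
        p = np.exp(scores - scores.max())
        p /= p.sum()                     # softmax over the vocabulary
        grad = p.copy()
        grad[c] -= 1.0                   # gradient of cross-entropy wrt scores
        d_in = W_out.T @ grad
        W_out -= lr * np.outer(grad, W_in[t])
        W_in[t] -= lr * d_in

vec = W_in[w2i["cat"]]  # the learned embedding for "cat"
```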

GloVe #

(Global Vectors for Word Representation): Another widely used word embedding technique that combines local context-window methods with global matrix factorization. It leverages word co-occurrence statistics from the entire corpus.
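The co-occurrence statistics GloVe starts from can be sketched as a word-word count matrix (toy corpus and window size are made up; GloVe then fits word vectors and biases so that dot products approximate the log counts):

```python
import numpy as np

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, window = len(vocab), 2

# X[i, j] = how often word j appears within `window` words of word i
X = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            X[w2i[w], w2i[corpus[j]]] += 1

# GloVe's weighted least-squares objective then fits vectors w_i, w~_j and
# biases b_i, b~_j so that  w_i . w~_j + b_i + b~_j ~ log X[i, j]  where X[i, j] > 0.
```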

BERT #

(Bidirectional Encoder Representations from Transformers): A transformer-based model that generates contextualized word embeddings.

  • It takes into account the entire context of a word in a sentence, leading to more sophisticated word representations.
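The contextualization idea can be illustrated with a toy single-head self-attention step in pure NumPy (random made-up embeddings; a sketch of the mechanism, not BERT itself): the vector produced for "bank" depends on which other words share the sentence.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
vocab = {"river": 0, "bank": 1, "money": 2, "deposit": 3}
E = rng.normal(size=(len(vocab), D))  # static (context-free) embeddings

def contextualize(word_ids):
    """One toy self-attention step: mix each word with its sentence context."""
    X = E[word_ids]
    scores = X @ X.T / np.sqrt(D)                       # attention scores
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                   # row-wise softmax
    return w @ X                                        # context-mixed vectors

bank_near_river = contextualize([vocab["river"], vocab["bank"]])[1]
bank_near_money = contextualize([vocab["money"], vocab["bank"]])[1]
# same word "bank", different sentences -> different contextual vectors
```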

Why do we need Word Embeddings? #

As we know, many machine learning algorithms and almost all deep learning architectures are incapable of processing strings or plain text in their raw form. Broadly speaking, they require numbers as inputs to perform any sort of task, such as classification, regression, or clustering. Also, given the huge amount of data available in text form, it is imperative to extract knowledge from it and build useful applications.

sentence-transformers #

LangChain #

