Sparse Vector

May 18, 2024 | seedling, permanent

tags: ML, LLM, Full Text Search

Summary #

A sparse vector is a vector in which only a relatively small number of elements are nonzero, so it can be stored compactly as the indices and values of its nonzero entries.
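As a rough illustration of the definition, the sketch below converts a small dense list into an indices/values pair and back. The helper names `to_sparse` and `to_dense` are made up for this note and do not come from any library.

```python
def to_sparse(dense):
    """Keep only the nonzero entries as parallel index/value lists."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return {"indices": indices, "values": values}


def to_dense(sparse, dim):
    """Rebuild the full vector; every position not listed is zero."""
    dense = [0.0] * dim
    for i, v in zip(sparse["indices"], sparse["values"]):
        dense[i] = v
    return dense


# Hypothetical 8-dimensional vector with two nonzero elements.
dense = [0.0, 0.0, 0.21, 0.0, 0.0, 0.0, 0.11, 0.0]
sparse = to_sparse(dense)
print(sparse)
```

Only two of the eight positions are stored, which is where the memory saving comes from when the dimension is the size of a vocabulary.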

BM25 #

# pip install pinecone-text
from pinecone_text.sparse import BM25Encoder

corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

# Initialize BM25 and fit the corpus.
bm25 = BM25Encoder()
bm25.fit(corpus)

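# Encode a new document as a sparse vector (a dict of two parallel lists).
doc_sparse_vector = bm25.encode_documents("The brown fox is quick")

# Result (illustrative values):
# {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}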
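To make the scoring behind BM25 more concrete, here is a minimal pure-Python sketch of a textbook BM25 formula applied to the same corpus. It is not how `pinecone-text` computes its weights internally: the whitespace tokenizer, the parameters `K1 = 1.2` and `B = 0.75`, and the IDF variant with `+ 1` inside the logarithm are all assumptions chosen for illustration.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # commonly used defaults (assumed, not taken from the note)


def tokenize(text):
    # Naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def bm25_score(query, doc, corpus_tokens):
    """Score one tokenized document against a tokenized query."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        # Document frequency: number of documents containing the term.
        df = sum(1 for d in corpus_tokens if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term frequency, saturated by K1 and normalized by document length.
        norm = tf[term] + K1 * (1 - B + B * len(doc) / avgdl)
        score += idf * tf[term] * (K1 + 1) / norm
    return score


corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]
corpus_tokens = [tokenize(d) for d in corpus]
query = tokenize("quick fox")
scores = [bm25_score(query, d, corpus_tokens) for d in corpus_tokens]
print(scores)
```

The first document matches both query terms and ranks highest, the third matches only "fox", and the second matches neither term and scores zero.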

OpenSearch #

AWS re:Invent 2023 - Improve your search with vector capabilities in OpenSearch Service

OCR of Images #

2024-05-01_16-33-53_screenshot.png #

Slide: Sparse vectors for the neural plugin. Example query: "Find me a cozy place to sit by the fire". The slide shows per-term weights: terms related to the query (chair, couch, ottoman) score high (.901, .930, .802), television scores .311, and unrelated terms (iPhone, iPad, gorilla, orca, porpoise, federalism, republic) score near zero (.001 to .021).

2024-05-01_16-35-06_screenshot.png #

Slide: Ranking with sparse vectors. Improves BM25 scoring while preserving text relevance. Uses less RAM. Faster than dense vector ranking. 10.2% to 17.4% improvement over BM25 on the BEIR test. 14.7% improvement over dense vector ranking.
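One way to picture ranking with sparse vectors is a dot product over the indices that the query and a document share. The slide does not say how scores are computed, so the dot-product scoring here is an assumption, and the document names, indices, and weights are invented for the example.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors in indices/values form."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    # Only indices present in both vectors contribute to the sum.
    return sum(v * b_lookup.get(i, 0.0)
               for i, v in zip(a["indices"], a["values"]))


# Hypothetical query and documents; index numbers stand in for term ids.
query = {"indices": [3, 7], "values": [0.5, 0.5]}
docs = {
    "doc_a": {"indices": [3, 7, 9], "values": [0.9, 0.8, 0.1]},
    "doc_b": {"indices": [1, 7], "values": [0.4, 0.2]},
    "doc_c": {"indices": [2, 5], "values": [0.7, 0.3]},
}

# Rank document names by descending score against the query.
ranked = sorted(docs, key=lambda name: sparse_dot(query, docs[name]),
                reverse=True)
print(ranked)
```

Because only overlapping indices are visited, the cost of scoring depends on the number of nonzero entries rather than on the full vocabulary size.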
