Sparse Vector
- tags
- ML, LLM, Full Text Search
Summary #
A sparse vector is a vector in which only a relatively small number of elements are nonzero, so it can be stored compactly as the indices and values of its nonzero entries.
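As a minimal sketch of that idea, a mostly-zero vector can be stored as just its nonzero positions and values. The helper names `to_sparse` and `to_dense` are hypothetical, and the indices/values layout is an assumption chosen to resemble the encoder output in the BM25 section.

```python
def to_sparse(dense):
    """Keep only the nonzero entries as parallel indices/values lists."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    return {"indices": indices, "values": [dense[i] for i in indices]}


def to_dense(sparse, size):
    """Rebuild the full-length vector, filling missing positions with zeros."""
    dense = [0.0] * size
    for i, v in zip(sparse["indices"], sparse["values"]):
        dense[i] = v
    return dense


dense = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 0.0]
sparse = to_sparse(dense)
print(sparse)  # {'indices': [2, 6], 'values': [1.5, 2.0]}
```

Only two of the eight positions are stored here; the saving grows with the vector length, which for text is typically the vocabulary size.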
BM25 #
# pip install pinecone-text
from pinecone_text.sparse import BM25Encoder

corpus = [
    "The quick brown fox jumps over the lazy dog",
    "The lazy dog is brown",
    "The fox is brown",
]

# Initialize BM25 and fit it on the corpus to learn term statistics.
bm25 = BM25Encoder()
bm25.fit(corpus)

# Encode a document as a sparse vector (token indices and their weights).
doc_sparse_vector = bm25.encode_documents("The brown fox is quick")

# Result (a dict):
# {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}
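To give a feel for what the BM25 weights represent, here is a rough sketch of the textbook Okapi BM25 score in plain Python. It is not the pinecone-text implementation: the `tokenize` and `bm25_scores` helpers, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; real encoders
    # usually also stem tokens and drop stop words).
    return text.lower().split()


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Score every document in `corpus` against `query` with textbook BM25."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


corpus = [
    "The quick brown fox jumps over the lazy dog",
    "The lazy dog is brown",
    "The fox is brown",
]
scores = bm25_scores("quick fox", corpus)
print(scores)
```

With this corpus, the first document matches both query terms, the third matches only "fox", and the second matches neither, so its score is zero.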
OpenSearch #
AWS re:Invent 2023 - Improve your search with vector capabilities in OpenSearch Service


OCR of Images #
2024-05-01_16-33-53_screenshot.png #

Slide: "Sparse vectors for the neural plugin". The example query "Find me a cozy place to sit by the fire" is expanded into weighted vocabulary terms. Related terms get high weights (chair and couch at .930 and .901, ottoman .802), television gets .311, and unrelated terms (iPhone, iPad, gorilla, federalism, republic, orca, porpoise) get weights between .001 and .021.
2024-05-01_16-35-06_screenshot.png #

Slide: "Ranking with sparse vectors"
- Improves BM25 scoring, while preserving text relevance
- Uses less RAM
- Faster than dense vector ranking
- 10.2% to 17.4% improvement over BM25 on the BEIR test
- 14.7% improvement over dense vector ranking
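One way to picture ranking with sparse vectors: if the query and each document are represented as term-to-weight maps, relevance can be scored as a dot product over the terms they share. The sketch below assumes that scoring. The function names, document IDs, and weights are invented for illustration and are not taken from the talk or from the OpenSearch neural plugin.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product over the terms present in both sparse vectors."""
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    scored = [(doc_id, sparse_dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical term weights; a real sparse encoder would produce these.
query_vec = {"chair": 0.9, "couch": 0.8, "fire": 0.5}
doc_vecs = {
    "armchair-listing": {"chair": 0.7, "leather": 0.4},
    "tablet-review": {"tablet": 0.9, "screen": 0.6},
    "sofa-listing": {"couch": 0.9, "chair": 0.3, "fire": 0.2},
}

ranking = rank(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

Only terms shared by the query and a document contribute, so scoring cost depends on the number of nonzero entries rather than on the vocabulary size.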