Sparse Vector

May 18, 2024 | seedling, permanent

tags
ML, LLM, Full Text Search

Summary #

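A sparse vector is a vector in which most elements are zero. Because only the few nonzero elements carry information, it is usually stored as a list of (index, value) pairs rather than as a full array.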
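To make the storage idea concrete, here is a minimal sketch that converts a dense list into parallel index and value lists and back. The helper names `to_sparse` and `to_dense` are made up for this note, and the numbers are arbitrary.

```python
def to_sparse(dense):
    """Keep only the nonzero entries as parallel index/value lists."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return {"indices": indices, "values": values}


def to_dense(sparse, size):
    """Rebuild the full vector; every position not listed is zero."""
    dense = [0.0] * size
    for i, v in zip(sparse["indices"], sparse["values"]):
        dense[i] = v
    return dense


# Hypothetical 8-dimensional vector with two nonzero entries.
dense = [0.0, 0.0, 0.21, 0.0, 0.0, 0.0, 0.11, 0.0]
sparse = to_sparse(dense)
print(sparse)
```

In text search the dimension is typically the vocabulary size, so storing two pairs instead of the full array is where the memory saving comes from.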

BM25 #

A sparse vector can be created with the BM25 algorithm. The example below uses the BM25Encoder from the pinecone-text package:

# pip install pinecone-text
from pinecone_text.sparse import BM25Encoder

corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

# Initialize BM25 and fit the corpus.
bm25 = BM25Encoder()
bm25.fit(corpus)

doc_sparse_vector = bm25.encode_documents("The brown fox is quick")

# result: a dict with parallel "indices" and "values" lists
# {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}
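To show where such weights can come from, here is a rough sketch of the textbook BM25 term weight, computed in plain Python over the same corpus. It is not the pinecone-text implementation: the whitespace tokenizer, the `bm25_weights` helper, and the defaults `k1=1.2` and `b=0.75` are assumptions for illustration, and it keys the result by term instead of by integer index.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bm25_weights(doc, corpus, k1=1.2, b=0.75):
    """Return {term: weight} for the terms in `doc`, using statistics from `corpus`."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many corpus documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    tokens = tokenize(doc)
    weights = {}
    for term, freq in Counter(tokens).items():
        # Rare terms get a high idf, common terms a low one.
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Term frequency saturates with k1 and is normalized by document length.
        tf = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avg_len))
        weights[term] = idf * tf
    return weights


corpus = ["The quick brown fox jumps over the lazy dog",
          "The lazy dog is brown",
          "The fox is brown"]

weights = bm25_weights("The brown fox is quick", corpus)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {w:.3f}")
```

Only the terms that occur in the document get a weight, so the result is sparse relative to the whole vocabulary. "quick" appears in one corpus document and "brown" in all three, so "quick" ends up with the larger weight.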

OpenSearch #

AWS re:Invent 2023 - Improve your search with vector capabilities in OpenSearch Service

OCR of Images #

2024-05-01_16-33-53_screenshot.png #

Slide: Sparse vectors for the neural plugin. Example query: "Find me a cozy place to sit by the fire". The slide lists candidate terms with weights: chair, couch, and ottoman score high (.802 to .930), television scores .311, and unrelated terms (iPhone, iPad, orca, porpoise, gorilla, federalism, republic) score near zero (.001 to .021).

2024-05-01_16-35-06_screenshot.png #

Slide: Ranking with sparse vectors. Claims listed on the slide: improves BM25 scoring while preserving text relevance; uses less RAM; faster than dense vector ranking; 10.2% to 17.4% improvement over BM25 on the BEIR test; 14.7% improvement over dense vector ranking.
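A common way to rank with sparse vectors is to score each document by the dot product of the query vector and the document vector, which only has to visit the nonzero entries they share. The sketch below assumes that scoring rule. The `sparse_dot` and `rank` helpers, the document names, and all weights are hypothetical, and terms are used as keys instead of integer indices to keep it readable.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {key: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[k] for k, w in a.items() if k in b)


def rank(query, docs):
    """Return document ids ordered from highest to lowest score."""
    scores = {doc_id: sparse_dot(query, vec) for doc_id, vec in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expanded query and document vectors (weights are made up).
query = {"chair": 0.9, "couch": 0.93, "ottoman": 0.8, "television": 0.31}
docs = {
    "whale_facts": {"orca": 1.0, "porpoise": 1.0},
    "tv_review": {"television": 1.0},
    "sofa_ad": {"couch": 1.0, "ottoman": 0.5},
}

print(rank(query, docs))
```

Documents that share no weighted term with the query score zero, so they can be skipped entirely when an inverted index is used.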
