BGE-M3

May 25, 2024 | seedling, permanent

tags
Hugging Face, Chinese Open Source

Embedding #

Hugging Face, ref

In this project, we introduce BGE-M3, which is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

  • Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval.
  • Multi-Linguality: It supports more than 100 working languages, including English and Arabic.
  • Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.
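To make the three retrieval modes concrete, here is a minimal NumPy sketch of how each kind of score is commonly computed. It is not the FlagEmbedding implementation: the function names and the toy vectors and token weights are made up for illustration, and the multi-vector score assumes a ColBERT-style late interaction.

```python
import numpy as np


def dense_score(q_vec, d_vec):
    """Dense retrieval: cosine similarity between one query and one document vector."""
    q = q_vec / np.linalg.norm(q_vec)
    d = d_vec / np.linalg.norm(d_vec)
    return float(q @ d)


def sparse_score(q_weights, d_weights):
    """Sparse retrieval: sum of weight products over tokens present on both sides."""
    shared = q_weights.keys() & d_weights.keys()
    return float(sum(q_weights[t] * d_weights[t] for t in shared))


def multi_vector_score(q_vecs, d_vecs):
    """Multi-vector retrieval (ColBERT-style, assumed): for each query token vector,
    take its best-matching document token vector, then average over query tokens."""
    q = q_vecs / np.linalg.norm(q_vecs, axis=1, keepdims=True)
    d = d_vecs / np.linalg.norm(d_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (query tokens, document tokens)
    return float(sim.max(axis=1).mean())


# Toy inputs, invented for this sketch (real embeddings have many more dimensions).
q_dense = np.array([1.0, 0.0])
d_dense = np.array([1.0, 1.0])
q_sparse = {"bge": 0.5, "m3": 0.25}
d_sparse = {"bge": 0.5, "embedding": 0.75}
q_multi = np.array([[1.0, 0.0], [0.0, 1.0]])
d_multi = np.array([[1.0, 0.0], [1.0, 1.0]])

print("dense:", dense_score(q_dense, d_dense))
print("sparse:", sparse_score(q_sparse, d_sparse))
print("multi-vector:", multi_vector_score(q_multi, d_multi))
```

The dense mode compares one vector per text, the sparse mode only rewards tokens that appear in both query and document, and the multi-vector mode keeps one vector per token and matches them individually.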

BGE-M3 was developed by the Beijing Academy of Artificial Intelligence (BAAI) as its state-of-the-art embedding model for multilingual data. As of 22/02/2024, it had not yet been benchmarked on the MTEB leaderboard.

<2024-02-25 Sun> #

sentence-transformers #

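from sentence_transformers import SentenceTransformer

# Placeholder inputs ("样例数据" means "sample data")
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
# Note: this loads bge-large-zh-v1.5, a Chinese BGE model, not BGE-M3
model = SentenceTransformer('BAAI/bge-large-zh-v1.5')
# With normalized embeddings, the dot product equals cosine similarity
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
# similarity[i][j] compares sentences_1[i] with sentences_2[j]
similarity = embeddings_1 @ embeddings_2.T
print(similarity)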

FlagEmbedding #

examples
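A minimal sketch of requesting all three representations from BGE-M3 through FlagEmbedding. The `BGEM3FlagModel` class, the `return_*` flags, and the output keys follow the usage shown in the FlagEmbedding README as I understand it, so check them against your installed version. The helper name `encode_all_modes` and its `model` parameter are hypothetical, added here so the function can also be tried with a stand-in model.

```python
def encode_all_modes(sentences, model=None):
    """Return (dense vectors, sparse lexical weights, multi-vector outputs).

    `encode_all_modes` is a hypothetical helper for this note, not part of FlagEmbedding.
    """
    if model is None:
        # Imported lazily so the helper can be exercised without the library installed.
        # Loading the model downloads the BAAI/bge-m3 weights on first use.
        from FlagEmbedding import BGEM3FlagModel
        model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

    output = model.encode(
        sentences,
        return_dense=True,         # one vector per sentence
        return_sparse=True,        # per-token lexical weights
        return_colbert_vecs=True,  # one vector per token
    )
    return output["dense_vecs"], output["lexical_weights"], output["colbert_vecs"]
```

Calling `encode_all_modes(["What is BGE-M3?"])` with no `model` argument loads the real model; passing any object with a compatible `encode` method skips the download.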

OCR of Images #

2024-03-24_11-57-09_screenshot.png #

Screenshot of the BAAI/bge-m3 model card on Hugging Face. Page metadata: Sentence Similarity, sentence-transformers, PyTorch, Safetensors, xlm-roberta, feature-extraction, Inference Endpoints; License: mit; downloads last month: 1,638,268. The card points to the GitHub repo https://github.com/FlagOpen/FlagEmbedding for more details and repeats the BGE-M3 introduction (Multi-Functionality, Multi-Linguality, Multi-Granularity) quoted under Embedding.

