Multilingual Embeddings

May 25, 2024 | seedling, permanent

tags
English, Arabic

Multilingual Embedding #

MTEB leaderboard

How to find the best multilingual embedding models? See the GitHub repo where this evaluation is done.

Top 5 multilingual models:

Open Source embedding models #

evaluation of open source embedding models

e5-mistral-7b-instruct #

Multilingual-E5-large #

BGE-M3 #

OCR of Images #

2024-02-25_12-35-33_screenshot.png #

Retrieval English Leaderboard
Metric: Normalized Discounted Cumulative Gain @ k (ndcg_at_10)
Languages: English

| Rank | Model | Average | ArguAna | ClimateFEVER | CQADupstackRetrieval | DBPedia | FEVER |
|---|---|---|---|---|---|---|---|
| 1 | e5-mistral-7b-instruct | 56.89 | 61.88 | 38.35 | 42.97 | 48.89 | 87.84 |
| 2 | voyage-lite-02-instruct | 56.6 | 70.28 | 31.95 | 46.2 | 39.79 | 91.35 |
| 3 | voyage-lite-01-instruct | 55.58 | 58.73 | 37.47 | 45.11 | 43.42 | 89.71 |
| 4 | text-embedding-3-large | 55.44 | 58.05 | 30.27 | 47.54 | 44.76 | 87.94 |
| 5 | Cohere-embed-english-v3.0 | 55 | 61.52 | 38.43 | 41.53 | 43.36 | 88.97 |
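The leaderboard metric named above, ndcg_at_10, can be sketched in a few lines. This is a minimal illustration assuming the linear-gain form of nDCG (gain equal to the graded relevance, discount of log2(rank + 1)); some implementations use an exponential gain instead, and the function names `dcg_at_k` and `ndcg_at_k` are made up for this note, not taken from any benchmark code. The leaderboard figures appear to be this value scaled to 0-100.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevances.

    relevances[i] is the relevance label of the document returned at
    rank i + 1 (0 means not relevant).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances: labels of the retrieved documents, in ranked order.
    judged_relevances: labels of all judged documents for the query,
        used to build the ideal (best possible) ranking.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal


# Two relevant documents exist; the system returns them at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1], k=10)
print(round(score, 4))
```

A benchmark score would be the mean of this per-query value over all queries in a dataset.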

2024-02-25_12-36-37_screenshot.png #

| Name | Sequence Length | Embedding Dimension |
|---|---|---|
| Cohere/Cohere-embed-multilingual-v3.0 | 512 | 1024 |
| Cohere/Cohere-embed-multilingual-light-v3.0 | 512 | 384 |
| intfloat/multilingual-e5-large | 514 | 1024 |
| text-embedding-3-large | 8191 | 3072 |
| text-embedding-ada-002 | 8191 | 1536 |
| paraphrase-multilingual-MiniLM-L12-v2 | 512 | 384 |

Comparison of Embedding Models Specifications. Image by author.
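The embedding dimension in this comparison directly drives how much space a vector index needs. As a rough sketch, assuming vectors are stored as uncompressed float32 (4 bytes per value) with no index overhead, the footprint is simply count × dimension × 4 bytes. The helper `index_size_gb` and the one-million-document corpus are hypothetical, chosen only to make the arithmetic concrete.

```python
def index_size_gb(num_vectors, dim, bytes_per_value=4):
    """Raw storage for num_vectors embeddings of width dim, in GB (10**9 bytes).

    Assumes dense, uncompressed storage; real vector stores add overhead
    or shrink this with quantization.
    """
    return num_vectors * dim * bytes_per_value / 1e9


# Dimensions taken from the comparison above; corpus size is illustrative.
NUM_DOCS = 1_000_000
for dim in (384, 1024, 1536, 3072):
    print(f"dim={dim:>4}: {index_size_gb(NUM_DOCS, dim):.3f} GB")
```

Under these assumptions a 3072-dimensional model needs eight times the storage of a 384-dimensional one for the same corpus, which is worth weighing against any gain in retrieval quality.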

2024-02-25_14-16-35_screenshot.png #

| Embedding model | Embedding size | Context size | Size (GB) | MTEB Rank (Feb 24) | Release date |
|---|---|---|---|---|---|
| e5-mistral-7b-instruct | 4096 | 32768 | 14 | 4 | 04/01/2024 |
| multilingual-e5-large-instruct | 1024 | 514 | 1.12 | 10 | 08/02/2024 |
| BGE-M3 | 1024 | 8192 | 2.27 | NA | 29/01/2024 |
| nomic-embed-text-v1 | 768 | 8192 | 0.55 | 22 | 10/02/2024 |

Selected open-source embedding models
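One way to use these specifications when picking a model is to filter on hard constraints first (context size, model size) and only then sort by leaderboard rank. The sketch below does that with the four rows above, with model names as read from the screenshot OCR. The `ModelSpec` class, the `shortlist` helper, and the example thresholds are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    embedding_size: int
    context_size: int
    size_gb: float
    mteb_rank: Optional[int]  # None where the table lists "NA"


# Values copied from the table of selected open-source embedding models.
MODELS = [
    ModelSpec("e5-mistral-7b-instruct", 4096, 32768, 14.0, 4),
    ModelSpec("multilingual-e5-large-instruct", 1024, 514, 1.12, 10),
    ModelSpec("BGE-M3", 1024, 8192, 2.27, None),
    ModelSpec("nomic-embed-text-v1", 768, 8192, 0.55, 22),
]


def shortlist(models, min_context=0, max_size_gb=float("inf")):
    """Keep models meeting both constraints; ranked models first, best rank first."""
    picked = [
        m for m in models
        if m.context_size >= min_context and m.size_gb <= max_size_gb
    ]
    return sorted(picked, key=lambda m: (m.mteb_rank is None, m.mteb_rank or 0))


# Example: long documents (8k context) on a machine with room for a 4 GB model.
for m in shortlist(MODELS, min_context=8192, max_size_gb=4):
    print(m.name, m.mteb_rank)
```

Unranked models are placed last rather than dropped, since a missing rank here only means the model had no leaderboard entry at the time.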

