We are very proud to introduce jinaai/jina-clip-v1, aka "jina-embeddings-multimodal".
OpenAI CLIP openai/clip-vit-base-patch32 does a good job of aligning the text and image modalities, so users can build cross-modal text-image retrieval or image classification on top of it. However, due to its training data and recipe, it cannot:
1. model longer text inputs (77-token constraint);
2. produce strong text-only representations (the CLIP text tower is weak for text search).
jina-clip-v1 addresses both issues:
1. Stronger cross-modal performance than OpenAI CLIP: 2% and 6% improvements in cross-modal retrieval recall@5.
2. The text tower of JinaCLIP is a strong text encoder, reaching the same performance as jinaai/jina-embeddings-v2-base-en: a 165% improvement in MTEB [BEIR] recall@5.
3. The image tower of JinaCLIP also shows strong performance in image-to-image search (CBIR): a 12% recall improvement on the CIFAR-100 test set.
If you are working on MuRAG (multimodal retrieval-augmented generation), try it out!
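Here is a minimal cross-modal retrieval sketch to get started. It assumes the encode_text/encode_image helpers the model exposes via trust_remote_code and uses placeholder image URLs; check the model card for the exact API.

```python
# Minimal sketch: cross-modal text-image retrieval with jina-clip-v1.
# Assumes the custom encode_text / encode_image helpers loaded via
# trust_remote_code; image URLs below are placeholders for illustration.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

queries = ["a photo of a golden retriever", "a cargo ship at sea"]
image_urls = [
    "https://example.com/dog.jpg",
    "https://example.com/ship.jpg",
]

text_emb = np.asarray(model.encode_text(queries))      # shape: (2, dim)
image_emb = np.asarray(model.encode_image(image_urls))  # shape: (2, dim)

# Cosine similarity: L2-normalize both sides, then take dot products.
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
scores = text_emb @ image_emb.T

# Index of the best-matching image for each text query.
print(scores.argmax(axis=1))
```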
In a vector search setup, we normally combine a fast embedding model with an accurate but slow reranker model.
The newly released @jinaai rerankers are small and almost as accurate as our base reranker. This means that, within a given time budget, they can score more candidate documents from the embedding model and have a better chance of feeding the LLM the correct context for RAG generation.
These models are available on Hugging Face and have been integrated into the latest SentenceTransformers 2.7.0. Check them out!
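To illustrate the two-stage setup, here is a minimal retrieve-then-rerank sketch using the Sentence Transformers integration. The specific reranker checkpoint (jinaai/jina-reranker-v1-turbo-en) and the toy corpus are assumptions for illustration; any of the Jina rerankers should slot in the same way.

```python
# Minimal retrieve-then-rerank sketch with Sentence Transformers >= 2.7.0.
# Stage 1: fast bi-encoder retrieval; Stage 2: slower, more accurate
# cross-encoder rescoring of the shortlisted candidates.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

embedder = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
reranker = CrossEncoder("jinaai/jina-reranker-v1-turbo-en", trust_remote_code=True)

docs = [
    "JinaCLIP aligns text and image embeddings in one space.",
    "Rerankers score query-document pairs with a cross-encoder.",
    "CIFAR-100 is a small image classification benchmark.",
]
query = "How does a reranker improve RAG retrieval?"

# Stage 1: embed the corpus and the query, retrieve top candidates.
doc_emb = embedder.encode(docs, convert_to_tensor=True)
query_emb = embedder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=3)[0]

# Stage 2: rescore the candidates with the reranker and sort by score.
pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)
reranked = sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True)

# The top document is what you would feed the LLM as context.
print(reranked[0][1][1])
```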