PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated Nov 27, 2024 • 53
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024 • 20
jina-embeddings-v3 Collection Multilingual multi-task general text embedding model • 6 items • Updated Sep 19, 2024 • 19
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published Sep 16, 2024 • 28
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 87
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published May 30, 2024 • 34
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings Paper • 2402.17016 • Published Feb 26, 2024 • 5
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents Paper • 2310.19923 • Published Oct 30, 2023 • 14
Generalist embedding models are better at short-context clinical semantic search than specialized embedding models Paper • 2401.01943 • Published Jan 3, 2024 • 6
jina-embeddings-v2 Collection The V2 family of Jina Embeddings supports encoding large documents with 8k sequence length. • 8 items • Updated Sep 17, 2024 • 15
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models Paper • 2307.11224 • Published Jul 20, 2023 • 6