Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Paper • 2402.07440 • Published Feb 12, 2024
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection Paper • 2406.00856 • Published Jun 2, 2024
NeMo Curator - Classifier Models Collection Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 9 items • Updated 19 days ago
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published May 30, 2024
MMTEB Collection Our contribution to the Massive Multilingual Text Embedding Benchmark (MMTEB). Retrieval and reranking benchmarks in 16 languages. • 4 items • Updated Jun 6, 2024
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 8 items • Updated 27 days ago
ColPali Models Collection Pre-trained checkpoints for the ColPali model. • 8 items • Updated 24 days ago
Small LMs Text Embedding Collection Contrastively fine-tuned versions of language models with up to 2B parameters, trained using LoRA • 3 items • Updated May 8, 2024
Matryoshka Embedding Models Collection https://huggingface.co/blog/matryoshka • 14 items • Updated Jun 4, 2024
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 12 days ago
Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth Article • By mlabonne • Published Jul 29, 2024
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12, 2024
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21, 2024