- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 9
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 33
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections including paper arxiv:2405.20204
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 60
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 22
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 25
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 35
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 77
- Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
  Paper • 2405.06932 • Published • 15
- Gecko: Versatile Text Embeddings Distilled from Large Language Models
  Paper • 2403.20327 • Published • 47
- Multilingual E5 Text Embeddings: A Technical Report
  Paper • 2402.05672 • Published • 16
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 25
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 11
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 44
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 26
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
  Paper • 2310.03502 • Published • 75
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 10
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Paper • 2311.15127 • Published • 9
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 8
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 123
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 47
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 9
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 63
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 4
- BAAI/Bunny-v1_0-4B
  Text Generation • Updated • 103 • 7
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 91
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 27
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 30
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 11
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 13
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 25
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 41
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 11
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 13