LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated 4 days ago • 121
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 13 days ago • 28
HelpSteer2: Open-source dataset for training top-performing reward models Paper • 2406.08673 • Published 19 days ago • 14
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published 24 days ago • 49
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 25 days ago • 69
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published May 24 • 23
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23 • 11
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15 • 14
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 19 items • Updated 19 days ago • 12
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published May 13 • 15
Granite Code Models Collection A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 20 items • Updated 2 days ago • 145
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 70
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 23 • 9
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
view article Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • 8 days ago • 48
view article Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 75
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 27
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18 • 16
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 29
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 85
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 41
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8 • 18
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8 • 57
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 22
Aurora-M models Collection Aurora-M models (base, biden-harris redteams and instruct) • 5 items • Updated May 6 • 17
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 38
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29 • 14
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 45
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 12
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay Paper • 2402.14194 • Published Feb 22 • 5