-
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
Paper • 2411.06558 • Published • 34 -
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper • 2411.09944 • Published • 12 -
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
Paper • 2411.19460 • Published • 10 -
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Paper • 2412.05237 • Published • 41
Collections
Discover the best community collections!
Collections including paper arxiv:2412.04445
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 57 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 51 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 41 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 52
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 39 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 116 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 47 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
-
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Paper • 2406.02523 • Published • 10 -
UniT: Unified Tactile Representation for Robot Learning
Paper • 2408.06481 • Published • 9 -
Latent Action Pretraining from Videos
Paper • 2410.11758 • Published • 2 -
Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 4
-
Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 9 -
Policy Improvement using Language Feedback Models
Paper • 2402.07876 • Published • 5 -
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 29 -
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64
-
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 23 -
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 9 -
Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 5 -
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4