Paper - an Aviv-anthonnyolime Collection

Paper

updated about 19 hours ago

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 1 day ago • 17
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 5 days ago • 35
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 3 days ago • 82
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 2 days ago • 11
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

Paper • 2412.12571 • Published 4 days ago • 7
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 8 days ago • 67
Causal Diffusion Transformers for Generative Modeling

Paper • 2412.12095 • Published 4 days ago • 22
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 7 days ago • 126
Large Action Models: From Inception to Implementation

Paper • 2412.10047 • Published 8 days ago • 25
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published 9 days ago • 7
Phi-4 Technical Report

Paper • 2412.08905 • Published 9 days ago • 87
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Paper • 2412.14164 • Published 2 days ago • 1
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 8 days ago • 89
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Paper • 2412.09585 • Published 8 days ago • 10
Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale

Paper • 2412.09548 • Published 8 days ago • 1
Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 9 days ago • 38
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation

Paper • 2412.07147 • Published 11 days ago • 5