Daily Papers

by AK and the research community

Aug 23

Sapiens: Foundation for Human Vision Models
8 authors, submitted by akhaliq

Controllable Text Generation for Large Language Models: A Survey
11 authors, submitted by UglyToilet

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
39 authors, submitted by akhaliq

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
10 authors, submitted by akhaliq

Hermes 3 Technical Report
3 authors, submitted by akhaliq

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
19 authors, submitted by akhaliq

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
61 authors, submitted by akhaliq

DreamCinema: Cinematic Transfer with Free Camera and 3D Character
6 authors, submitted by Liuff23

Scalable Autoregressive Image Generation with Mamba
7 authors, submitted by akhaliq

The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
5 authors, submitted by IAMJB

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
8 authors, submitted by IAMJB

Real-Time Video Generation with Pyramid Attention Broadcast
4 authors, submitted by akhaliq

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
8 authors, submitted by HenryCai1129

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
4 authors, submitted by topyun

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
9 authors, submitted by IAMJB

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
10 authors, submitted by yyyin

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
8 authors, submitted by YunxinLi

Subsurface Scattering for 3D Gaussian Splatting
5 authors, submitted by akhaliq

Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
4 authors, submitted by akhaliq

Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
5 authors, submitted by amanchadha