Vlad Bogolin

vladbogo

https://vladbogo.com

AI & ML interests

LLMs, Computer Vision

Articles

Organizations

vladbogo's activity

upvoted a paper about 8 hours ago

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published 2 days ago • 24

upvoted a paper 2 days ago

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 10 days ago • 30

upvoted a paper 4 days ago

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Paper • 2409.12941 • Published 9 days ago • 13

upvoted 2 papers 5 days ago

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published 9 days ago • 64

LLMs Will Always Hallucinate, and We Need to Live With This

Paper • 2409.05746 • Published 19 days ago • 2

upvoted a paper 6 days ago

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31 • 22

upvoted a paper 7 days ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 9 days ago • 119

upvoted a paper 9 days ago

Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement

Paper • 2409.11378 • Published 11 days ago • 1

upvoted a paper 10 days ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 11 days ago • 55

upvoted a paper 12 days ago

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published 15 days ago • 30

upvoted a paper 13 days ago

UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity

Paper • 2409.04081 • Published 23 days ago • 3

upvoted a paper 14 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 18 days ago • 53

upvoted a paper 15 days ago

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 23 days ago • 37

upvoted a paper 17 days ago

SongCreator: Lyrics-based Universal Song Generation

Paper • 2409.06029 • Published 19 days ago • 19

upvoted a paper 18 days ago

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published 22 days ago • 20

upvoted a paper 19 days ago

Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?

Paper • 2407.01119 • Published Jul 1 • 1

upvoted 2 papers 20 days ago

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published 24 days ago • 54

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 24 days ago • 85

upvoted a paper 21 days ago

In Defense of RAG in the Era of Long-Context Language Models

Paper • 2409.01666 • Published 26 days ago • 2

upvoted a paper 22 days ago

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 26 days ago • 95

upvoted a paper 23 days ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published 25 days ago • 76

upvoted 2 papers 24 days ago

When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options

Paper • 2409.00113 • Published Aug 27 • 2

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Paper • 2408.17267 • Published 29 days ago • 22

upvoted a paper 27 days ago

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Paper • 2408.02442 • Published Aug 5 • 18

upvoted a paper 28 days ago

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 81

upvoted a paper 29 days ago

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28 • 20

upvoted 10 papers about 1 month ago

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 41

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 120

Text2SQL is Not Enough: Unifying AI and Databases with TAG

Paper • 2408.14717 • Published Aug 27 • 23

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Paper • 2408.11001 • Published Aug 20 • 11

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 40

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Paper • 2408.11475 • Published Aug 21 • 16

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 53

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Paper • 2408.04810 • Published Aug 9 • 22

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Paper • 2408.07931 • Published Aug 15 • 18

upvoted 23 papers about 2 months ago

Thermometer: Towards Universal Calibration for Large Language Models

Paper • 2403.08819 • Published Feb 20 • 1

Efficient Exploration for LLMs

Paper • 2402.00396 • Published Feb 1 • 21

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Paper • 2401.12200 • Published Jan 22 • 1

Provably Robust DPO: Aligning Language Models with Noisy Feedback

Paper • 2403.00409 • Published Mar 1 • 1

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Paper • 2407.00256 • Published Jun 28 • 1

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Paper • 2405.11157 • Published May 18 • 25

Can AI Assistants Know What They Don't Know?

Paper • 2401.13275 • Published Jan 24 • 1

Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

Paper • 2403.05535 • Published Mar 8 • 1

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Paper • 2402.04249 • Published Feb 6 • 3

FairProof : Confidential and Certifiable Fairness for Neural Networks

Paper • 2402.12572 • Published Feb 19 • 1

Prompt Sketching for Large Language Models

Paper • 2311.04954 • Published Nov 8, 2023 • 2

Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16 • 3

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5 • 32

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5 • 25

Apple Intelligence Foundation Language Models

Paper • 2407.21075 • Published Jul 29 • 2

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 104

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25 • 3

Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

Paper • 2407.13833 • Published Jul 18 • 11

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 67

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 101

Reflexion: Language Agents with Verbal Reinforcement Learning

Paper • 2303.11366 • Published Mar 20, 2023 • 4

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Paper • 2305.04091 • Published May 6, 2023 • 2

upvoted a paper 2 months ago

NExT-GPT: Any-to-Any Multimodal LLM

Paper • 2309.05519 • Published Sep 11, 2023 • 78

Articles

Many-shot jailbreaking

Gecko: Versatile Text Embeddings Distilled from Large Language Models

VideoMamba: State Space Model for Efficient Video Understanding

Genie: Generative Interactive Environments

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Reformatted Alignment
