Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2405.07863

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published 12 days ago • 35
Pandora: Towards General World Model with Natural Language Actions and Video States

Paper • 2406.09455 • Published 17 days ago • 12
WPO: Enhancing RLHF with Weighted Preference Optimization

Paper • 2406.11827 • Published 12 days ago • 13
In-Context Editing: Learning Knowledge from Self-Induced Distributions

Paper • 2406.11194 • Published 12 days ago • 14

Papers I want to read

Papers in my to read least

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 62
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 110
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 52
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 77

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 62

about 2 hours ago

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 11
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

Datasets, code, and models for online RLHF (i.e., iterative DPO)

RLHFlow/prompt-collection-v0.1

Viewer • Updated May 8 • 179k • 103 • 5
RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • Updated May 24 • 5.99k • 26
sfairXC/FsfairX-LLaMA3-RM-v0.1

Text Classification • Updated Apr 24 • 77.8k • 31
RLHFlow/SFT-OpenHermes-2.5-Standard

Viewer • Updated Apr 24 • 1M • 385 • 2

Foundation AI Papers (II)

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 44
Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 65
ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 59
KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30 • 101

Fine-tuning LLM

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

Paper • 2403.10704 • Published Mar 15 • 56
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 16
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 105
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 65

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 59
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 32
Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7 • 46
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 24

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 179
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 65
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 58
InternLM2 Technical Report

Paper • 2403.17297 • Published Mar 26 • 26

Interesting things.

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1 • 8
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 573
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26 • 23
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 106

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs