HammerW (Hammer++++)

upvoted a paper 10 days ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 11 days ago • 120

upvoted a paper 20 days ago

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published 26 days ago • 71

upvoted a paper 26 days ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published 27 days ago • 76

upvoted an article 27 days ago

Article

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

Jun 13

• 41

upvoted a paper about 1 month ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 110

upvoted an article about 1 month ago

Article

Tool Use, Unified

Aug 12

• 53

upvoted a paper 2 months ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 101

upvoted an article 2 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 244

upvoted 2 papers 2 months ago

Course-Correction: Safety Alignment Using Synthetic Preferences

Paper • 2407.16637 • Published Jul 23 • 24

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 67

upvoted a paper 3 months ago

Towards Building Specialized Generalist AI with System 1 and System 2 Fusion

Paper • 2407.08642 • Published Jul 11 • 9

upvoted an article 3 months ago

Article

Fine-tune Llama 3 with ORPO

Apr 22

• 221

upvoted 7 papers 3 months ago

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2 • 21

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Paper • 2406.11912 • Published Jun 16 • 26

WPO: Enhancing RLHF with Weighted Preference Optimization

Paper • 2406.11827 • Published Jun 17 • 14

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published Jun 17 • 36

Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13 • 43

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Paper • 2406.09961 • Published Jun 14 • 54

upvoted 3 papers 4 months ago

Discovering Preference Optimization Algorithms with and for Large Language Models

Paper • 2406.08414 • Published Jun 12 • 12

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 61

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27 • 51

upvoted a paper 5 months ago

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 53

upvoted a paper 6 months ago

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Paper • 2404.05961 • Published Apr 9 • 63

upvoted 4 papers 7 months ago

upvoted 3 papers 8 months ago

Code Representation Learning At Scale

Paper • 2402.01935 • Published Feb 2 • 12

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

More Agents Is All You Need

Paper • 2402.05120 • Published Feb 3 • 51

upvoted a paper over 1 year ago

Full Parameter Fine-tuning for Large Language Models with Limited Resources

Paper • 2306.09782 • Published Jun 16, 2023 • 29

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Generative Representational Instruction Tuning

Speculative Streaming: Fast LLM Inference without Auxiliary Models
