10 31 11

Wei Liu

PeterV09

https://vpeterv.github.io

AI & ML interests

Machine Learning, Natural Language Processing

Recent Activity

upvoted a collection 2 days ago

M-STAR

upvoted a collection 2 days ago

SimpleRL

upvoted a collection 2 days ago

SimpleRL-Zoo

View all activity

Organizations

PeterV09's activity

upvoted 3 collections 2 days ago

upvoted a paper 2 days ago

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published 3 days ago • 25

upvoted a paper 20 days ago

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published 21 days ago • 98

upvoted a paper 22 days ago

Language Models can Self-Improve at State-Value Estimation for Better Search

Paper • 2503.02878 • Published 23 days ago • 8

upvoted a paper 23 days ago

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published 25 days ago • 55

upvoted 3 papers about 1 month ago

MoM: Linear Sequence Modeling with Mixture-of-Memories

Paper • 2502.13685 • Published Feb 19 • 33

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid

Paper • 2502.07563 • Published Feb 11 • 24

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 47

upvoted a paper 2 months ago

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 59

upvoted 7 papers 3 months ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 46

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 75

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 98

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

Paper • 2501.03124 • Published Jan 6 • 14

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 87

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 43

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

upvoted 2 papers 6 months ago

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

Paper • 2409.19291 • Published Sep 28, 2024 • 19

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 62