weishen

fakerbaby

AI & ML interests

NLP, alignment, LLM

Recent Activity

liked a model 20 days ago
Qwen/QwQ-32B-Preview
liked a dataset 20 days ago
HPAI-BSC/Aloe-Beta-Medical-Collection
upvoted a collection 24 days ago
Medical QA Datasets

Organizations

Fudan NLP

fakerbaby's activity

reacted to onekq's post with 👍 3 months ago
Here is my latest study on OpenAI🍓o1🍓.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blog post to explain the findings.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands even one instruction, it can perform far worse than non-reasoning models.