ONEKQ AI

company

https://onekq.ai

onekq_ai

onekq

Activity Feed

AI & ML interests

Benchmark, Code Generation, LLM

Recent Activity

onekq updated a Space about 1 month ago

onekq-ai/WebApp1K-models-leaderboard

onekq updated a model 2 months ago

onekq-ai/starcoder2-3b-instruct-v0.1

onekq updated a model 2 months ago

onekq-ai/DeepSeek-Coder-V2-Lite-Base-bnb-4bit

onekq-ai's activity

onekq

updated a Space about 1 month ago

Running

🥇

WebApp1K Models Leaderboard

onekq

posted an update about 2 months ago

Post

565

The October version of Claude 3.5 lifts SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed-source models are widening the gap again.

Note: Our frontier leaderboard now uses dual-scenario tests because the single-scenario test suite has been saturated.

onekq

updated 2 models 2 months ago

onekq-ai/starcoder2-3b-instruct-v0.1

Text Generation • Updated Oct 19 • 22

onekq-ai/DeepSeek-Coder-V2-Lite-Base-bnb-4bit

Text Generation • Updated Oct 19 • 15

onekq

posted an update 2 months ago

Post

1846

I'm now working on finetuning coding models. If you are GPU-hungry like me, you will find quantized models very helpful. But quantization for finetuning and quantization for inference are different and incompatible, so I made two collections here.

Inference (GGUF, via Ollama; a CPU is enough)
onekq-ai/ollama-ready-coding-models-67118c3cfa1af2cf04a926d6

Finetuning (bitsandbytes, QLoRA; a GPU is needed)
onekq-ai/qlora-ready-coding-models-67118771ce001b8f4cf946b2

Quantized inference models are far more popular on HF than quantized finetuning models. I use https://huggingface.co/QuantFactory to generate inference models (GGUF), and there are a few other choices.

But there hasn't been such a service for finetuning models. DIY isn't too hard though. I made a few myself and you can find the script in the model cards. If the original model is small enough, you can even do it on a free T4 (available via Google Colab).
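For readers who want a rough idea of what such a DIY script can look like, here is a minimal sketch. It is not the script from the model cards. It assumes recent versions of `transformers`, `bitsandbytes`, and `torch`, plus a CUDA GPU. The function name `quantize_for_qlora`, the `NF4_SETTINGS` values, and the choice of float16 as compute dtype are illustrative assumptions.

```python
# 4-bit NF4 settings. The keys are real BitsAndBytesConfig arguments;
# the chosen values are an assumption, not the author's configuration.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def quantize_for_qlora(source_id: str, out_dir: str) -> None:
    """Load a model in 4-bit via bitsandbytes and save the quantized weights.

    source_id and out_dir are hypothetical parameter names for this sketch.
    """
    # Heavy imports stay inside the function so this file can be read
    # and imported on a machine without the GPU stack installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # float16 compute dtype is an assumption for older GPUs such as the T4.
    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16, **NF4_SETTINGS
    )
    # Weights are quantized while being loaded onto the GPU.
    model = AutoModelForCausalLM.from_pretrained(
        source_id, quantization_config=config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(source_id)

    # Saving 4-bit weights needs reasonably recent library versions.
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
```

Calling `quantize_for_qlora("bigcode/starcoder2-3b", "starcoder2-3b-bnb-4bit")` would be one way to try it on a small model; the output directory name is made up for illustration.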

If you know a (small) coding model worthy of quantization, please let me know and I'd love to add it to the collections.

onekq

updated 3 models 2 months ago

onekq

updated 2 datasets 3 months ago

onekq-ai/WebApp1K-Duo-React

Viewer • Updated Oct 4 • 1k • 62

onekq-ai/WebApp1K-React

Viewer • Updated Oct 4 • 1k • 54 • 1

onekq

posted an update 3 months ago

Post

2556

Here is my latest study on OpenAI🍓o1🍓.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blog post to explain the findings.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands one instruction, it can perform far worse than non-reasoning models.

onekq

authored a paper 3 months ago

A Case Study of Web App Coding with OpenAI Reasoning Models

Paper • 2409.13773 • Published Sep 19 • 5

onekq

posted an update 3 months ago

Post

424

Announcing 🎉 WebApp1K-Duo 🎉
onekq-ai/WebApp1K-Duo-React

This is to keep up the challenge after OpenAI o1 models saturated the WebApp1K benchmark. The new benchmark brings SOTA to 67%. Let the hill climbing commence!
onekq-ai/WebApp1K-models-leaderboard

PS: I will publish more findings soon.

onekq

posted an update 3 months ago

Post

549

🐋 DeepSeek 🐋2.5 is hands-down the best open-source model, leaving its peers way behind. It even beats GPT-4o mini.

onekq-ai/WebApp1K-models-leaderboard

The inference of the official API is painfully slow though. I heard the team is short on GPUs (well, who isn't).

onekq

posted an update 3 months ago

Post

1121

If your plan keeps changing, it's a sign that you are living in the moment.

I just got the pass@1 result of GPT 🍓o1-preview🍓 : 0.95!!!

This means my benchmark is cast into oblivion, and I need to up the ante. I am all ears for suggestions. onekq-ai/WebApp1K-models-leaderboard
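For context on what a pass@1 of 0.95 means, here is a small sketch of the widely used unbiased pass@k estimator. The post does not say how the leaderboard computes its score, so treat this as an assumption. The function names and the sample counts in the usage note are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failures: every draw of k samples contains a pass.
    if n - c < k:
        return 1.0
    # Probability that k samples drawn without replacement all fail,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_passed)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to the plain fraction of passing samples, c / n. For two made-up problems with 10 samples each, one passing 10 times and the other 9 times, `benchmark_pass_at_k([(10, 10), (10, 9)])` averages 1.0 and 0.9.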

onekq

authored 2 papers 3 months ago

WebApp1K: A Practical Code-Generation Benchmark for Web App Development

Paper • 2408.00019 • Published Jul 30 • 1

Insights from Benchmarking Frontier Language Models on Web App Code Generation

Paper • 2409.05177 • Published Sep 8 • 5

onekq

updated a Space 5 months ago

Running

🌖

Team members 1

WebApp1K Models Leaderboard