@codelion on Hugging Face: "The new gpt-4o model seems to be a very good coder. OpenAI reported a 90+ score…"

codelion

posted an update May 14, 2024

The new gpt-4o model seems to be a very good coder. OpenAI reported a 90+ score on https://huggingface.co/datasets/openai_humaneval

We tried the new model on our patched-codes/static-analysis-eval benchmark, which evaluates models on vulnerability remediation. gpt-4o has reclaimed the top spot on our leaderboard (from meta-llama/Meta-Llama-3-70B-Instruct).

You can now use the new model with our open-source framework PatchWork - https://github.com/patched-codes/patchwork by passing model=gpt-4o on the CLI.

Taylor658

May 14, 2024

Thanks for posting results for gpt-4o so fast!

You will have to post the latest Gemini model results tomorrow after I/O announcements. :-)

Since we are squarely in the age of multimodal models, I am curious whether any of the 76 standard scripts run for vulnerability remediation in "static-analysis-eval" demonstrate multimodal vulnerabilities?

codelion

May 14, 2024

At the moment we do not have any multimodal examples in the benchmark. The focus has been on vulnerability remediation, and I cannot think of any way to use multimodality in coding-related tasks. Do you have any ideas on how multimodality can be exploited for something like coding?

victor

May 14, 2024

Thanks for sharing (I was a bit worried after seeing that:)

codelion

May 15, 2024

Here are the updated numbers with the new gemini-1.5-flash-latest model that was released at @goog1e I/O - https://huggingface.co/posts/codelion/955796074731531