codelion
posted an update May 14
The new gpt-4o model seems to be a very good coder. OpenAI reported a 90+ score on https://huggingface.co/datasets/openai_humaneval

We tried the new model on our patched-codes/static-analysis-eval benchmark, which evaluates models on vulnerability remediation. gpt-4o has reclaimed the top spot on our leaderboard (from meta-llama/Meta-Llama-3-70B-Instruct).

You can now use the new model with our open-source framework PatchWork - https://github.com/patched-codes/patchwork by passing model=gpt-4o on the CLI.

Thanks for posting results for gpt-4o so fast!

You will have to post the latest Gemini model results tomorrow after I/O announcements. :-)

Since we are squarely in the age of multimodal models, I am curious whether any of the 76 standard scripts run for vulnerability remediation in "static-analysis-eval" demonstrate multimodal vulnerabilities?

At the moment we do not have any multimodal examples in the benchmark. The focus has been on vulnerability remediation, and I cannot think of a way to use multimodality in coding-related tasks. Do you have any ideas on how multimodality could be exploited for something like coding?

Thanks for sharing (I was a bit worried after seeing that:)
image.png

Here are the updated numbers with the new gemini-1.5-flash-latest model that was released at @goog1e IO - https://huggingface.co/posts/codelion/955796074731531