Model evaluation

#1 by timesler - opened

Hi there, thanks for all the great models! Is there any plan for your team to upload evaluation results for any of the GM models?

We are currently working on publishing some additional benchmarks.

For now, we ran MT-Bench (https://github.com/lm-sys/FastChat).

Here are some results for comparison, with gpt-3.5-turbo as a reference:

| Model | MT-Bench score |
| --- | --- |
| gpt-3.5-turbo | 8.04375 |
| h2ogpt-gm-falcon-40b-v1 | 6.53125 |
| h2ogpt-gm-open-llama-13b | 5.60625 |
| h2ogpt-gm-oasst1-en-xgen-7b-8k | 5.28125 |
| h2ogpt-gm-open-llama-7b | 5.10625 |
| h2ogpt-gm-falcon-7b | 4.92500 |
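
In case it helps anyone reproduce such numbers, here is a minimal sketch of an MT-Bench run using FastChat's llm_judge scripts. It assumes the FastChat repo is cloned with its dependencies installed, that the script runs from the repo's fastchat/llm_judge directory, and that an OpenAI key is set for the GPT-4 judge; the model path and ID below are placeholders, not the exact values used for the table above.

```python
# Sketch: driving FastChat's MT-Bench pipeline from Python.
# Assumes cwd is FastChat's fastchat/llm_judge directory.
import subprocess

model_path = "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2"  # placeholder
model_id = "h2ogpt-gm-falcon-40b-v2"  # free-form label used in result files

# 1) Generate the model's answers to the 80 MT-Bench questions.
subprocess.run(
    ["python", "gen_model_answer.py",
     "--model-path", model_path,
     "--model-id", model_id],
    check=True,
)

# 2) Score the answers with the GPT-4 judge (needs OPENAI_API_KEY set).
subprocess.run(
    ["python", "gen_judgment.py", "--model-list", model_id],
    check=True,
)

# 3) Print the aggregated scores (the MT-Bench score column above).
subprocess.run(["python", "show_result.py"], check=True)
```

By default, gen_judgment.py uses single-answer grading on a 1-10 scale, which matches the score range in the table.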

Hello, what's the main difference between this model and h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v1? I only see v1 and v2 trained on the same dataset, so I was wondering if there's something specific or a quality improvement.

Just a re-run with some personalization and different hyperparameters.

Both should be pretty much on par.

Thanks a lot for the feedback. We are about to use a similar method to build our own open-source model, so I wanted to make sure I'm not missing an important point.

Amazing work!!
