OpsEval / data_v2 /network_en_mc_gen.csv
update leaderboard 2025-02-27
name,zero_naive,zero_self_con,zero_cot,zero_cot_self_con,few_naive,few_self_con,few_cot,few_cot_self_con
AquilaChat2-34B,36.63,36.63,44.83,44.83,46.65,46.65,,
Baichuan-13B-Chat,18.3,20.4,28.6,37.0,24.1,26.7,18.2,17.8
Baichuan2-13B-Chat,14.1,15.3,24.1,25.8,32.3,33.1,25.6,27.7
ChatGLM2-6B,24.8,24.7,36.6,36.5,37.6,37.6,40.5,40.5
ChatGLM3-6B,43.38487973,43.38487973,44.58762887,44.58762887,42.09621993,42.09621993,43.47079038,43.47079038
Chinese-Alpaca-2-13B,37.7,37.7,49.7,49.7,48.6,48.6,50.5,50.5
Chinese-LLaMA-2-13B,29.4,29.4,37.8,37.8,40.4,40.4,28.8,28.8
DevOps-Model-14B-Chat,30.69,30.59,55.77,63.63,63.85,61.96,41.15,44.01
ERNIE-Bot-4.0,61.15,61.15,70.0,70.0,60.0,60.0,70.0,70.0
GPT-3.5-turbo,66.6,66.8,69.6,72.0,68.3,68.3,70.9,72.5
GPT-4,,,,,,,88.7,88.7
InternLM-7B,38.7,38.7,43.9,43.9,45.2,45.2,51.4,51.4
InternLM2-Chat-20B,56.35738832,56.35738832,26.18025751,26.18025751,60.48109966,60.48109966,45.10309278,45.10309278
InternLM2-Chat-7B,49.74226804,49.74226804,56.18556701,56.18556701,48.19587629,48.19587629,49.74226804,49.74226804
LLaMA-2-13B,41.8,46.5,53.1,58.7,53.3,53.0,56.8,61.0
LLaMA-2-70B-Chat,25.29,25.29,57.97,58.06,52.97,52.97,58.55,58.55
LLaMA-2-7B,39.5,40.0,45.4,49.5,48.2,46.8,52.0,55.2
Mistral-7B,29.27,29.27,46.3,46.3,47.22,47.22,45.58,45.58
Qwen-14B-Chat,43.78,47.81,56.58,59.4,62.09,59.7,49.06,55.88
Qwen-72B-Chat,70.41,70.5,72.38,72.56,70.32,70.32,70.13,70.22
Qwen-7B-Chat,45.9,46.0,47.3,50.1,52.1,51.0,48.3,49.8
Yi-34B-Chat,57.75,59.14,65.11,68.79,68.16,68.37,78.09,80.06
JIUTIAN-75B-net,69.38775510204081,69.38775510204081,76.04726100966703,76.04726100966703,72.71750805585391,72.71750805585391,76.1546723952739,76.1546723952739
Claude-3-Opus,69.03417341637355,69.03417341637355,,,,,,
Deepseek-R1-Distill-Llama-8B,19.65121264227399,19.65121264227399,44.240352567500906,44.240352567500906,25.932243238384128,25.932243238384128,37.26068697187482,37.26068697187482
Deepseek-R1-Distill-Qwen-1.5B,15.894119111389756,15.894119111389756,20.67346477782439,20.67346477782439,18.523623589937444,18.523623589937444,25.022012419153338,25.022012419153338
Deepseek-R1-Distill-Qwen-14B,37.09841741078631,37.09841741078631,,,31.725773661621865,31.725773661621865,,
Deepseek-R1-Distill-Qwen-32B,61.682486412229224,61.682486412229224,,,37.41811610571493,37.41811610571493,,
Deepseek-R1-Distill-Qwen-7B,17.8256800800284,17.8256800800284,35.38237070296834,35.38237070296834,25.341711114081956,25.341711114081956,34.738132885862726,34.738132885862726
Gemma-2B,26.46048,26.46048,33.41924,33.41924,26.6323,26.6323,37.54296,37.54296
Gemma-7B,25.08591,25.08591,50.85911,50.85911,30.24055,30.24055,51.55747,51.55747
Meta-Llama-3-8B-Instruct,38.279481659390655,38.279481659390655,76.69172932330827,76.69172932330827,23.734458771084668,23.734458771084668,33.241749376506874,33.241749376506874
Qwen1.5-14B-Base,34.87973,34.87973,60.82474,60.82474,65.54983,65.54983,47.07904,47.07904
Qwen1.5-14B-Chat,54.89691,56.4433,64.08935,67.09622,52.23368,53.52234,59.53608,64.17526