index,model,run,accuracy,precision,recall,f1,ratio_valid_classifications
1,internlm2_5-7b-chat (0.8-epoch),internlm2_5-7b-chat (0.8-epoch),0.7496666666666667,0.8041871978859686,0.7496666666666667,0.7660159670998776,1.0
2,internlm2_5-7b-chat-1m (0.8-epoch),internlm2_5-7b-chat-1m (0.8-epoch),0.803,0.8031411888150441,0.803,0.8028064320197301,1.0
3,internlm2_5-20b-chat (0.8-epoch),internlm2_5-20b-chat (0.8-epoch),0.795,0.817457691710893,0.795,0.8027552955647029,1.0
4,Qwen2-7B-Instruct (0.4-epoch),Qwen2-7B-Instruct (0.4-epoch),0.759,0.8005303465799652,0.759,0.7748745026535183,1.0
5,Qwen2-72B-Instruct (1.8-epoch),Qwen2-72B-Instruct (1.8-epoch),0.784,0.8354349234761956,0.784,0.804194683154365,1.0
6,Llama3.1-8B-Chinese-Chat (1.0-epoch),Llama3.1-8B-Chinese-Chat (1.0-epoch),0.78,0.810582723471486,0.78,0.7924651054056209,1.0
7,Llama3.1-70B-Chinese-Chat (1.0-epoch),Llama3.1-70B-Chinese-Chat (1.0-epoch),0.7963333333333333,0.8248972880055918,0.7963333333333333,0.8076868978089201,1.0
8,gpt-4o-mini (0-shot),gpt-4o-mini (0-shot),0.7176666666666667,0.785706730193659,0.7176666666666667,0.7296061848734905,1.0
9,o1-mini (20-shot),o1-mini (20-shot),0.7343333333333333,0.786101455887261,0.7343333333333333,0.7535300565051624,0.999
10,gpt-4o (10-shot),gpt-4o (10-shot),0.7916666666666666,0.8227707658360168,0.7916666666666666,0.803614688453356,0.9996666666666667
11,o1-preview (50-shot),o1-preview (50-shot),0.7546666666666667,0.7979981023789272,0.7546666666666667,0.7708181822112403,0.9996666666666667
12,Ensemble Model (Open Source),Ensemble Model (Open Source),0.8193333333333334,0.8407464756633664,0.8193333333333334,0.828054127213081,1.0
13,Ensemble Model (OpenAI),Ensemble Model (OpenAI),0.7986666666666666,0.8223071972084313,0.7986666666666666,0.8080230503376233,1.0