|
--- |
|
license: bsd |
|
--- |
|
Welcome to Qwen2-72B-Instruct-math model, which is used for solving Math Problem. |
|
|
|
<div align="center"> |
|
<h1>Welcome to LLM Math Solver</h1> |
|
|
|
<h4 align="center"> |
|
<a href="https://percent4.github.io/llm_math_solver/"><img src="https://img.shields.io/badge/📄-docs-000000?style=for-the-badge&colorA=09c&colorB=555" height='35px' alt="Docs"></a> |
|
</h4> |
|
<p>LLM Math Solver: using LLM to solve MATH problems. |
|
</p> |
|
<h1></h1> |
|
</div> |
|
|
|
## 评估结果 |
|
|
|
不同模型经过微调的数学能力测评表如下: |
|
|
|
| 基座模型 | GSM8K | MATH | 样本数 | |
|
|---------------------|--------|--------|------| |
|
| QWen1.5-32B | 79.68% | 43.58% | 2402 | |
|
| Yi-1.5-34B | 83.47% | 52.76% | 3480 | |
|
| Yi-1.5-34B-Chat | 85.67% | 57.22% | 3479 | |
|
| QWen-2-72B-Instruct | **93.03%** | **68.54%** | 3469 | |
|
|
|
其它模型: |
|
|
|
|模型|GSM8K | MATH| |
|
|---|---|---| |
|
|GPT-4o-0513|95.8%|76.6%| |
|
|Claude-3.5-Sonnet|96.4%|71.1%| |
|
|GEMINI-1.5-PRO(May 2024)|/|67.7%| |
|
|DeepSeek-Coder-V2-Instruct(236B)|94.9%|75.7%| |
|
|
|
## 使用方法 |
|
|
|
## 参考文献 |
|
|
|
关于该模型使用的训练数据、训练方法和相关文章,可以参考Github上项目: [llm_math_solver](https://github.com/percent4/llm_math_solver). |
|
|
|
文章如下: |
|
|
|
1. [NLP(九十七)大模型数学解题能力的初步探索](https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247486824&idx=1&sn=fd6b36cf78aead227359606a7270516d&chksm=fcb9b4f8cbce3dee332335092f576c703ccdc55598cf45cb7f483f822ba5c72590019384d12a&token=321761101&lang=zh_CN#rd) |
|
2. [NLP(九十九)大模型的数学能力微调及测评](https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247486889&idx=1&sn=27c1a40d3af462f43a80a1ed401843f6&chksm=fcb9b439cbce3d2fd73e753618e0b32027314648eb13dc8b48bb9e713ad5313777c1ef27ce46&token=390124673&lang=zh_CN#rd) |
|
3. [NLP(一百)大模型数学能力测评](https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247486909&idx=1&sn=31b01bd4155b2c9ca15e2a7ae9f4de15&chksm=fcb9b42dcbce3d3bb473cf138f0f0f9a71addeff934900d155b6b90fb2a5857c1926b8aa0e9d&token=584142844&lang=zh_CN#rd) |
|
4. [Open WebUI的Pipelines学习之使用大模型解数学题](https://mp.weixin.qq.com/s?__biz=MzU2NTYyMDk5MQ==&mid=2247487013&idx=1&sn=6a6786ba8c8c7cfdbc02ef558adefe71&chksm=fcb9b7b5cbce3ea37f8fb61e743d0ea0a7d4f5d6b8e8b2c7a80171a5c8c217524d8f307c0146&token=120899150&lang=zh_CN#rd) |