---
license: agpl-3.0
datasets:
- stvlynn/Cantonese-Dialogue
language:
- zh
pipeline_tag: text-generation
tags:
- Cantonese
- 廣東話
- 粤语
base_model: Qwen/Qwen-7B-Chat
---
# Qwen-7B-Chat-Cantonese (通义千问·粤语)
## Intro
Qwen-7B-Chat-Cantonese is a fine-tuned version of Qwen-7B-Chat, trained on a large amount of Cantonese data.
The model is also available on [ModelScope (魔搭社区)](https://www.modelscope.cn/models/stvlynn/Qwen-7B-Chat-Cantonese).
## Usage
### Requirements
* Python 3.8 or later
* PyTorch 1.12 or later (2.0 or later recommended)
* CUDA 11.4 or later recommended for GPU and flash-attention users (a quick environment check is sketched below)
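A minimal sketch for confirming your environment meets these requirements; it only assumes `torch` and `transformers` are importable:
```python
import torch
import transformers

# Print the installed versions and whether a CUDA device is visible.
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available(), "| CUDA version:", torch.version.cuda)
```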
### Dependency
To run Qwen-7B-Chat-Cantonese, make sure the requirements above are met, then install the dependencies with the following pip command.
```bash
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
```
In addition, installing the `flash-attention` library is recommended for higher efficiency and lower memory usage (flash attention 2 is now supported).
```bash
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
```
### Quickstart
Please refer to the upstream QwenLM/Qwen [Quickstart](https://github.com/QwenLM/Qwen?tab=readme-ov-file#quickstart).
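For convenience, here is a minimal loading sketch following the upstream Qwen quickstart. It assumes the dependencies above are installed and that the model is published on Hugging Face under the id `stvlynn/Qwen-7B-Chat-Cantonese`; adjust the id if you load it from ModelScope instead.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the model is hosted on Hugging Face as "stvlynn/Qwen-7B-Chat-Cantonese".
model_id = "stvlynn/Qwen-7B-Chat-Cantonese"

# Qwen ships custom modelling code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
).eval()

# model.chat() is Qwen's built-in chat helper; it returns the reply and the updated history.
response, history = model.chat(tokenizer, "深水埗有哪些美食?", history=None)
print(response)
```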
## Training Parameters
| Parameter | Description | Value |
|-----------------|----------------------------------------|--------|
| Learning Rate | AdamW optimizer learning rate | 7e-5 |
| Weight Decay | Regularization strength | 0.8 |
| Gamma | Learning rate decay factor | 1.0 |
| Batch Size | Number of samples per batch | 1000 |
| Precision | Floating point precision | fp16 |
| Learning Policy | Learning rate adjustment policy | cosine |
| Warmup Steps | Learning-rate warm-up steps before the scheduler takes over | 0 |
| Total Steps | Total training steps | 1024 |
| Gradient Accumulation Steps | Number of steps to accumulate gradients before updating | 8 |
![loss](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/image.q9v1ak08ljk.webp)
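For reference, the table above maps roughly onto Hugging Face `TrainingArguments` as sketched below. This is illustrative only, not the actual training script used for this model, and it assumes a standard `transformers` Trainer setup; the output directory is a placeholder.
```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above; not the script used to train this model.
training_args = TrainingArguments(
    output_dir="./qwen-7b-chat-cantonese",  # placeholder path
    learning_rate=7e-5,
    weight_decay=0.8,
    per_device_train_batch_size=1000,  # "Batch Size" above; adjust to fit your hardware
    gradient_accumulation_steps=8,
    max_steps=1024,                    # "Total Steps"
    warmup_steps=0,
    lr_scheduler_type="cosine",        # "Learning Policy"
    fp16=True,                         # "Precision"
)
```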
## Demo
![What good food is there in Sham Shui Po?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.59.27.2bea6k113e68.webp)
![Why did Lu Xun beat up Zhou Shuren?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-11.56.46.72tt5czl2gw0.webp)
![How many birds are in the tree?](https://cdn.statically.io/gh/stvlynn/cloudimg@master/blog/2310/截屏2024-05-04-12.00.38.267hvmc3z3c0.webp)
## Special Note
This is my first LLM fine-tuning project, so please forgive any rough edges.
If you have any questions or suggestions, feel free to contact me.
[Twitter @stv_lynn](https://x.com/stv_lynn)
[Telegram @stvlynn](https://t.me/stvlynn)
[Email i@stv.pm](mailto:i@stv.pm)