mabaochang
committed on
Commit
·
0ce6066
1
Parent(s):
ed52b81
Create README.md
README.md
ADDED
@@ -0,0 +1,127 @@
---
license: other
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---

Because of LLaMA's license constraints, this model is for research and learning only.
Please strictly respect LLaMA's usage policy. We are not allowed to publish LLaMA weights, even finetuned ones, but we can publish the difference: a patch to apply to the original files.
The encryption is a simple XOR between files, which ensures that only people who have access to the original weights (from completely legal sources, of course) can transform them into the finetuned weights.
You can find the decryption code at https://github.com/LianjiaTech/BELLE/tree/main/models .


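The XOR idea described above can be illustrated with a minimal sketch. It is not the project's `decrypt.py`. The function name `xor_bytes` and the choice to repeat the key when it is shorter than the data are assumptions made for illustration only; the real script may handle file lengths differently.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with `key`, repeating `key` as needed (an assumption)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


# Toy stand-ins for the real files (hypothetical contents).
finetuned = b"finetuned weights"
original = b"original"

# "Encrypting" produces a patch that is useless without the original bytes.
patch = xor_bytes(finetuned, original)

# XOR is its own inverse, so applying the same key again recovers the data.
recovered = xor_bytes(patch, original)
```

Because XOR is self-inverse, the same function serves for both directions, which is why only holders of the original weights can rebuild the finetuned ones.
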
# Model Card for Model ID

## Welcome
If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !

## Update
A new checkpoint trained with a learning rate of 5e-6 has been uploaded.
In our evaluation, LLaMA trained with a smaller learning rate achieved better performance.

## Model description
BELLE-LLAMA-7B-2M-enc is based on LLAMA 7B and finetuned on 2M Chinese examples combined with 50,000 English examples from the open-source Stanford-Alpaca project, which gives it good Chinese instruction understanding and response generation capabilities.

The code for Chinese data generation and other detailed information can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE.


## Training hyper-parameters
| Parameter | Value |
| ------ | ------ |
| Batch size | 16 |
| Learning rate | 5e-6 |
| Epochs | 3 |
| Weight_decay | 0.0 |
| Warmup_rate | 0.03 |
| LR_scheduler | cosine |

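To give a feel for how the learning rate, warmup rate, and cosine scheduler in the table interact, here is a rough sketch. It assumes linear warmup over the first 3% of steps followed by cosine decay to zero, which is a common convention but is not confirmed by this README. The function `lr_at_step` and the step count of 1000 are hypothetical.

```python
import math


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 5e-6, warmup_rate: float = 0.03) -> float:
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_rate)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run of 1000 optimizer steps.
schedule = [lr_at_step(s, 1000) for s in range(1001)]
```

The actual training code may differ, for example by decaying to a non-zero floor.
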
## Download, Convert & Check
1. After you `git clone` this model, check the md5sum of the encrypted files:
```
md5sum ./*
45afa71e3067de5119233a57ef9d093d ./config.json.99a4ef2a26cb38c7f684cb83ed9343f660c561dd5a02a97d1b34b47419324dc5.enc
f9b33d359f17a437f6c24b4de6f2272e ./generation_config.json.fd7ff399e5568cc21a0a8414f43df88ef7c424995b9b97a90563165d2cf79efd.enc
172013287b452114abf5c0e64936f45b ./pytorch_model-00001-of-00002.bin.166879223b7504f1632d72b1577d57bceaa8fdeee1857c61119e575c50a4aae5.enc
384f8dc3b6da063c5f7554c52c531c44 ./pytorch_model-00002-of-00002.bin.2319db050dc286cb22c6e08a51a4ec0d9377017a7182a20a12c39eb658f39c80.enc
2ac1e5262eefd012918724d68813d03e ./pytorch_model.bin.index.json.f56e69fedde5d28e4f37f2b62f74e8522bbfa13395a6d696d1ef99222a431ab7.enc
c066b68b4139328e87a694020fc3a6c3 ./special_tokens_map.json.ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356.enc
2d5d4156fd237fceae85f28d06751020 ./tokenizer_config.json.a672113277a674d753b5cdcfa6bfc860dc69bfcc5511bdccb0c6af3ed08873a0.enc
39ec1b33fbf9a0934a8ae0f9a24c7163 ./tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc
```

2. Decrypt the files by following the instructions at https://github.com/LianjiaTech/BELLE/tree/main/models#使用说明
```
for f in "encrypted"/*; do if [ -f "$f" ]; then python3 decrypt.py "$f" "original/7B/consolidated.00.pth" "result/"; fi; done
```

3. Check the md5sum of the decrypted files:
```
md5sum ./*
a57bf2d0d7ec2590740bc4175262610b ./config.json
2917a1cafb895cf57e746cfd7696bfe5 ./generation_config.json
252143e5ed0f0073dc5c04159a0f78c2 ./pytorch_model-00001-of-00002.bin
3f71478bd783685f0a45fc742af85042 ./pytorch_model-00002-of-00002.bin
d5230ae5fb3bfd12df98af123be53cf5 ./pytorch_model.bin.index.json
8a80554c91d9fca8acb82f023de02f11 ./special_tokens_map.json
414f52220807d1300ad700283141de69 ./tokenizer_config.json
eeec4125e9c7560836b4873b6f8e3025 ./tokenizer.model
```

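If `md5sum` is not available, the check in step 3 could also be scripted. The sketch below is one possible approach using only the Python standard library. The helper names `md5_of_file` and `find_mismatches` are hypothetical, the `result` directory is taken from the decrypt command above, and only two of the expected checksums are copied in for brevity.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected):
    """Return the names in `expected` that are missing or have a different MD5."""
    mismatches = []
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of_file(path) != want:
            mismatches.append(name)
    return mismatches


# Two of the checksums listed in step 3; extend with the remaining files.
EXPECTED = {
    "config.json": "a57bf2d0d7ec2590740bc4175262610b",
    "tokenizer.model": "eeec4125e9c7560836b4873b6f8e3025",
}

if __name__ == "__main__":
    # An empty list means every listed file matched.
    print(find_mismatches("result", EXPECTED))
```
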
## Use model
Please note that the input should be formatted as follows in both **training** and **inference**.
```
Human: {input} \n\nAssistant:
```

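A small helper can keep the prompt format consistent between training and inference. This is only a sketch: the function names `build_prompt` and `extract_response` are hypothetical, and the trailing space after `Assistant:` follows the inference example in this README rather than a documented requirement.

```python
def build_prompt(user_input: str) -> str:
    """Wrap the user's text in the Human/Assistant template used by this model."""
    return f"Human: {user_input} \n\nAssistant: "


def extract_response(decoded_output: str, prompt: str) -> str:
    """Strip the echoed prompt from the decoded output, if it is present."""
    if decoded_output.startswith(prompt):
        return decoded_output[len(prompt):]
    return decoded_output


prompt = build_prompt("Translate 'hello' into Chinese")
```
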
To load BELLE-LLAMA-7B-2M-enc with Hugging Face Transformers, please install the main version, as the latest stable version doesn't support LLAMA (as of March 26, 2023).
```
pip install git+https://github.com/huggingface/transformers
```

After you decrypt the files, BELLE-LLAMA-7B-2M can be easily loaded with LlamaForCausalLM.
``` python
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

ckpt = './result/'  # directory holding the decrypted files
device = torch.device('cuda')
# device_map='auto' requires the accelerate package to be installed
model = LlamaForCausalLM.from_pretrained(ckpt, device_map='auto', low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
# The prompt asks: "Write a Chinese song praising nature"
prompt = "Human: 写一首中文歌曲,赞美大自然 \n\nAssistant: "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generate_ids = model.generate(input_ids, max_new_tokens=500, do_sample=True, top_k=30, top_p=0.85, temperature=0.5, repetition_penalty=1., eos_token_id=2, bos_token_id=1, pad_token_id=0)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
# Drop the echoed prompt so that only the model's reply remains
response = output[len(prompt):]

```

## Limitations
A few issues remain in the model trained on the current base model and data:

1. The model might generate factual errors when asked to follow instructions related to facts.

2. The model occasionally generates harmful responses, since it still struggles to identify potentially harmful instructions.

3. The model's reasoning and coding abilities need improvement.

Since the model still has these limitations, we require that developers use the open-source code, data, model, and any other artifacts generated by this project for research purposes only. Commercial use and other potentially harmful uses are not allowed.


## Citation

Please cite us when using our code, data or model.

```
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Be Everyone's Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```