|
--- |
|
license: gpl-3.0 |
|
tags: |
|
- text2text-generation |
|
pipeline_tag: text2text-generation |
|
language: |
|
- zh |
|
- en |
|
--- |
|
|
|
Given LLaMA's license constraints, this model is for research and learning purposes only.

Please strictly respect LLaMA's usage policy.

We are not allowed to publish the LLaMA weights, even finetuned ones, but we can publish the difference: a patch that you apply to the original files.

The encryption is a simple byte-wise XOR between files, which ensures that only people who already have access to the original weights (from completely legal sources, of course) can recover the finetuned weights.
|
You can find the decryption code at https://github.com/LianjiaTech/BELLE/tree/main/models .
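
For intuition, decryption is conceptually just a byte-wise XOR of each released `.enc` file against the original LLaMA weight file. The sketch below is illustrative only: the helper name `xor_decrypt` and the key-cycling detail are assumptions, and the official `decrypt.py` from the repository above (which also handles file naming and hash checks) should be used in practice.

```python
# Conceptual sketch only -- use the official decrypt.py for real use.
# Assumes the released .enc file is a byte-wise XOR of the finetuned file
# with the original LLaMA weight file, cycling the key if it is shorter.
import hashlib
from pathlib import Path

def xor_decrypt(encrypted_path: str, key_path: str, output_path: str) -> None:
    enc = Path(encrypted_path).read_bytes()  # released patch (.enc file)
    key = Path(key_path).read_bytes()        # original weights, e.g. consolidated.00.pth
    # XOR each patch byte with the corresponding key byte, cycling the key.
    # Note: this loads whole multi-GB files into memory for brevity; the
    # official script may process them differently.
    plain = bytes(b ^ key[i % len(key)] for i, b in enumerate(enc))
    Path(output_path).write_bytes(plain)
    print(output_path, hashlib.md5(plain).hexdigest())
```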
|
|
|
|
|
# GPTQ-for-LLaMa |
|
|
|
## Welcome |
|
If you find this model helpful, please *like* it and star our GitHub repository: https://github.com/LianjiaTech/BELLE !
|
|
|
## Model description |
|
8-bit quantization of [BELLE-LLAMA-7B-2M](https://huggingface.co/BelleGroup/BELLE-LLAMA-7B-2M-enc) using [GPTQ](https://arxiv.org/abs/2210.17323).
|
|
|
GPTQ is a state-of-the-art (SOTA) one-shot weight quantization method.
|
|
|
The inference code can be found in our GitHub repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.
|
|
|
We recommend 8-bit quantization with a group size of 128.
|
|
|
**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)** |
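
For orientation, inference with a GPTQ checkpoint typically follows the pattern below. This is a hedged sketch, not the repository's exact API: it assumes the `load_quant` helper from GPTQ-for-LLaMa's `llama_inference.py` (whose signature varies across versions) and a BELLE-style `Human:`/`Assistant:` prompt format; consult https://github.com/LianjiaTech/BELLE/tree/main/gptq for the authoritative entry point and arguments.

```python
# Hedged sketch: assumes the load_quant helper from GPTQ-for-LLaMa's
# llama_inference.py (run from that directory); signatures vary by version.
import torch
from transformers import AutoTokenizer

from llama_inference import load_quant  # provided by the GPTQ-for-LLaMa code

model_dir = "/path/to_finetuned_model"  # decrypted files from the steps below
checkpoint = f"{model_dir}/llama7b-2m-8bit-128g.pt"

# wbits=8, groupsize=128 to match the recommended checkpoint above.
model = load_quant(model_dir, checkpoint, 8, 128)
model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained(model_dir)
prompt = "Human: Please introduce Beijing\n\nAssistant: "  # assumed prompt format
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.85)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```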
|
|
|
## Model list |
|
|
|
| model name              | file size | GPU memory usage |
| ----------------------- | --------- | ---------------- |
| llama7b-2m              | 26G       | ~15G             |
| llama7b-2m-8bit-128g.pt | 6.8G      | ~8.9G            |
| llama7b-2m-4bit-128g.pt | 3.8G      | ~5.6G            |
|
|
|
## Check md5 |
|
1. After you `git clone` this model, check the MD5 checksums of the encrypted files:
|
``` |
|
md5sum ./* |
|
340aa9ee27fa7931ccbabcc30f2f8a27 ./config.json.db303d8f096e427bd21ff97bb169c84fb3ae11336a644e3da3506419d44f6429.enc |
|
f9b33d359f17a437f6c24b4de6f2272e ./generation_config.json.fd7ff399e5568cc21a0a8414f43df88ef7c424995b9b97a90563165d2cf79efd.enc |
|
591a2ecabc03530ba70663784fddb0e5 ./llama7b-2m-4bit-128g.pt.8576bae21290e9e75a60f38a6010709255656b19330a0df9a4bf50e1ee83fc51.enc |
|
65926fddcd56be59b0bebf97f1518106 ./llama7b-2m-8bit-128g.pt.44227e0ee3633967c555ed9ba7a89f340955545f6e32f7d5dfdc28603f6e27d2.enc |
|
1ab707fa9b0c4be294fd0b867d73e919 ./special_tokens_map.json.44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a.enc |
|
ff291fcfa4e0048ca4ff262312faad83 ./tokenizer_config.json.ef7ef410b9b909949e96f172b17cbf7c68b11761c632715fa05a6088c0c2b9ac.enc |
|
39ec1b33fbf9a0934a8ae0f9a24c7163 ./tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc |
|
``` |
|
|
|
2. Decrypt the files using the scripts in https://github.com/LianjiaTech/BELLE/tree/main/models |
|
|
|
You can use the following Bash commands.
Replace "/path/to_encrypted" with the directory containing the encrypted files,
"/path/to_original_llama_7B" with the directory containing your original LLaMA-7B weights,
and "/path/to_finetuned_model" with the directory where you want to save the decrypted model.
|
|
|
```bash
mkdir /path/to_finetuned_model
for f in /path/to_encrypted/*; do
    if [ -f "$f" ]; then
        python3 decrypt.py "$f" "/path/to_original_llama_7B/consolidated.00.pth" "/path/to_finetuned_model/"
    fi
done
```
|
|
|
After running the command above, you will obtain the following files:
|
|
|
``` |
|
./config.json |
|
./generation_config.json |
|
./llama7b-2m-4bit-128g.pt |
|
./llama7b-2m-8bit-128g.pt |
|
./special_tokens_map.json |
|
./tokenizer_config.json |
|
./tokenizer.model |
|
``` |
|
|
|
3. Check md5sum |
|
|
|
You can verify the integrity of the decrypted files with an MD5 checksum to confirm they were recovered correctly.
The expected checksums are:
|
``` |
|
md5sum ./* |
|
32490e7229fb82c643e3a7b8d04a6c4b ./config.json |
|
2917a1cafb895cf57e746cfd7696bfe5 ./generation_config.json |
|
856cb1e00b6837f71b8d77f8b44ee5a5 ./llama7b-2m-4bit-128g.pt |
|
a35a44e6ff57e672f649635cf966f5bd ./llama7b-2m-8bit-128g.pt |
|
99914b932bd37a50b983c5e7c90ae93b ./special_tokens_map.json |
|
5526ad31f4928acb5219e295e5ff81ce ./tokenizer_config.json |
|
eeec4125e9c7560836b4873b6f8e3025 ./tokenizer.model |
|
``` |
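
If you prefer a programmatic check (for example, on systems without `md5sum`), the following sketch verifies the decrypted files against the expected checksums listed above; `/path/to_finetuned_model` is a placeholder for your output directory.

```python
import hashlib
from pathlib import Path

# Expected MD5 checksums of the decrypted files (from the listing above).
EXPECTED = {
    "config.json": "32490e7229fb82c643e3a7b8d04a6c4b",
    "generation_config.json": "2917a1cafb895cf57e746cfd7696bfe5",
    "llama7b-2m-4bit-128g.pt": "856cb1e00b6837f71b8d77f8b44ee5a5",
    "llama7b-2m-8bit-128g.pt": "a35a44e6ff57e672f649635cf966f5bd",
    "special_tokens_map.json": "99914b932bd37a50b983c5e7c90ae93b",
    "tokenizer_config.json": "5526ad31f4928acb5219e295e5ff81ce",
    "tokenizer.model": "eeec4125e9c7560836b4873b6f8e3025",
}

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks to avoid loading large checkpoints into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

model_dir = Path("/path/to_finetuned_model")  # adjust to your output directory
for name, expected in EXPECTED.items():
    actual = md5_of(model_dir / name)
    status = "OK" if actual == expected else f"MISMATCH (got {actual})"
    print(f"{name}: {status}")
```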
|
|
|
## Limitations |
|
A few issues remain in the model trained on the current base model and data:
|
|
|
1. The model may produce factual errors when asked to follow instructions involving factual knowledge.
|
|
|
2. The model occasionally generates harmful responses, since it still struggles to identify potentially harmful instructions.
|
|
|
3. Its reasoning and coding abilities still need improvement.
|
|
|
Given these limitations, we require that developers use the open-sourced code, data, model, and any other artifacts generated by this project for research purposes only. Commercial use and other potentially harmful use cases are not allowed.
|
|
|
## Citation |
|
|
|
Please cite us if you use our code, data, or model.
|
|
|
``` |
|
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
|
``` |
|
|
|
Please also cite the original LLaMA, Stanford Alpaca, and Self-Instruct papers!