Update README.md
README.md
````diff
@@ -3,7 +3,10 @@
 <a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/logo.jpg?raw=true" alt="ZJU-CaMA" style="width: 30%; min-width: 30px; display: block; margin: auto;"></a>
 </p>
 
+
 > This is the result of the weight difference between `Llama 13B` and `CaMA-13B`. You can click [here](https://github.com/zjunlp/cama) to learn more.
+
+
 # CaMA: A Chinese-English Bilingual LLaMA Model
 
 With the birth of ChatGPT, artificial intelligence has also entered the "iPhone moment," where various large language models (LLMs) have sprung up like mushrooms. The wave of these large models has quickly swept through artificial intelligence fields beyond natural language processing. However, training such a model requires extremely high hardware costs, and open-source language models are scarce due to various reasons, making Chinese language models even more scarce. It wasn't until the open-sourcing of LLaMA that a variety of language models based on LLaMA started to emerge. This project is also based on the LLaMA model. To further enhance Chinese language capabilities without compromising its original language distribution, we first <b>(1) perform additional pre-training on LLaMA (13B) using Chinese corpora, aiming to improve the model's Chinese comprehension and knowledge base while preserving its original English and code abilities to the greatest extent possible;</b> then, <b>(2) we fine-tune the model from the first step using an instruction dataset to enhance the language model's understanding of human instructions.</b>
@@ -193,7 +196,7 @@ Our pre-trained model has demonstrated certain abilities in instruction followin
 The effectiveness of information extraction is illustrated in the following figure. We tested different instructions for different tasks as well as the same instructions for the same task, and achieved good results for all of them.
 
 <p align="center" width="100%">
-<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/ie-case.jpg" alt="IE" style="width: 60%; min-width: 60px; display: block; margin: auto;"></a>
+<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/ie-case.jpg?raw=true" alt="IE" style="width: 60%; min-width: 60px; display: block; margin: auto;"></a>
 </p>
 
 
@@ -463,7 +466,7 @@ We offer two methods: the first one is **command-line interaction**, and the sec
 ```
 Here is a screenshot of the web-based interaction:
 <p align="center" width="100%">
-<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/finetune_web.jpg" alt="finetune-web" style="width: 100%; min-width: 100px; display: block; margin: auto;"></a>
+<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/finetune_web.jpg?raw=true" alt="finetune-web" style="width: 100%; min-width: 100px; display: block; margin: auto;"></a>
 </p>
 
 **3. Usage of Instruction tuning Model**
@@ -476,7 +479,7 @@ python examples/generate_lora_web.py --base_model ./CaMA --lora_weights ./LoRA
 
 Here is a screenshot of the web-based interaction:
 <p align="center" width="100%">
-<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/lora_web.png" alt="finetune-web" style="width: 100%; min-width: 100px; display: block; margin: auto;"></a>
+<a href="" target="_blank"><img src="https://github.com/zjunlp/CaMA/blob/main/assets/lora_web.png?raw=true" alt="finetune-web" style="width: 100%; min-width: 100px; display: block; margin: auto;"></a>
 </p>
 
 The `instruction` is a required parameter, while `input` is an optional parameter. For general tasks (such as the examples provided in section `1.3`), you can directly enter the input in the `instruction` field. For information extraction tasks (as shown in the example in section `1.2`), please enter the instruction in the `instruction` field and the sentence to be extracted in the `input` field. We provide an information extraction prompt in section `2.5`.
@@ -499,7 +502,7 @@ For information extraction tasks such as named entity recognition (NER), event e
 >
 > (2) Instruction tuning stage using LoRA. This stage enables the model to understand human instructions and generate appropriate responses.
 
-![](https://github.com/zjunlp/CaMA/blob/main/assets/main.jpg)
+![](https://github.com/zjunlp/CaMA/blob/main/assets/main.jpg?raw=true)
 
 <h3 id="3-1">3.1 Dataset Construction (Pretraining)</h3>
 
````