Update README.md
README.md CHANGED
tags:
- tensorRT
- Belle
---

## Model Card for lyraBelle

lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of Belle**.

The inference speed of lyraBelle has achieved a **10x** speedup over the original version, and we are still working hard to improve the performance further.

Among its main features are:

- device: Nvidia Ampere architecture or newer (e.g. A100)
- batch_size: compiled with dynamic batch size, max batch_size = 8

Note that:

**Some interfaces/code are reserved for future use (see the demo below).**

- **int8 mode**: not supported yet; please always set it to 0
- **data type**: only `fp16` is available.
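Since the engine is compiled with max batch_size = 8, a longer prompt list has to be fed to the model in slices. A minimal sketch of that bookkeeping (the `chunked` helper is illustrative, not part of the lyraBelle API):

```python
# Slice a prompt list into batches that respect the compiled limit of 8.
# `chunked` is an illustrative helper, not part of the lyraBelle API.
MAX_BATCH_SIZE = 8

def chunked(prompts, size=MAX_BATCH_SIZE):
    """Yield successive slices of at most `size` prompts."""
    for i in range(0, len(prompts), size):
        yield prompts[i:i + size]

batches = list(chunked([f"prompt {i}" for i in range(20)]))
print([len(b) for b in batches])  # → [8, 8, 4]
```

Each slice can then be passed to the model's generate call in turn.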

## Speed

### test environment

- **Repository:** [https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true]

## Environment

- **docker image available** at [https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general], pull the image with:

```
docker pull bigmoyan/lyrallm:v0.1
```

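To work inside the pulled image, it would typically be started with GPU access and a local model directory mounted; a hypothetical invocation (the mount path and container name are illustrative choices, not prescribed by the image):

```shell
# Start the image with all GPUs visible and a local model directory mounted.
# /workspace/model and the container name "lyrabelle" are illustrative choices.
docker run --gpus all -it --rm \
  --name lyrabelle \
  -v "$(pwd)/model:/workspace/model" \
  bigmoyan/lyrallm:v0.1
```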
## Uses

```
model_dir = "./model"
model_name = "1-gpu-fp16.h5"
max_output_length = 512

# int8 mode is not supported yet; data_type only supports fp16
model = LyraBelle(model_dir, model_name, data_type, 0)
output_texts = model.generate(prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)
print(output_texts)
```
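For readers unfamiliar with the sampling knobs passed to `generate`: `top_k` keeps only the 30 most likely tokens, and `top_p` then keeps the smallest subset of those covering 85% of the probability mass before sampling. A generic sketch of that filtering (plain top-k/nucleus sampling, not lyraBelle's internal code):

```python
import math

def top_k_top_p_filter(logits, top_k=30, top_p=0.85):
    """Return the token indices kept by top-k then nucleus (top-p) filtering."""
    # Sort token indices by logit, highest first, and truncate to the top_k best.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the surviving logits (max-subtracted for numerical stability).
    m = max(logits[i] for i in order)
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for idx, e in zip(order, exps):
        kept.append(idx)
        cum += e / total
        if cum >= top_p:
            break
    return kept

# A peaked distribution: one token dominates, so the nucleus collapses to it.
print(top_k_top_p_filter([5.0, 1.0, 1.0, 0.5], top_k=3, top_p=0.85))  # → [0]
```

`temperature` flattens or sharpens the distribution before this filtering is applied, and `repetition_penalty` down-weights tokens that were already generated.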