bigmoyan committed on
Commit
b623eca
1 Parent(s): abd81f7

Update README.md

Files changed (1)
  1. README.md +15 -3
README.md CHANGED
@@ -7,11 +7,11 @@ tags:
 - tensorRT
 - Belle
 ---
-## Model Card for lyraBELLE
+## Model Card for lyraBelle
 
 lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of Belle**.
 
-The inference speed of lyraChatGLM has achieved **10x** acceleration upon the ealry original version. We are still working hard to further improve the performance.
+The inference speed of lyraBelle achieves a **10x** acceleration over the original version. We are still working hard to improve performance further.
 
 Among its main features are:
 
@@ -19,6 +19,12 @@ Among its main features are:
 - device: Nvidia Ampere architecture or newer (e.g. A100)
 - batch_size: compiled with dynamic batch size, max batch_size = 8
 
+Note that:
+**Some interfaces/code paths are reserved for future use (see demo below).**
+
+- **int8 mode**: not supported yet; always set it to 0
+- **data type**: only `fp16` is available.
+
 ## Speed
 
 ### test environment
@@ -33,7 +39,13 @@ Among its main features are:
 
 - **Repository:** [https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true]
 
+## Environment
+
+- **docker image available** at [https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general]; pull the image with:
 
+```
+docker pull bigmoyan/lyrallm:v0.1
+```
 
 ## Uses
 
@@ -47,7 +59,7 @@ model_dir = "./model"
 model_name = "1-gpu-fp16.h5"
 max_output_length = 512
 
-
+# int8 mode is not supported; data_type only supports fp16
 model = LyraBelle(model_dir, model_name, data_type, 0)
 output_texts = model.generate(prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)
 print(output_texts)
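The `generate` call in the diff above exposes standard sampling knobs (`top_k=30`, `top_p=0.85`, `temperature=0.35`, `do_sample=True`). For readers unfamiliar with what those parameters do, the snippet below is a minimal, library-independent sketch of combined top-k / top-p (nucleus) filtering on a toy next-token distribution; `top_k_top_p_filter` and the toy logits are illustrative assumptions, not part of the lyraBelle API.

```python
import math

def top_k_top_p_filter(logits, top_k=30, top_p=0.85):
    """Keep only tokens allowed by both top-k and nucleus (top-p) filtering."""
    # Softmax over the raw logits (shifted by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token indices ordered by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: keep only the k most likely tokens.
    keep = set(order[:top_k])
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    nucleus, cum = set(), 0.0
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus
    # Renormalize the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}

# Toy vocabulary of 4 tokens; with top_k=2 only the two most likely survive.
filtered = top_k_top_p_filter([3.0, 2.0, 1.0, 0.0], top_k=2, top_p=0.95)
```

Lower `temperature` and `top_p` values make sampling more deterministic, which is why the demo pairs a low temperature (0.35) with a moderate `top_k`.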