michaelfeil committed
Commit 80a86ec
Parent(s): 20ec98d

Upload OpenAssistant/stablelm-7b-sft-v7-epoch-3 ctranslate fp16 weights

Files changed (2):
  1. README.md +10 -8
  2. model.bin +2 -2
README.md CHANGED

````diff
@@ -21,15 +21,16 @@ Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on
 
 quantized version of [OpenAssistant/stablelm-7b-sft-v7-epoch-3](https://huggingface.co/OpenAssistant/stablelm-7b-sft-v7-epoch-3)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.8
+pip install hf-hub-ctranslate2>=2.0.8 ctranslate2>=3.14.0
 ```
-Converted on 2023-05-22 using
+Converted on 2023-06-02 using
 ```
-ct2-transformers-converter --model OpenAssistant/stablelm-7b-sft-v7-epoch-3 --output_dir /home/michael/tmp-ct2fast-stablelm-7b-sft-v7-epoch-3 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
+ct2-transformers-converter --model OpenAssistant/stablelm-7b-sft-v7-epoch-3 --output_dir /home/michael/tmp-ct2fast-stablelm-7b-sft-v7-epoch-3 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
 ```
 
-Checkpoint compatible to [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
-- `compute_type=int8_float16` for `device="cuda"`
+Checkpoint compatible to [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2)
+and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2)
+- `compute_type=int8_float16` for `device="cuda"`
 - `compute_type=int8` for `device="cpu"`
 
 ```python
@@ -40,14 +41,15 @@ model_name = "michaelfeil/ct2fast-stablelm-7b-sft-v7-epoch-3"
 # use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
 model = GeneratorCT2fromHfHub(
     # load in int8 on CUDA
-    model_name_or_path=model_name,
+    model_name_or_path=model_name,
     device="cuda",
     compute_type="int8_float16",
     # tokenizer=AutoTokenizer.from_pretrained("OpenAssistant/stablelm-7b-sft-v7-epoch-3")
 )
 outputs = model.generate(
-    text=["def print_hello_world():", "def hello_name(name:"],
-    max_length=64
+    text=["def fibonnaci(", "User: How are you doing? Bot:"],
+    max_length=64,
+    include_prompt_in_result=False
 )
 print(outputs)
 ```
````
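For reference, the updated usage snippet from the README, assembled into a self-contained script. This is a minimal sketch: the hunk header confirms the `model_name` assignment, but the import line does not appear in the diff and is an assumption based on the hf-hub-ctranslate2 package.

```python
# Self-contained version of the README usage above; a minimal sketch.
# The import is an assumption from the hf-hub-ctranslate2 package
# and is not itself part of this commit's diff.
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model_name = "michaelfeil/ct2fast-stablelm-7b-sft-v7-epoch-3"

# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("OpenAssistant/stablelm-7b-sft-v7-epoch-3")
)
outputs = model.generate(
    text=["def fibonnaci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```

`include_prompt_in_result=False` strips each input prompt from the returned strings, so `outputs` holds only the generated continuations.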
model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26f928a83c8c64129b8c886e2f9dd86b86e0f7583c2cadcb7583bc0cbe3a5058
-size 15733850934
+oid sha256:c3bd76d168dcf22eaf748347ef746a59cd340a1726715d6851ec5bf51c664fda
+size 7872100730
```
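The model.bin entries are Git LFS pointer files: the repository stores only the blob's SHA-256 and byte size. The new blob is roughly half as large (about 7.9 GB vs. 15.7 GB), consistent with the switch from float16 to int8_float16 quantization. A minimal sketch for checking a downloaded model.bin against the new pointer (the local path is an assumption for illustration):

```python
# Verify a downloaded model.bin against the Git LFS pointer above.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks to avoid loading ~8 GB into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "c3bd76d168dcf22eaf748347ef746a59cd340a1726715d6851ec5bf51c664fda"
assert sha256_of("model.bin") == expected, "checksum mismatch"
```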