---
license: apache-2.0
tags:
- ctranslate2
---
# Fast-Inference with CTranslate2
Speed up inference by 2x-8x using int8 inference in C++.

Quantized version of [google/flan-ul2](https://huggingface.co/google/flan-ul2).
```bash
pip install "hf_hub_ctranslate2>=1.0.0" "ctranslate2>=3.13.0"
```
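
For reference, checkpoints like this one are typically produced with CTranslate2's `ct2-transformers-converter` CLI. The exact command used for this repo is an assumption; treat the following as a sketch:
```bash
# hedged sketch: convert the original checkpoint with int8_float16 weight quantization
ct2-transformers-converter --model google/flan-ul2 \
    --output_dir ct2fast-flan-ul2 --quantization int8_float16
```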

Checkpoint compatible with [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub

model_name = "michaelfeil/ct2fast-flan-ul2"
# load the encoder-decoder checkpoint in int8 on CUDA
model = TranslatorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "Translate to german: How are you doing?"],
    min_decoding_length=24,
    max_decoding_length=32,
    max_input_length=512,
    beam_size=5,
)
print(outputs)
```
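
The same wrapper also runs on CPU-only machines with the `int8` compute type listed above. A minimal sketch, assuming the same API as the CUDA example:
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub

model_name = "michaelfeil/ct2fast-flan-ul2"
# assumption: int8 on CPU, per the compute_type list above
model = TranslatorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cpu",
    compute_type="int8",
)
outputs = model.generate(
    text=["Translate to german: How are you doing?"],
    max_decoding_length=32,
)
print(outputs)
```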

# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.