---
language:
- en
datasets:
- natural_instructions
- the_pile
- cot
- Muennighoff/P3
tags:
- ctranslate2
- int8
- float16
- gpt
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 0.1
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy. Answer:"
  example_title: "Sentiment analysis"
- text: "Where is Zurich? Ans:"
  example_title: "Question Answering"
---
# Fast Inference with CTranslate2

Speed up inference by 2x-8x using int8 inference in C++. This is a quantized version of [togethercomputer/GPT-JT-6B-v0](https://huggingface.co/togethercomputer/GPT-JT-6B-v0).

```bash
pip install "hf-hub-ctranslate2>=2.0.6" "ctranslate2>=3.13.0"
```

Converted on 2023-05-19 using

```
ct2-transformers-converter --model togethercomputer/GPT-JT-6B-v0 --output_dir /home/michael/tmp-ct2fast-GPT-JT-6B-v0 --force --copy_files merges.txt tokenizer.json README.md tokenizer_config.json vocab.json special_tokens_map.json added_tokens.json .gitattributes --quantization float16
```

The checkpoint is compatible with [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"` (see the CPU sketch at the end of this card)

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-GPT-JT-6B-v0"
# Use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    tokenizer=AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0")
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
)
print(outputs)
```

# Licence and other remarks

This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# Quick Start

```python
from transformers import pipeline

pipe = pipeline(model='togethercomputer/GPT-JT-6B-v0')
pipe("Where is Zurich? Ans:")
```
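
# CPU inference

As noted in the compute-type options above, the same checkpoint can run on the CPU with plain int8 quantization. The following is a minimal sketch, not taken from the original card: it mirrors the CUDA example and only swaps `device` and `compute_type`; the prompt is reused from the widget example.

```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-GPT-JT-6B-v0"
model = GeneratorCT2fromHfHub(
    # load in int8 on CPU, per the compute-type options listed above
    model_name_or_path=model_name,
    device="cpu",
    compute_type="int8",
    tokenizer=AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0")
)
outputs = model.generate(text=["Where is Zurich? Ans:"])
print(outputs)
```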