bjoernp committed
Commit 4a7bd7e
1 Parent(s): 8f6a783

Update README.md

Files changed (1): README.md (+32, -1)
README.md CHANGED
@@ -34,11 +34,42 @@ The model performs exceptionally well on writing, explanation and discussion tasks
  - **Finetuned from:** [LeoLM/leo-hessianai-7b](https://huggingface.co/LeoLM/leo-hessianai-7b)
  - **Model type:** Causal decoder-only transformer language model
  - **Language:** English and German
- - **Demo:** [Continuations for 250 random prompts (TGI, 4bit nf4 quantization)](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-sft%2F2023-08-22_OpenAssistant_llama2-70b-oasst-sft-v10_sampling_noprefix2_nf4.json%0A)
+ - **Demo:** [Web Demo]()
  - **License:** [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt)
  - **Contact:** [LAION Discord](https://discord.com/invite/eq3cAMZtCC) or [Björn Plüster](mailto:bjoern.pl@outlook.de)
 
 
+ ## Use in 🤗Transformers
+ If you want faster inference using FlashAttention-2, you need to install these dependencies:
+ ```bash
+ pip install packaging ninja
+ pip install flash-attn==v2.1.1 --no-build-isolation
+ pip install git+https://github.com/HazyResearch/flash-attention.git@v2.1.1#subdirectory=csrc/rotary
+ ```
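+ Before loading the model it can help to verify that both wheels actually built. A minimal sketch, assuming the two installs above expose the `flash_attn` package and the `rotary_emb` CUDA extension:
+ ```python
+ # Smoke test (assumption: these are the module names installed by the commands above).
+ import flash_attn   # core FlashAttention-2 package
+ import rotary_emb   # rotary-embedding CUDA kernel built from csrc/rotary
+ 
+ print(flash_attn.__version__)  # expect "2.1.1"
+ ```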
49
+ Then load the model in 🤗Transformers:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ 
+ model = AutoModelForCausalLM.from_pretrained(
+     "LeoLM/leo-hessianai-7b-chat",
+     torch_dtype=torch.float16,
+     trust_remote_code=True,  # True for flash-attn, else False
+ ).to("cuda")
+ tokenizer = AutoTokenizer.from_pretrained("LeoLM/leo-hessianai-7b-chat")
+ 
+ # System prompt (German): "This is a conversation between an intelligent, helpful
+ # AI assistant and a user. The assistant gives detailed, helpful and honest answers."
+ system_prompt = """<|im_start|>system
+ Dies ist eine Unterhaltung zwischen einem intelligenten, hilfsbereitem KI-Assistenten und einem Nutzer.
+ Der Assistent gibt ausführliche, hilfreiche und ehrliche Antworten.<|im_end|>
+ 
+ """
+ 
+ prompt_format = "<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
+ prompt = "Erkläre mir wie die Fahrradwegesituation in Hamburg ist."  # "Explain to me what the bike-path situation in Hamburg is like."
+ 
+ # Llama-based checkpoints expose no `chat()` method, so prepend the system
+ # prompt, format the user turn, and generate directly:
+ inputs = tokenizer(system_prompt + prompt_format.format(prompt=prompt), return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.95)
+ response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(response)
+ ```
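+ 
+ The snippet above is single-turn. For multi-turn use, one option is to keep appending ChatML turns to a running transcript; a minimal sketch, where the `chat_turn` helper is hypothetical (not part of the model card):
+ ```python
+ # Hypothetical helper: grow the ChatML transcript one user/assistant turn at a time.
+ def chat_turn(transcript, user_msg):
+     transcript += f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
+     inputs = tokenizer(transcript, return_tensors="pt").to(model.device)
+     outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.95)
+     answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+     return transcript + answer + "<|im_end|>\n", answer
+ 
+ history, answer = chat_turn(system_prompt, "Erkläre mir wie die Fahrradwegesituation in Hamburg ist.")
+ history, answer = chat_turn(history, "Und wie sieht es im Vergleich zu München aus?")  # "And how does it compare to Munich?"
+ ```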
+
  ## Prompting / Prompt Template
 
  Prompt dialogue template (ChatML format):