Update README.md
README.md CHANGED

````diff
@@ -172,7 +172,7 @@ In LM-Studio, simply select the ChatML Prefix on the settings side pane:
 
 # Inference Code
 
-Here is example code using HuggingFace Transformers to inference the model (note:
+Here is example code using HuggingFace Transformers to inference the model (note: in 4bit, it will require around 5GB of VRAM)
 
 ```python
 # Code to inference Hermes with HF Transformers
@@ -183,9 +183,9 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 from transformers import LlamaTokenizer, MixtralForCausalLM
 import bitsandbytes, flash_attn
 
-tokenizer = LlamaTokenizer.from_pretrained('NousResearch/Nous-Hermes-2-
+tokenizer = LlamaTokenizer.from_pretrained('NousResearch/Nous-Hermes-2-Mistral-7B-DPO', trust_remote_code=True)
 model = MixtralForCausalLM.from_pretrained(
-    "NousResearch/Nous-Hermes-2-
+    "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
     torch_dtype=torch.float16,
     device_map="auto",
     load_in_8bit=False,
````
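For reference, here is a sketch of how the full inference snippet might read after this change. Only the lines shown in the hunks above come from the commit; the `torch` import, the 4-bit and flash-attention flags, the ChatML prompt, and the `generate` call are assumptions modeled on other Nous Hermes 2 model cards, not part of this diff.

```python
# Sketch of the post-commit snippet; lines outside the diff hunks are assumed.
import torch
from transformers import LlamaTokenizer, MixtralForCausalLM
import bitsandbytes, flash_attn

tokenizer = LlamaTokenizer.from_pretrained(
    'NousResearch/Nous-Hermes-2-Mistral-7B-DPO', trust_remote_code=True
)
model = MixtralForCausalLM.from_pretrained(
    "NousResearch/Nous-Hermes-2-Mistral-7B-DPO",
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=True,           # assumed: matches the "~5GB of VRAM in 4bit" note
    use_flash_attention_2=True,  # assumed: pairs with the flash_attn import above
)

# Assumed ChatML prompt, per the "ChatML Prefix" context line in the first hunk.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Hello, who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
generated_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.8,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

One thing worth flagging: the card imports `MixtralForCausalLM` even though the checkpoint is a Mistral-7B model; loading via `AutoModelForCausalLM` would dispatch to the correct architecture class from the checkpoint's config.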