dfurman/HermesBagel-34B-v0.1

Text Generation

NousResearch/Nous-Hermes-2-Yi-34B

jondurbin/bagel-dpo-34b-v0.2


dfurman committed on Jan 13

Commit d3d56a0 • 1 Parent(s): c8c30fe

Update README.md

Files changed (1)

README.md +13 -4


@@ -42,20 +42,29 @@ dtype: bfloat16
 <summary>Setup</summary>
 ```python
-!pip install -qU transformers accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 model = "dfurman/HermesBagel-34B-v0.1"
 tokenizer = AutoTokenizer.from_pretrained(model)
 model = AutoModelForCausalLM.from_pretrained(
     model,
     torch_dtype=torch.bfloat16,
     device_map="auto",
-    trust_remote_code=True,
 )
 ```

 <summary>Setup</summary>
 ```python
+!pip install -qU transformers accelerate bitsandbytes
+from transformers import (
+    AutoTokenizer,
+    AutoModelForCausalLM,
+    BitsAndBytesConfig
+)
 import torch
 model = "dfurman/HermesBagel-34B-v0.1"
+nf4_config = BitsAndBytesConfig(
+   load_in_4bit=True,
+   bnb_4bit_quant_type="nf4",
+   bnb_4bit_use_double_quant=True,
+   bnb_4bit_compute_dtype=torch.bfloat16
+)
 tokenizer = AutoTokenizer.from_pretrained(model)
 model = AutoModelForCausalLM.from_pretrained(
     model,
     torch_dtype=torch.bfloat16,
     device_map="auto",
+    quantization_config=nf4_config,
 )
 ```
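
For context on why the setup switches to 4-bit NF4 loading, here is a rough back-of-envelope estimate of weight memory in bfloat16 versus 4-bit. This is only a sketch: the parameter count is an assumption taken from the "34B" in the model name, the helper `weight_gb` is hypothetical, and the figures ignore quantization constants, any layers kept in higher precision, activations, and the KV cache.

```python
# Back-of-envelope weight-memory estimate; all figures are approximations.
NOMINAL_PARAMS = 34e9  # assumed from the "34B" in the model name, not measured


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(NOMINAL_PARAMS, 16)  # 16 bits per weight in bfloat16
nf4_gb = weight_gb(NOMINAL_PARAMS, 4)    # 4 bits per weight in NF4

print(f"bf16 weights: ~{bf16_gb:.0f} GB")
print(f"nf4 weights:  ~{nf4_gb:.0f} GB (excluding quantization overhead)")
```

Under these assumptions the weights alone shrink by about 4x, which is likely the motivation for adding `quantization_config=nf4_config`. Actual usage will be somewhat higher than the 4-bit figure.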