notbdq committed
Commit 509e804
1 Parent(s): c84d2a7

Add 4-bit quantization and automatic device mapping for improved performance.


Hello, and first of all congratulations on an excellent piece of work. In this pull request I added to the README inference code for 4-bit quantization and for automatically loading the model across all of the system's GPUs and RAM, so users can run it faster and more efficiently without a loss in performance.
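
An illustrative aside, not part of this commit: `device_map="auto"` asks Accelerate to shard the weights across every visible GPU and spill the remainder to CPU RAM. If you need to cap how much each device may hold, `from_pretrained` also accepts a `max_memory` mapping; the limits in this minimal sketch are hypothetical placeholders, not recommendations.

```python
# Minimal sketch, not part of the commit: the per-device limits are made-up examples.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "TURKCELL/Turkcell-LLM-7b-v1",
    device_map="auto",                        # shard across available GPUs, overflow to CPU
    max_memory={0: "10GiB", "cpu": "30GiB"},  # hypothetical caps for GPU 0 and CPU RAM
    quantization_config=quantization_config,
)
```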

Files changed (1)
  1. README.md +41 -0
README.md CHANGED
@@ -63,3 +63,44 @@ generated_ids = model.generate(model_inputs,
 decoded = tokenizer.batch_decode(generated_ids)
 print(decoded[0])
 
+```
+
+# 4-bit Quantized Inference
+
+```python
+
+# pip install bitsandbytes accelerate
+
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_compute_dtype=torch.float16  # or torch.bfloat16
+)
+
+model = AutoModelForCausalLM.from_pretrained("TURKCELL/Turkcell-LLM-7b-v1", device_map="auto", quantization_config=quantization_config)
+tokenizer = AutoTokenizer.from_pretrained("TURKCELL/Turkcell-LLM-7b-v1")
+
+messages = [
+    {"role": "user", "content": "Türkiye'nin başkenti neresidir?"},
+]
+
+encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
+
+eos_token = tokenizer("<|im_end|>", add_special_tokens=False)["input_ids"][0]
+
+device = "cuda"
+model_inputs = encodeds.to(device)
+
+generated_ids = model.generate(model_inputs,
+                               max_new_tokens=1024,
+                               do_sample=True,
+                               eos_token_id=eos_token)
+
+decoded = tokenizer.batch_decode(generated_ids)
+print(decoded[0])
+
+```
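
Also illustrative, continuing from the snippet in the diff above rather than part of it: once the model is loaded, `transformers` exposes `get_memory_footprint()` on the model, and an `hf_device_map` attribute when a device map was used, so the 4-bit savings and the layer placement can be verified directly.

```python
# Illustrative check, continuing from the snippet above (not part of the commit).
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
print(model.hf_device_map)  # shows which device each module was placed on
```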