ArvindSharma18 committed
Commit ed966df
Parent: aa8ab9f

Update README.md

Files changed (1):
  1. README.md +6 -7
README.md CHANGED
@@ -21,18 +21,17 @@ tags:
 ```python
 from unsloth import FastLanguageModel
 import torch
-max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
-dtype = torch.float16 # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
-load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
+max_seq_length = 4096
+dtype = torch.float16
+load_in_4bit = True
 
 model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name = "ArvindSharma18/Phi-3-mini-4k-instruct-bnb-4bit-Clinical-Trail-Merged", # "unsloth/mistral-7b" for 16bit loading
+    model_name = "ArvindSharma18/Phi-3-mini-4k-instruct-bnb-4bit-Clinical-Trail-Merged",
     max_seq_length = max_seq_length,
     dtype = dtype,
-    load_in_4bit = load_in_4bit,
-    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
+    load_in_4bit = load_in_4bit
 )
-FastLanguageModel.for_inference(model) # Enable native 2x faster inference
+FastLanguageModel.for_inference(model)
 inputs = tokenizer(
 [
 "Official Title: Randomized Trial of Usual Care vs. Specialized, Phase-specific Care for Youth at Risk for Psychosis"