---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
language:
- en
- fr
- de
- hi
- it
- pt
- es
- th
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- lavita/AlpaCare-MedInstruct-52k
---
# Llama-3.1-8B AlpaCare MedInstruct

- **Developed by:** Svngoku
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- **Max context window:** 4096
## Inference with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 4096  # matches the model's max context window
dtype = None           # None = auto-detect; set torch.float16 / torch.bfloat16 to force
load_in_4bit = True    # load the 4-bit quantized weights

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Svngoku/Llama-3.1-8B-AlpaCare-MedInstruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference

# Standard Alpaca prompt template (assumed to match the format used during fine-tuning)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Write an argument emphasizing the importance of ethical considerations in medical research.",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 800)
```
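Because the Alpaca template echoes the instruction and input back, the decoded generation contains the full prompt, not just the answer. A minimal sketch of how to slice out only the model's reply (the helper `extract_response` is illustrative, not part of the Unsloth or Transformers API):

```python
def extract_response(generated_text: str) -> str:
    """Return only the answer portion of an Alpaca-formatted generation.

    Assumes the standard Alpaca template, where the answer follows the
    '### Response:' marker. Returns an empty string if the marker is absent.
    """
    marker = "### Response:"
    _, _, answer = generated_text.partition(marker)
    return answer.strip()

# Example with a mock generation; real output comes from model.generate above.
mock = "### Instruction:\nSay hi.\n\n### Input:\n\n\n### Response:\nHello!"
print(extract_response(mock))  # → Hello!
```

In practice you would pass `tokenizer.decode(output_ids[0], skip_special_tokens=True)` into this helper instead of using the streamer.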
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.