---
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
language:
- en
- fr
- de
- hi
- it
- pt
- es
- th
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- lavita/AlpaCare-MedInstruct-52k
---
# Llama-3.1-8B AlpaCare MedInstruct

- **Developed by:** Svngoku
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
- **Max context window:** 4096
## Inference with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 4096  # matches the model's max context window
dtype = None           # None = auto-detect; set torch.float16 / torch.bfloat16 to force
load_in_4bit = True    # load the 4-bit quantized weights

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Svngoku/Llama-3.1-8B-AlpaCare-MedInstruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable native 2x faster inference

# Standard Alpaca prompt template (assumed to match the format used during fine-tuning)
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Write an argument emphasizing the importance of ethical considerations in medical research.",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 800)
```
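Because the Alpaca template echoes the instruction and input back, the decoded generation contains the full prompt, not just the answer. A minimal sketch of how to slice out only the model's reply (the helper `extract_response` is illustrative, not part of the Unsloth or Transformers API):

```python
def extract_response(generated_text: str) -> str:
    """Return only the answer portion of an Alpaca-formatted generation.

    Assumes the standard Alpaca template, where the answer follows the
    '### Response:' marker. Returns an empty string if the marker is absent.
    """
    marker = "### Response:"
    _, _, answer = generated_text.partition(marker)
    return answer.strip()

# Example with a mock generation; real output comes from model.generate above.
mock = "### Instruction:\nSay hi.\n\n### Input:\n\n\n### Response:\nHello!"
print(extract_response(mock))  # → Hello!
```

In practice you would pass `tokenizer.decode(output_ids[0], skip_special_tokens=True)` into this helper instead of using the streamer.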
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.