# Model Card for Ayansk11/Mental_health_Llama3.2-1B_conversationalBot

## Model Details

### Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** Ayan Javeed Shaikh and Srushti Sonavane
- **Finetuned from model:** unsloth/Llama-3.2-1B-bnb-4bit
### Model Sources

- Mental Health Llama 3.2 - 1B ConversationalBot
## Inference
```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and its tokenizer in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Ayansk11/Mental_health_Llama3.2-1B_conversationalBot",
    max_seq_length = 5020,
    dtype = None,          # auto-detect the compute dtype
    load_in_4bit = True,   # load quantized weights to reduce VRAM usage
)
```
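Note: `unsloth` must be installed in the environment (it is typically available via `pip install unsloth`), and 4-bit loading relies on `bitsandbytes`, so a CUDA-capable GPU is assumed in the snippets below.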
Use the following text as the model input to generate a response:

```python
text = "I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here. I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it. How can I change my feeling of being worthless to everyone?"
```
### Key Points to Note

- The `model = FastLanguageModel.for_inference(model)` call prepares the model specifically for inference, ensuring it is optimized for generating responses efficiently.
- The input text is processed by the `tokenizer`, which converts it into a format suitable for the model. The `data_prompt` template is used to structure the input text, leaving a placeholder for the model's response (a sketch of a possible `data_prompt` follows this list). The `return_tensors = "pt"` argument returns the encoding as PyTorch tensors, which are then moved to the GPU with `.to("cuda")` for faster processing.
- The `model.generate` function produces a response from the tokenized input. Parameters such as `max_new_tokens = 5020` and `use_cache = True` let the model generate long, coherent outputs efficiently by reusing cached key/value states from previous decoding steps.
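The snippets on this card reference `data_prompt` without defining it. The block below is a minimal sketch of what such a template might look like, assuming an Alpaca-style instruction/response layout whose `### Response:` marker matches the split used when decoding; the exact template used for fine-tuning is not shown on this card.

```python
# Hypothetical prompt template -- the actual data_prompt used for fine-tuning is
# not included in this card. It needs one slot for the instruction, one for the
# (empty) answer, and a "### Response:" marker matching the decoding step below.
data_prompt = """Below is an instruction describing a mental-health question. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""
```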
```python
# Switch the model into unsloth's optimized inference mode
model = FastLanguageModel.for_inference(model)

# Fill the prompt template with the instruction and an empty answer slot,
# tokenize it, and move the tensors to the GPU
inputs = tokenizer(
    [
        data_prompt.format(
            text,  # instruction
            "",    # answer left blank for the model to fill in
        )
    ],
    return_tensors = "pt",
).to("cuda")

# Generate the response, reusing cached key/value states for efficiency
outputs = model.generate(**inputs, max_new_tokens = 5020, use_cache = True)

# Decode the generated tokens and keep only the text after the response marker
answer = tokenizer.batch_decode(outputs)
answer = answer[0].split("### Response:")[-1]
print("Answer of the question is:", answer)
```
## Model tree for Ayansk11/Mental_health_Llama3.2-1B_conversationalBot

- Base model: meta-llama/Llama-3.2-1B