---
library_name: transformers
datasets:
- nooynoos/M.O.M_Dataset_GemmaSprint
language:
- ko
base_model:
- unsloth/gemma-2-2b
---

# **Model Card for gemma2-2b-M.O.M-gemma-sprint**

This model is fine-tuned from the `unsloth/gemma-2-2b` base model using the M.O.M dataset.

# What is the M.O.M. Project?
The Motivational Organizer & Mentor (M.O.M.) project is designed to replicate the familiar, persistent encouragement that a caring parent might provide.
By leveraging large language models (LLMs), M.O.M. delivers timely reminders, motivational "nags," and personalized feedback to keep users focused and productive.
The service helps users manage their tasks through gentle yet persistent nudges, task prioritization, and empathetic guidance, with the goal of reducing procrastination and boosting accountability.

## Model Details

### **Model Description**
The M.O.M model uses the "nagging mom" concept to motivate users with warm but persistent reminders about the tasks they need to accomplish.

The model receives keywords representing the user's daily tasks and turns them into motivational messages delivered in the tone of a loving yet slightly exasperated mother.
It weaves the four provided keywords into one cohesive story, keeping the tone warm while urging the user to take action.

Key Features:

* Input: Keywords representing the tasks the user needs to do.
* Output: Motivational "mom nagging" messages.
* Tone: Warm but persistently urging action.
* Purpose: To motivate users to manage their time effectively and take responsibility for their tasks.

This model helps users stop procrastinating by giving them structured yet loving reminders, encouraging them to be more productive in their daily lives.

### **Training Procedure**

**Make Q/A Pairs**

Fine-tuning the Gemma 2 2B model as M.O.M first requires a dataset of Q/A pairs.
Such a dataset can be written by hand or generated by prompting a capable model with clear instructions.
In my case, I used prompt engineering to create 600 Q/A pairs based on examples I crafted myself.
The resulting dataset is available at `nooynoos/M.O.M_Dataset_GemmaSprint`.

Below is the code for generating the Q/A pairs.

````python
import json

from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

# Set OpenAI API key
openai_api_key = ""  # Enter your API key here

# Define the prompt template. It is written in Korean because the target output is Korean:
# it asks for a loving but naggy mom's reply that weaves the task keywords into one story,
# and for 5 new QA pairs per call in the JSON format shown at the end.
prompt = PromptTemplate.from_template(
    """너는 지구에서 자녀를 가장 사랑하지만, 잔소리가 정말 많은 엄마야.
키워드는 20대 청년이 일상생활에서 해야하는 일을 적어주면 돼.
그 키워드에 맞춰 엄마가 사랑스럽지만 약간 짜증난 듯한 잔소리로 동기부여해주는 답변을 작성해줘.
엄마의 잔소리는 4개의 키워드를 연결된 스토리로 자연스럽게 포함해야 해.
잔소리는 따뜻하지만 꾸준히 행동을 촉구하는 톤으로 작성되어야 하고. 다음 형식에 따라 답변을 생성해줘!:

Format에 맞춰서, 새로운 키워드와 함께 새로운 QA PAIR 5개를 생성해주면 돼.

#Format:
```json
{{
    "QUESTION": "미소사 과제, 코딩 공부",
    "ANSWER": "미소사 과제랑 코딩 공부 둘 다 언제 할 거야? 과제는 끝낼 기미도 안 보이고, 코딩은 시작도 안 했잖아! 하루 종일 핸드폰만 만지작거릴 게 아니라, 그 시간에 차라리 코딩이라도 조금씩 해둬. 그리고 과제도 미리미리 해놔야 나중에 안 힘들지! 너 과제 몰아서 하다가 밤샘할까 봐 걱정돼 죽겠네."
}},
{{
    "QUESTION": "방 정리, 자기소개서 작성",
    "ANSWER": "방이 이렇게 어질러져 있으면 네 생각도 정리가 안 될 거야! 빨리 방부터 치우고, 자기소개서나 좀 써! 마감은 얼마 안 남았는데, 네 방 상태랑 자소서 상태가 똑같아 보인다, 진짜. 방금 치우고 자기소개서 조금씩 쓰면 마음도 더 가벼워질 거야."
}},
{{
    "QUESTION": "Cousera 강의, LLM Fine Tuning",
    "ANSWER": "Cousera 강의 얼른 들어야지. 이거 마감 얼마 남지 않았잖아! Cousera 강의 빠르게 마무리 해야, LLM Fine Tuning까지 마무리 할 수 있지 않겠어? 조금 더 집중해서 빨리 해!"
}}
```
"""
)


# Custom JSON parser function: strips the ```json fence from the reply,
# wraps the comma-separated objects in a list, and parses them.
def custom_json_parser(response):
    json_string = response.content.strip().removeprefix("```json\n").removesuffix("\n```").strip()
    json_string = f'[{json_string}]'
    return json.loads(json_string)


# Configure the chain
chain = (
    prompt
    | ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],
        openai_api_key=openai_api_key,  # Use the API key set directly
    )
    | custom_json_parser
)

# List to store QA pairs
qa_pairs = []

# Each call asks for 5 new QA pairs; increase the range to collect more
# (for example, 60 calls for about 300 pairs).
for i in range(1):
    # The prompt has no template variables, so an empty input is enough.
    response = chain.invoke({})
    # Add the results to qa_pairs
    qa_pairs.extend(response)

# Report how many QA pairs were collected in the qa_pairs list.
print(f"A total of {len(qa_pairs)} QA pairs have been generated.")
````
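
The parsing step above can be tried without calling the API. The following is a minimal sketch, not the author's code: `parse_qa_pairs` is a hypothetical variant of the parser that also drops pairs with an empty question or answer, and `fake_response` is a stand-in object that only mimics the `.content` attribute of a chat model reply. It assumes Python 3.9 or later for `str.removeprefix`.

```python
import json
from types import SimpleNamespace

FENCE = "`" * 3  # three backticks, built this way so this example's own fence stays intact


def parse_qa_pairs(response):
    """Parse comma-separated JSON objects, optionally wrapped in a json code fence."""
    text = response.content.strip()
    text = text.removeprefix(FENCE + "json").removesuffix(FENCE).strip()
    pairs = json.loads(f"[{text}]")
    # Keep only dicts that carry a non-empty QUESTION and ANSWER.
    return [p for p in pairs if isinstance(p, dict) and p.get("QUESTION") and p.get("ANSWER")]


# Stand-in for a model reply (hypothetical content, not real model output).
fake_response = SimpleNamespace(
    content=FENCE + 'json\n{"QUESTION": "운동, 독서", "ANSWER": "운동부터 하고 책 읽어!"},\n'
    '{"QUESTION": "빨래", "ANSWER": ""}\n' + FENCE
)

pairs = parse_qa_pairs(fake_response)
print(pairs)
```

Filtering at parse time is one possible design choice; it keeps a single malformed item from ending up in the training data, at the cost of silently producing fewer pairs than requested.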
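
Then save the generated pairs as a JSONL file and load it back as a dataset.

```python
import json

from datasets import load_dataset

# Path to the JSONL file
jsonl_file = "qa_pair.jsonl"

# Save the QA pairs to a JSONL file, one JSON object per line.
# Assumption: QUESTION/ANSWER are renamed to "instruction"/"output",
# the column names the formatting function in the next step reads.
with open(jsonl_file, "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        record = {"instruction": pair["QUESTION"], "output": pair["ANSWER"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Load the JSONL file as a Dataset
dataset = load_dataset("json", data_files=jsonl_file)
```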

**Loading/Preparing Training Data**

The dataset uploaded to the Hugging Face Hub is loaded, and a formatting function merges each instruction and response into a single Alpaca-style training text.

```python
from datasets import load_dataset

# EOS_TOKEN marks the end of a sequence and must be appended to every example.
# `tokenizer` is the one returned by FastLanguageModel.from_pretrained in the Unsloth step below.
EOS_TOKEN = tokenizer.eos_token

# Alpaca-style prompt template with slots for the instruction and the response.
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

# Function to format the given examples.
def formatting_prompts_func(examples):
    instructions = examples["instruction"]  # Get the instructions.
    outputs = examples["output"]  # Get the outputs.
    texts = []  # List to store the formatted texts.
    for instruction, output in zip(instructions, outputs):
        # The EOS_TOKEN must be added; otherwise, generation may continue indefinitely.
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,  # Return the formatted texts.
    }

# Load the dataset from the specified source.
dataset = load_dataset("nooynoos/M.O.M_Dataset_GemmaSprint", split="train")

# Apply the formatting_prompts_func to the dataset with batch processing enabled.
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
```
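
To see what one formatted training example looks like, the formatting step can be run on a tiny in-memory batch. This is a sketch under stated assumptions: the `"<eos>"` string is a placeholder for `tokenizer.eos_token`, and the single record is made up for illustration rather than taken from the dataset.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = "<eos>"  # placeholder; in the real pipeline this is tokenizer.eos_token


def formatting_prompts_func(examples):
    """Turn a batch of instruction/output columns into one "text" column."""
    texts = []
    for instruction, output in zip(examples["instruction"], examples["output"]):
        texts.append(ALPACA_PROMPT.format(instruction, output) + EOS_TOKEN)
    return {"text": texts}


# A batch in the column-oriented shape that datasets.map(batched=True) passes in.
batch = {
    "instruction": ["방 정리, 운동"],
    "output": ["방부터 치우고 운동하러 가!"],
}

formatted = formatting_prompts_func(batch)
print(formatted["text"][0])
```

The batch is a dict of lists rather than a list of dicts because that is how a batched `map` call hands rows to the function.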

**Training the Model**

**Unsloth**

Fine-tuning is done with Unsloth.
Unsloth supports both 16-bit LoRA and 4-bit QLoRA, which speeds up fine-tuning and reduces memory usage.

First, load the pre-trained Gemma 2 2B model with `FastLanguageModel.from_pretrained`.

```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 1024  # Set the maximum sequence length
dtype = None
# Use 4-bit quantization to reduce memory usage
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # Use if working with gated models like meta-llama/Llama-2-7b-hf
)
```

Next, attach a LoRA adapter so that only a small fraction of the model's parameters is updated during training.

```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
```
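
How many parameters such an adapter trains can be estimated by hand: for a weight of shape `d_out x d_in`, LoRA with rank `r` adds two small matrices with `r * (d_in + d_out)` entries in total. The sketch below applies that formula to the seven target modules. The projection shapes and layer count are assumptions about the Gemma 2 2B architecture and are not read from the loaded model; check them against `model.config` before relying on the numbers.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed (d_in, d_out) shapes per target module; verify against model.config.
LAYER_SHAPES = {
    "q_proj": (2304, 2048),
    "k_proj": (2304, 1024),
    "v_proj": (2304, 1024),
    "o_proj": (2048, 2304),
    "gate_proj": (2304, 9216),
    "up_proj": (2304, 9216),
    "down_proj": (9216, 2304),
}
NUM_LAYERS = 26  # assumed number of transformer layers
RANK = 16        # r from the get_peft_model call

per_layer = sum(
    lora_param_count(d_in, d_out, RANK) for d_in, d_out in LAYER_SHAPES.values()
)
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Doubling `r` doubles the adapter size, which is the main lever if the adapter needs more or less capacity.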

**Training the Model**

Train the model. To reduce VRAM usage, lower `per_device_train_batch_size`.

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
```
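
The batch size, gradient accumulation, and step count together determine how much data the run sees. The arithmetic can be sketched as follows, assuming a single GPU and taking the dataset size from the 600 Q/A pairs mentioned earlier in this card; both values are assumptions to adjust for your own setup.

```python
import math

# Values copied from the TrainingArguments above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 100

# Assumptions, not taken from the training code.
num_devices = 1
dataset_size = 600

# Examples contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Examples processed over the whole run.
examples_seen = effective_batch * max_steps
# Optimizer steps needed for one full pass over the data.
steps_per_epoch = math.ceil(dataset_size / effective_batch)
# Approximate number of passes over the data.
epochs = examples_seen / dataset_size

print(f"effective batch size: {effective_batch}")
print(f"examples seen:        {examples_seen}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"approximate epochs:   {epochs:.2f}")
```

When lowering `per_device_train_batch_size` to save VRAM, raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.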

**Testing the Model**

Let's check if it has become the 'nagging LLM' we wanted.

```python
from transformers import StoppingCriteria, StoppingCriteriaList, TextStreamer


class StopOnToken(StoppingCriteria):
    def __init__(self, stop_token_id):
        self.stop_token_id = stop_token_id  # Initialize the stop token ID.

    def __call__(self, input_ids, scores, **kwargs):
        return (
            self.stop_token_id in input_ids[0]
        )  # Stop if the stop token ID is present in the input IDs.


# Stop on the tokenizer's EOS token (assumption: the original stop token is not shown).
stopping_criteria = StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)])

# Switch the model to Unsloth's faster inference mode.
FastLanguageModel.for_inference(model)
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "운동, 코딩, 과제",  # Instruction
            "",  # Output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=4096,  # Set the maximum number of tokens to generate.
    stopping_criteria=stopping_criteria,  # Set the criteria to stop generation.
)
```
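
The streamed text contains the full prompt followed by the generated reply. If only the reply is needed, for example to show it in an app, it can be cut out after decoding. The helper below is a hypothetical sketch: `extract_response` is not part of any library, the `"<eos>"` default is an assumed EOS string (pass `tokenizer.eos_token` in practice), and the decoded text is a made-up example.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded, eos_token="<eos>"):
    """Return the text after the last response marker, without the EOS token."""
    # rpartition returns the whole string as the last element if the marker is missing.
    _, _, answer = decoded.rpartition(RESPONSE_MARKER)
    return answer.replace(eos_token, "").strip()


# Made-up decoded output in the Alpaca layout used for training.
decoded = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n운동, 코딩, 과제\n\n"
    "### Response:\n운동도 하고 코딩도 해야지!<eos>"
)

print(extract_response(decoded))
```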

Sample outputs are shown in the Performance section below.

**Saving the Merged Model**

```python
huggingface_token = ""  # HuggingFace token.
huggingface_repo = "gemma2-2b-M.O.M-gemma-sprint"  # Repository to upload the model.
save_method = (
    "merged_16bit"  # Options: "merged_4bit", "merged_4bit_forced", "merged_16bit", "lora".
)
# The first argument is the local output directory. The repository name is reused here
# so the merged weights are not written into a folder named after the base model
# ("unsloth/gemma-2-2b").
model.save_pretrained_merged(
    huggingface_repo,
    tokenizer,
    save_method=save_method,  # Set the save method to 16-bit merged.
)
```

**Pushing the Merged Model to the Hugging Face Hub**

```python
# Upload to the Hub
model.push_to_hub_merged(
    huggingface_repo,
    tokenizer,
    save_method=save_method,
    token=huggingface_token,
)
```

### **Performance**
**Fine-tuned Model (gemma2-2b-M.O.M)**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66303ce3e1c93377db71efd5/azcPg11SSklmRRanOw83O.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66303ce3e1c93377db71efd5/N7CEm0DI5_Aq2dJNTUz1-.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66303ce3e1c93377db71efd5/PpaOZTPJJbDovgfbor_aG.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66303ce3e1c93377db71efd5/1a80wjSlcd0R39zk_5dsC.png)

### **More detailed results and the code can be found at the following GitHub link.**
https://github.com/nooynoos/M.O.M-Gemma-Sprint