---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
- zh
- ko
- ja
license: llama3.2
pipeline_tag: text-generation
tags:
- roleplay
- llama3
- sillytavern
- idol
- facebook
- meta
- pytorch
- llama
- llama-3
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
    - Student
    - Research Graduate
    - AI researcher
    - AI developer/engineer
    - Reporter
    - Other
---

# Llama-3.2-3B-Instruct-lyrics

## Model Information
- Write songs on your phone.

## Model Song (SUNO)
- https://suno.com/@aifeifei

## Virtual Idol Twitter
- https://x.com/aifeifei799

## Dataset Credits
- Koyd111/alpaca-hiphop-lyrics

## Prompt
```
write a song.
song name:
Style of Music (five components: genre, instrument, mood, gender, and timbre):
[Intro]
[Verse 1]
[Chorus]
[Verse 2]
[Chorus]
[Bridge]
[Chorus]
[Outro]
```

## Program
- unsloth

### Test Program
```python
from unsloth import FastLanguageModel
from unsloth import PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./Llama-3.2-3B-Instruct-lyrics", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    # load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

# alpaca_prompt = You MUST copy from above!
inputs = tokenizer(
[
    alpaca_prompt.format(
        "", # instruction
        """
new write a pop love song.
song name:
Style of Music(five components: genre, instrument, mood, gender, and timbre. ):
[Intro]
[Verse 1]
[Chorus]
[Verse 2]
[Chorus]
[Bridge]
[Chorus]
[Outro]
""", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)
```

### Trainer Program
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit", # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit", # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409", # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct", # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit", # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit", # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./Llama-3.2-3B-Instruct-unsloth-bnb-4bit", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.2",
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("Koyd111/alpaca-hiphop-lyrics", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

print(dataset[5])

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 16,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

trainer_stats = trainer.train()

# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")
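
# Sanity-check generation on the Fibonacci demo prompt built above: the first
# generate() call streams tokens to the console via TextStreamer; the second
# decodes the finished output with tokenizer.batch_decode().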
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

model.save_pretrained("Llama-3.2-3B-Instruct-lyrics-lora") # Local saving
tokenizer.save_pretrained("Llama-3.2-3B-Instruct-lyrics-lora")
model.save_pretrained_merged("Llama-3.2-3B-Instruct-lyrics", tokenizer)
```
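After training, the script saves both a LoRA adapter (`Llama-3.2-3B-Instruct-lyrics-lora`) and a merged full-weight checkpoint (`Llama-3.2-3B-Instruct-lyrics`). As a minimal reload sketch (the local paths come from the script above; the dtype, device placement, and generation settings are illustrative assumptions), the merged checkpoint can be reused for song generation with plain Transformers, without Unsloth:

```python
# Minimal sketch: reload the merged checkpoint saved above and generate a song.
# Assumes the local path "Llama-3.2-3B-Instruct-lyrics" and a CUDA-capable GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_path = "Llama-3.2-3B-Instruct-lyrics"  # from model.save_pretrained_merged(...)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype = torch.bfloat16, device_map = "auto"
)

# Same Alpaca-style prompt layout used during training.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

song_request = (
    "write a song.\n"
    "song name:\n"
    "Style of Music(five components: genre, instrument, mood, gender, and timbre. ):\n"
    "[Intro]\n[Verse 1]\n[Chorus]\n[Verse 2]\n[Chorus]\n[Bridge]\n[Chorus]\n[Outro]\n"
)

inputs = tokenizer(alpaca_prompt.format("", song_request, ""), return_tensors = "pt").to(model.device)
_ = model.generate(**inputs, streamer = TextStreamer(tokenizer), max_new_tokens = 1024)
```

The LoRA-only directory should also load directly with `FastLanguageModel.from_pretrained`, as in the test program above.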
## Questions
- This model is intended solely for learning and testing, so errors in its output are unavoidable and we take no responsibility for the results. Any output you intend to use must be modified first; even if it is not, we will treat it as having been modified.
- For commercial licensing, please refer to the Llama 3.2 license agreement.

# Llama-3.2-3B-Instruct-unsloth-bnb-4bit Information
See our collection for versions of Llama 3.2, including GGUF and 4-bit formats.
Unsloth's Dynamic 4-bit Quants selectively quantize layers, greatly improving accuracy over standard 4-bit quantization.
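As a minimal loading sketch (the repository id below is an assumption; check the Unsloth collection for the exact name), the dynamic 4-bit checkpoint loads with the same `FastLanguageModel.from_pretrained` call used in the programs above:

```python
# Minimal sketch: load Unsloth's dynamic 4-bit quant of Llama 3.2 3B Instruct for inference.
# The repository id is an assumption; see the Unsloth collection for the exact name.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,  # keep the pre-quantized dynamic 4-bit weights
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```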