# Llama-3.2-1B-ultrachat200k

## Model Details
- Model type: SFT (supervised fine-tuned) model
- License: llama3.2
- Finetuned from model: meta-llama/Llama-3.2-1B
- Training data: HuggingFaceH4/ultrachat_200k
- Training framework: trl
## Training Details

### Training Hyperparameters
- `attn_implementation`: flash_attention_2
- `bf16`: True
- `learning_rate`: 2e-5
- `lr_scheduler_type`: cosine
- `per_device_train_batch_size`: 2
- `gradient_accumulation_steps`: 16
- `torch_dtype`: bfloat16
- `num_train_epochs`: 1
- `max_seq_length`: 2048
- `warmup_ratio`: 0.1
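For reference, the hyperparameters above map onto TRL's `SFTConfig` and `ModelConfig` roughly as in the sketch below; the values are taken from the list, `output_dir` is a placeholder, and the actual run feeds them in through `TrlParser` (CLI flags or a YAML config) rather than constructing the objects by hand. With `per_device_train_batch_size=2` and `gradient_accumulation_steps=16`, the effective batch size per optimizer step is 32 per device.

```python
from trl import ModelConfig, SFTConfig

# Minimal sketch mirroring the listed hyperparameters (assumes a TRL version
# where SFTConfig still exposes max_seq_length, as used in the script below).
model_config = ModelConfig(
    model_name_or_path="meta-llama/Llama-3.2-1B",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)

training_args = SFTConfig(
    output_dir="Llama-3.2-1B-ultrachat200k",  # placeholder output path
    bf16=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    max_seq_length=2048,
    warmup_ratio=0.1,
)
```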
### Results

- `init_train_loss`: 1.726
- `final_train_loss`: 1.22
### Training script
```python
import multiprocessing

from datasets import load_dataset
from tqdm.rich import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import (
    ModelConfig,
    SFTTrainer,
    get_peft_config,
    get_quantization_config,
    get_kbit_device_map,
    SFTConfig,
    ScriptArguments,
    TrlParser,
)

tqdm.pandas()

if __name__ == "__main__":
    # Parse script, training, and model arguments from the CLI and/or a config file.
    parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
    args, training_args, model_config = parser.parse_args_and_config()

    # Optional k-bit quantization; stays None unless requested via ModelConfig.
    quantization_config = get_quantization_config(model_config)
    model_kwargs = dict(
        revision=model_config.model_revision,
        trust_remote_code=model_config.trust_remote_code,
        attn_implementation=model_config.attn_implementation,
        torch_dtype=model_config.torch_dtype,
        # The KV cache is not needed during training and conflicts with gradient checkpointing.
        use_cache=False if training_args.gradient_checkpointing else True,
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_config.model_name_or_path, **model_kwargs
    )

    tokenizer = AutoTokenizer.from_pretrained(
        model_config.model_name_or_path,
        trust_remote_code=model_config.trust_remote_code,
        use_fast=True,
    )
    # Llama-3.2-1B defines no pad token by default; reuse the end-of-text token for padding.
    tokenizer.pad_token = '<|end_of_text|>'

    train_dataset = load_dataset(
        args.dataset_name,
        split=args.dataset_train_split,
        num_proc=multiprocessing.cpu_count(),
    )

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        processing_class=tokenizer,
        peft_config=get_peft_config(model_config),
    )
    trainer.train()
    trainer.save_model(training_args.output_dir)
```
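The training data is conversational: each ultrachat_200k row carries a `messages` list that SFTTrainer formats with the tokenizer's chat template before tokenization. Below is a minimal sketch for peeking at one formatted sample; it assumes the saved tokenizer defines a chat template (the test script below relies on the same assumption) and that the SFT split on the Hub is named `train_sft`.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative only: inspect one sample as SFTTrainer would see it.
tokenizer = AutoTokenizer.from_pretrained("AIR-hl/Llama-3.2-1B-ultrachat200k")
sample = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:1]")[0]

# Each example carries a "messages" list of {"role": ..., "content": ...} dicts;
# the chat template renders them into a single training string.
print(tokenizer.apply_chat_template(sample["messages"], tokenize=False))
```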
### Test Script
```python
from vllm import LLM
from vllm.sampling_params import SamplingParams
from transformers import AutoTokenizer

MODEL_PATH = "autodl-tmp/saves/Llama-3.2-1B-ultrachat200k"

# Load the fine-tuned checkpoint with vLLM on a single GPU.
model = LLM(MODEL_PATH,
            tensor_parallel_size=1,
            dtype='bfloat16')
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# Build the prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Where is Harbin?"}],
    tokenize=False,
    add_generation_prompt=True,
)

sampling_params = SamplingParams(max_tokens=1024,
                                 temperature=0.7,
                                 logprobs=1,
                                 stop_token_ids=[tokenizer.eos_token_id])

vllm_generations = model.generate(prompt, sampling_params)
print(vllm_generations[0].outputs[0].text)
# print result: Harbin is located in northeastern China in the Heilongjiang province. It is the capital of Heilongjiang province in the Northeast Asia.
```