---
datasets:
  - danielpark/gorani-100k-llama2-13b-instruct
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

The project is still in progress. Do not use the weights or the dataset.

Status: the 19.7k checkpoint weights are open; waiting for results on the Open LLM Leaderboard.

GORANI 100k

Template

I use llama2-13b with LFM, but without a default system message. If a dataset specifies a system message, I use that content instead.

```
### System:
{System}

### User:
{New_User_Input}

### Input:
{New User Input}

### Response:
{New_Assistant_Answer}
```
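For reference, the sketch below shows one way to assemble this template into a single prompt string. The function name `build_prompt` and its arguments are illustrative only and are not part of the released code.

```python
# Minimal sketch (not from the released code): fill the template above with a
# system message, a user turn, and optional extra input, and leave the
# response section open for the model to complete.
def build_prompt(system: str, user: str, extra_input: str = "") -> str:
    prompt = ""
    if system:
        prompt += f"### System:\n{system}\n\n"
    prompt += f"### User:\n{user}\n\n"
    if extra_input:
        prompt += f"### Input:\n{extra_input}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt("You are a helpful assistant.", "Summarize the GORANI project in one sentence."))
```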

Caution

The model weights and dataset have not yet been properly curated, and their use is strictly prohibited under any license. The developers assume no responsibility for this, either implicitly or explicitly.

Updates

| Revision | Commit Hash | Updated | Train Process | Status |
|---|---|---|---|---|
| Revision 1 | 6d30494fa8da84128499d55075eef57094336d03 | 23.10.04 | 19740/100000 | On Training |

Training Plan

  • After checking the 19.7k model's performance on the Open LLM Leaderboard, proceed with the following steps.
  • Compare max sequence lengths of 512 and 1024 (experiment with a 10k model).
  • Implement content similar to the llama2 paper, which is more than 20 times slower than the initial stage.
  • Modify the code to use Flash Attention 2 (see the sketch after this list).
  • Refine the dataset and add a hash to freeze it.
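For the Flash Attention 2 item, the sketch below shows how recent transformers releases enable it at load time. The model id is the base Llama-2 checkpoint and the flag requires the flash-attn package plus a supported GPU, so treat this as an assumption rather than the project's actual training code.

```python
# Sketch only: enable Flash Attention 2 when loading the base model.
# Requires a recent transformers release and the flash-attn package installed;
# the model id is the base Llama-2 checkpoint, not the GORANI weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention 2 kernel
    device_map={"": 0},
)
```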

Revision Information

Revision 1: 6d30494fa8da84128499d55075eef57094336d03

  • max_seq_length = 2048; partially modified tokenizer (related to the pad token); default training parameters; the tokenizer still needs to be fixed (refer to the 10k tokenizer).
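The exact tokenizer change is not documented here; a common pattern for giving LlamaTokenizerFast a pad token is shown below, purely as an illustration of the kind of modification involved.

```python
# Illustrative only: Llama tokenizers ship without a pad token, so a frequent
# workaround is to reuse the EOS token as the pad token for training.
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("meta-llama/Llama-2-13b-hf")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
```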
See details
Training Process

| Item | Value |
|---|---|
| Tokenizer Used | LlamaTokenizerFast |
| Training Progress | Epoch 3.15/16 |
| Step | 19740/100000 |
| Google Colab Resource Usage | 150 tokens used |

System Information

| | Used | Total |
|---|---|---|
| System RAM | 5.8 GB | 83.5 GB |
| GPU RAM | 26.6 GB | 40.0 GB |
| Disk | 74.0 GB | 166.8 GB |
Basic Training Settings

| Parameter | Value |
|---|---|
| local_rank | -1 |
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 1 |
| gradient_accumulation_steps | 4 |
| learning_rate | 2e-4 |
| max_grad_norm | 0.3 |
| weight_decay | 0.001 |
| max_seq_length | 2048 |
| num_train_epochs | 1 |
| max_steps | 100000 |
| warmup_ratio | 0.03 |
| save_steps | 500000 |
| logging_steps | 10000 |
4-bit Precision Settings

| Parameter | Value |
|---|---|
| use_4bit | True |
| use_nested_quant | False |
| bnb_4bit_compute_dtype | "bfloat16" |
| bnb_4bit_quant_type | "nf4" |
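Expressed with the transformers/bitsandbytes API, these values map onto roughly the following configuration (a sketch, not the project's verbatim code):

```python
# Sketch mapping the 4-bit settings above onto a BitsAndBytesConfig.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit = True
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type = "nf4"
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype = "bfloat16"
    bnb_4bit_use_double_quant=False,        # use_nested_quant = False
)
```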
LoRA Settings

| Parameter | Value |
|---|---|
| lora_alpha | 16 |
| lora_dropout | 0.1 |
| lora_r | 64 |
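With peft, the LoRA values above correspond to roughly this configuration; `bias` and `task_type` are assumed defaults for causal-LM LoRA fine-tuning and are not listed on the card.

```python
# Sketch of the LoRA settings above using peft.
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",            # assumed
    task_type="CAUSAL_LM",  # assumed
)
```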
Advanced Training Flags

| Parameter | Value |
|---|---|
| fp16 | False |
| bf16 | False |
| packing | False |
| gradient_checkpointing | True |
| optim | "paged_adamw_32bit" |
| lr_scheduler_type | "constant" |
| group_by_length | True |
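Taken together, the basic settings and the flags above translate approximately into a transformers TrainingArguments object, sketched below; the output directory is made up, and `max_seq_length` and `packing` would be passed to the trl SFTTrainer rather than to TrainingArguments.

```python
# Sketch combining the basic training settings and advanced flags above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",        # assumed; not specified on the card
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.001,
    num_train_epochs=1,
    max_steps=100000,
    warmup_ratio=0.03,
    save_steps=500000,
    logging_steps=10000,
    fp16=False,
    bf16=False,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    group_by_length=True,
)
```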
GPU Configuration

| Parameter | Value |
|---|---|
| device_map | {"": 0} |
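The device_map value {"": 0} places every module of the model on GPU 0. Combined with the 4-bit configuration above, loading would look roughly like this; the base checkpoint id is an assumption, and this is a sketch rather than the project's actual code.

```python
# Sketch: load the 4-bit quantized base model entirely on GPU 0.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",     # assumed base checkpoint, not the GORANI weights
    quantization_config=bnb_config,
    device_map={"": 0},              # map every module to GPU 0
)
```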