---
datasets:
  - danielpark/gorani-100k-llama2-13b-instruct
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

The project is still in progress. Do not use the weights or the dataset.

Status: the 19.7k checkpoint weights are open; waiting for results on the Open LLM Leaderboard.

GORANI 100k

Template

I use llama2-13b with LFM, but without a default system message. If a dataset specifies a system message, I use that content instead.

```
### System:
{System}

### User:
{New_User_Input}

### Input:
{New User Input}

### Response:
{New_Assistant_Answer}
```
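For reference, the sketch below shows one way to assemble this template into a single prompt string. The function name `build_prompt` and its arguments are illustrative only and are not part of the released code.

```python
# Minimal sketch (not from the released code): fill the template above with a
# system message, a user turn, and optional extra input, and leave the
# response section open for the model to complete.
def build_prompt(system: str, user: str, extra_input: str = "") -> str:
    prompt = ""
    if system:
        prompt += f"### System:\n{system}\n\n"
    prompt += f"### User:\n{user}\n\n"
    if extra_input:
        prompt += f"### Input:\n{extra_input}\n\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt("You are a helpful assistant.", "Summarize the GORANI project in one sentence."))
```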

Caution

The model weights and dataset have not yet been properly curated, and their use is strictly prohibited under any license. The developers assume no responsibility for this, either implicitly or explicitly.

Updates

| Revision | Commit Hash | Updated | Train Process | Status |
|---|---|---|---|---|
| Revision 1 | 6d30494fa8da84128499d55075eef57094336d03 | 23.10.04 | 19740/100000 | On Training |

Training Plan

  • After checking the 19.7k model's performance on the Open LLM Leaderboard, proceed with the following steps.
  • Compare max sequence lengths of 512 and 1024 (experiment with a 10k model).
  • Implement content similar to the llama2 paper, which is more than 20 times slower than the initial stage.
  • Modify the code to use Flash Attention 2 (see the sketch after this list).
  • Refine the dataset and add a hash to freeze it.
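For the Flash Attention 2 item, the sketch below shows how recent transformers releases enable it at load time. The model id is the base Llama-2 checkpoint and the flag requires the flash-attn package plus a supported GPU, so treat this as an assumption rather than the project's actual training code.

```python
# Sketch only: enable Flash Attention 2 when loading the base model.
# Requires a recent transformers release and the flash-attn package installed;
# the model id is the base Llama-2 checkpoint, not the GORANI weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention 2 kernel
    device_map={"": 0},
)
```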

Revision Information

Revision 1: 6d30494fa8da84128499d55075eef57094336d03

  • max_seq_length = 2048; partially modified tokenizer (related to the pad token); default training parameters; the tokenizer still needs to be fixed (refer to the 10k tokenizer).
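The exact tokenizer change is not documented here; a common pattern for giving LlamaTokenizerFast a pad token is shown below, purely as an illustration of the kind of modification involved.

```python
# Illustrative only: Llama tokenizers ship without a pad token, so a frequent
# workaround is to reuse the EOS token as the pad token for training.
from transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("meta-llama/Llama-2-13b-hf")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"
```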
See details
Training Process

| Item | Value |
|---|---|
| Tokenizer Used | LlamaTokenizerFast |
| Training Progress | Epoch 3.15/16 |
| Step | 19740/100000 |
| Google Colab Resource Usage | 150 tokens used |

System Information

| | Used | Total |
|---|---|---|
| System RAM | 5.8 GB | 83.5 GB |
| GPU RAM | 26.6 GB | 40.0 GB |
| Disk | 74.0 GB | 166.8 GB |
Basic Training Settings

| Parameter | Value |
|---|---|
| local_rank | -1 |
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 1 |
| gradient_accumulation_steps | 4 |
| learning_rate | 2e-4 |
| max_grad_norm | 0.3 |
| weight_decay | 0.001 |
| max_seq_length | 2048 |
| num_train_epochs | 1 |
| max_steps | 100000 |
| warmup_ratio | 0.03 |
| save_steps | 500000 |
| logging_steps | 10000 |
4-bit Precision Settings

| Parameter | Value |
|---|---|
| use_4bit | True |
| use_nested_quant | False |
| bnb_4bit_compute_dtype | "bfloat16" |
| bnb_4bit_quant_type | "nf4" |
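Expressed with the transformers/bitsandbytes API, these values map onto roughly the following configuration (a sketch, not the project's verbatim code):

```python
# Sketch mapping the 4-bit settings above onto a BitsAndBytesConfig.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit = True
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type = "nf4"
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype = "bfloat16"
    bnb_4bit_use_double_quant=False,        # use_nested_quant = False
)
```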
LoRA Settings

| Parameter | Value |
|---|---|
| lora_alpha | 16 |
| lora_dropout | 0.1 |
| lora_r | 64 |
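With peft, the LoRA values above correspond to roughly this configuration; `bias` and `task_type` are assumed defaults for causal-LM LoRA fine-tuning and are not listed on the card.

```python
# Sketch of the LoRA settings above using peft.
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",            # assumed
    task_type="CAUSAL_LM",  # assumed
)
```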
Advanced Training Flags

| Parameter | Value |
|---|---|
| fp16 | False |
| bf16 | False |
| packing | False |
| gradient_checkpointing | True |
| optim | "paged_adamw_32bit" |
| lr_scheduler_type | "constant" |
| group_by_length | True |
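Taken together, the basic settings and the flags above translate approximately into a transformers TrainingArguments object, sketched below; the output directory is made up, and `max_seq_length` and `packing` would be passed to the trl SFTTrainer rather than to TrainingArguments.

```python
# Sketch combining the basic training settings and advanced flags above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",        # assumed; not specified on the card
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.001,
    num_train_epochs=1,
    max_steps=100000,
    warmup_ratio=0.03,
    save_steps=500000,
    logging_steps=10000,
    fp16=False,
    bf16=False,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    group_by_length=True,
)
```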
GPU Configuration

| Parameter | Value |
|---|---|
| device_map | {"": 0} |
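The device_map value {"": 0} places every module of the model on GPU 0. Combined with the 4-bit configuration above, loading would look roughly like this; the base checkpoint id is an assumption, and this is a sketch rather than the project's actual code.

```python
# Sketch: load the 4-bit quantized base model entirely on GPU 0.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",     # assumed base checkpoint, not the GORANI weights
    quantization_config=bnb_config,
    device_map={"": 0},              # map every module to GPU 0
)
```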