metadata
datasets:
- danielpark/gorani-100k-llama2-13b-instruct
language:
- en
library_name: transformers
pipeline_tag: text-generation
This project is in progress. Do not use the weights or the dataset.
Status: the 19.7k checkpoint weights are open; waiting for results on the Open LLM Leaderboard.
Update Schedule | Task Description | Status |
---|---|---|
23-10-5 | Completed training - 20k 13b weight | Done |
23-10-6 | Submitted hf model weights | Done |
23-10-20 | QC | In progress |
23-10-13 | Completed training - 50k 13b weight | |
23-10-14 | Submitted hf model weights | |
23-10-18 | Completed training - 100k 13b weight | |
23-10-20 | QA | |
23-10-21 | Official weight release | |
GORANI 100k
- Model: danielpark/gorani-100k-llama2-13b-instruct
- Dataset: danielpark/gorani-100k
Template
I use llama2-13b with LFM, without a default system message. When a dataset specifies a system message, that content is used.
### System:
{System}
### User:
{New_User_Input}
### Input:
{New User Input}
### Response:
{New_Assistant_Answer}
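
The template above can be filled in with a small helper. This is a minimal sketch, not the author's preprocessing code: the function name `build_prompt`, the choice to skip sections that have no content, and the reading of `### Input:` as optional extra context are all assumptions made for illustration.

```python
from typing import Optional


def build_prompt(
    user: str,
    extra_input: Optional[str] = None,
    system: Optional[str] = None,
) -> str:
    """Fill the section template; empty sections are skipped (an assumption)."""
    sections = [
        ("### System:", system),      # only present when the dataset provides one
        ("### User:", user),
        ("### Input:", extra_input),  # assumed to be optional extra context
    ]
    lines = []
    for header, body in sections:
        if body:  # skip missing or empty sections
            lines += [header, body]
    lines.append("### Response:")  # left open for the model to complete
    return "\n".join(lines) + "\n"


prompt = build_prompt("Summarize the text.", extra_input="Some text to summarize.")
print(prompt)
```

Because no system message is passed here, the `### System:` section is omitted, which matches the note that no default system message is used.
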
Caution
The model weights and dataset have not been properly curated yet, and their use is strictly prohibited under any license. The developers assume no responsibility, express or implied, for any such use.
Updates
Revision | Commit Hash | Updated | Train Process | Status |
---|---|---|---|---|
Revision 01 | 6d30494fa8da84128499d55075eef57094336d03 | 23.10.04 | 19,740/100,000 | In training |
Training Plan
- After checking the performance of the 19.7k model on the Open LLM Leaderboard, proceed with the following steps:
- Compare max sequence lengths of 512 and 1024 (experiment with a 10k model).
- Implement the approach described in the Llama 2 paper, which is more than 20 times slower than the initial stage.
- Modify the code to use Flash Attention 2.
- Refine the dataset and add a hash to freeze it.
Revision Information
# Revision 01: 6d30494fa8da84128499d55075eef57094336d03
- 19.74k fine-tuned model weight
- max_seq_length = 2048; tokenizer partially modified (pad token related); default training parameters; the tokenizer still needs to be fixed (refer to the 10k tokenizer)
See details
Training Process | |
---|---|
Tokenizer Used | LlamaTokenizerFast |
Training Progress (Epoch 3.15/16) | |
Step | 19740/100000 |
Google Colab Resource Usage | 150 tokens used |
System Information | Used | Total |
---|---|---|
System RAM | 5.8 GB | 83.5 GB |
GPU RAM | 26.6 GB | 40.0 GB |
Disk | 74.0 GB | 166.8 GB |
Basic Training Settings | |
---|---|
local_rank | -1 |
per_device_train_batch_size | 4 |
per_device_eval_batch_size | 1 |
gradient_accumulation_steps | 4 |
learning_rate | 2e-4 |
max_grad_norm | 0.3 |
weight_decay | 0.001 |
max_seq_length | 2048 |
num_train_epochs | 1 |
max_steps | 100000 |
warmup_ratio | 0.03 |
save_steps | 500000 |
logging_steps | 10000 |
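
The batch settings above determine how many examples each optimizer step consumes, which in turn explains the epoch figures reported in the training progress. The arithmetic below is a rough sketch under two assumptions that are not stated in the tables: a single GPU (suggested by `device_map` being `{"": 0}`) and a dataset of about 100,000 examples (suggested by the dataset name).

```python
# Values taken from the settings and progress tables.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100_000
current_step = 19_740

# Assumptions, not stated in the tables.
num_gpus = 1            # single device, as device_map {"": 0} suggests
dataset_size = 100_000  # approximate size implied by the name "gorani-100k"

# Examples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Passes over the dataset so far and at the end of the planned run.
epochs_so_far = current_step * effective_batch / dataset_size
epochs_planned = max_steps * effective_batch / dataset_size

print(effective_batch, epochs_so_far, epochs_planned)
```

Under these assumptions the effective batch size is 16, so step 19,740 corresponds to about 3.16 passes over the data and the full 100,000 steps to 16 passes, which agrees closely with the reported "Epoch 3.15/16". Since `max_steps` is positive, it takes precedence over `num_train_epochs` in the `transformers` `Trainer`.
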
4-bit Precision Settings | |
---|---|
use_4bit | True |
use_nested_quant | False |
bnb_4bit_compute_dtype | "bfloat16" |
bnb_4bit_quant_type | "nf4" |
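
As a rough guide to what these settings imply for memory, the sketch below estimates the size of the quantized base weights. It assumes a nominal 13 billion parameters, a quantization block size of 64 with one 32-bit scaling constant per block, and that every parameter is quantized; none of these figures comes from this card, and real usage also includes activations, LoRA weights, and optimizer state.

```python
# Assumptions for illustration only (not taken from this card).
n_params = 13e9        # nominal parameter count of a 13B model
weight_bits = 4        # 4-bit storage (use_4bit = True, "nf4")
block_size = 64        # assumed quantization block size
absmax_bits = 32       # one fp32 scaling constant per block

# Without nested quantization (use_nested_quant = False), each block
# keeps its fp32 constant, adding absmax_bits / block_size bits per weight.
overhead_bits = absmax_bits / block_size
bits_per_param = weight_bits + overhead_bits

approx_gb = n_params * bits_per_param / 8 / 1e9
print(bits_per_param, approx_gb)
```

With these numbers the base weights take about 4.5 bits per parameter, or roughly 7.3 GB, which leaves most of the 26.6 GB of GPU RAM reported above for everything else in the training loop.
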
LoRA Settings | |
---|---|
lora_alpha | 16 |
lora_dropout | 0.1 |
lora_r | 64 |
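
To make the LoRA settings concrete, the sketch below computes the scaling factor applied to the adapter update (`lora_alpha / lora_r`, as in standard LoRA) and the number of trainable parameters one adapter adds to a single weight matrix. The hidden size of 5120 is the Llama 2 13B value, and applying the adapter to a square 5120 x 5120 projection is an assumption, since the card does not list the target modules.

```python
lora_alpha = 16
lora_r = 64

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = lora_alpha / lora_r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


hidden_size = 5120  # Llama 2 13B hidden size; target modules are an assumption
params_per_matrix = lora_param_count(hidden_size, hidden_size, lora_r)

print(scaling, params_per_matrix)
```

With `lora_alpha` smaller than `lora_r`, the update is scaled down by 0.25, and each adapted 5120 x 5120 matrix adds 655,360 trainable parameters against roughly 26.2 million frozen ones.
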
Advanced Training Flags | |
---|---|
fp16 | False |
bf16 | False |
packing | False |
gradient_checkpointing | True |
optim | "paged_adamw_32bit" |
lr_scheduler_type | "constant" |
group_by_length | True |
GPU Configuration | |
---|---|
device_map | {"": 0} |