Alpaca 7B Native Enhanced

The Most Advanced Alpaca 7B Model (Model Files Pending)

📃 Model Facts

Trained natively on 8x Nvidia A100 40GB GPUs; no LoRA used
Trained on the largest & most accurate dataset yet
Enhanced Programming Capabilities
First Alpaca model to have conversational awareness

🚀 Quick Start Guide

Step 1. Make sure git-lfs is installed and ready to use (see the git-lfs installation guide).

Step 2. Download and install text-generation-webui according to the repository's instructions

Step 3. Navigate to text-generation-webui's models folder and clone this repository:

git clone https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced
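Step 1 matters because files tracked by Git LFS arrive as small text "pointer" stubs if git-lfs is not set up when you clone. The sketch below is a hypothetical helper: find_lfs_pointers is not part of this repository. It assumes only the Git LFS pointer format, in which pointer files are under 1024 bytes and begin with the line version https://git-lfs.github.com/spec/v1. It flags any such stubs left in the clone.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir: str, max_pointer_size: int = 1024) -> list[str]:
    """Return relative paths of files that look like un-fetched LFS pointer stubs."""
    root = Path(repo_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside git's own metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        # Pointer stubs are tiny text files; real weight files are gigabytes.
        if path.stat().st_size >= max_pointer_size:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path.relative_to(root).as_posix())
    return pointers
```

For example, if find_lfs_pointers("alpaca-7b-nativeEnhanced") returns a non-empty list, the weights were not downloaded. Running git lfs pull inside the clone should fetch them.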

Step 4. Launch the webui and replace the default instruction prompt with:

You are an AI language model designed to assist the User by answering their questions, offering advice, and engaging in casual conversation in a friendly, helpful, and informative manner. You respond clearly, coherently, and with consideration of the conversation history.

User: Hey, how's it going?

Assistant: Hey there! I'm doing great, thank you. What can I help you with today? Let's have a fun chat!
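The example exchange above follows a simple User:/Assistant: layout. The sketch below assembles a full prompt in that layout, assuming blank lines between turns; the exact template text-generation-webui applies may differ. build_prompt is a hypothetical helper, not part of the webui.

```python
def build_prompt(system_prompt: str, history: list[tuple[str, str]], user_message: str) -> str:
    """Lay out the instruction prompt, earlier turns, and the new user message.

    history holds (user_text, assistant_text) pairs from previous turns, oldest first.
    """
    parts = [system_prompt.strip()]
    for user_text, assistant_text in history:
        parts.append(f"User: {user_text}")
        parts.append(f"Assistant: {assistant_text}")
    parts.append(f"User: {user_message}")
    # End on an open "Assistant:" tag so the model continues with its reply.
    parts.append("Assistant:")
    return "\n\n".join(parts)
```

Because the whole history is resent on every turn, this layout is also what consumes the model's context window as a conversation grows.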

Step 5. Change the settings to match this screenshot:

📚 Training

We trained this model on 8x Nvidia A100 40GB GPUs. Training took ~3 hours, and the final loss after 3 epochs was 0.4761. The training command is as follows:

torchrun --nproc_per_node=8 --master_port=3045 ./stanford_alpaca/train.py --model_name_or_path ./llama-7b-hf --data_path ./alpaca-7b-nativeEnhanced/training_files/alpaca-megaset-fixed.json --fp16 True --output_dir ./output_7b --num_train_epochs 3 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_steps 16 --evaluation_strategy "no" --save_strategy "steps" --save_steps 200 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' --tf32 True
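With these flags, each optimizer step sees nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps = 8 × 2 × 16 = 256 sequences. That product is the quantity to preserve when you trade per-device batch size for accumulation steps. The function names below are hypothetical and only illustrate the arithmetic.

```python
def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Number of sequences contributing to one optimizer step across all GPUs."""
    return num_gpus * per_device_batch * grad_accum_steps


def accumulation_steps_for(per_device_batch: int, grad_accum_steps: int, new_per_device_batch: int) -> int:
    """Accumulation steps that keep per_device_batch * grad_accum_steps unchanged
    after switching to new_per_device_batch."""
    per_gpu_total = per_device_batch * grad_accum_steps
    if new_per_device_batch <= 0 or per_gpu_total % new_per_device_batch != 0:
        raise ValueError(
            f"{new_per_device_batch} does not evenly divide {per_gpu_total} sequences per GPU"
        )
    return per_gpu_total // new_per_device_batch


# Values taken from the training command above.
print(effective_batch_size(8, 2, 16))     # 256
print(accumulation_steps_for(2, 16, 1))   # 32
```

For instance, dropping --per_device_train_batch_size to 1 calls for --gradient_accumulation_steps 32 to keep the effective batch at 256. A smaller per-device batch roughly lowers activation memory per GPU, at the cost of more forward and backward passes per optimizer step.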

This repository contains a folder called training_files. Its full-training-instructions.txt file lists every command, from the start of training through converting the model to 4-bit quantized ggml. We do not recommend quantizing this model to 4 bits. Those steps are included purely for informational purposes.

The training instructions file is also written specifically for rented cloud compute, so anyone following its commands should be able to train a similar model.

Common errors while training:

CUDA Out of Memory error
- Your GPUs most likely have less than 40GB of VRAM each. The smallest GPU we've successfully trained on is the Nvidia A100 40GB, and even with 8 of them, VRAM usage was almost always right at the limit. If you have 40GB GPUs and still hit this error, try halving per_device_train_batch_size and per_device_eval_batch_size and doubling gradient_accumulation_steps. If you have more than 40GB of VRAM per GPU and want to train faster, do the opposite.
LLaMATokenizer error
- This happens when tokenizer_config.json in the llama-7b-hf directory still says LLaMATokenizer. To fix it, rename LLaMATokenizer to LlamaTokenizer in that file.
RuntimeError: CUDA error: invalid device ordinal
- This error occurs when nproc_per_node is greater than the number of GPUs installed in your system. You can check how many GPUs you have by running nvidia-smi.
torchrun is not recognized
- This error occurs when your Python version is older than 3.10. Follow the instructions in the training instructions file to install Miniconda and set up Python 3.10. Working around the error by running python -m torch.distributed.run will not help, because many of the dependencies require Python 3.10 and will fail fatally at the start of training.
KeyError
- This happens when your JSON training data is malformed, for example when a record is missing a key the training script expects. Run dataset_validator.py from the training_files folder to find the broken key.
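The LLaMATokenizer fix above can be scripted. The sketch below assumes tokenizer_config.json sits directly inside the llama-7b-hf directory. fix_tokenizer_config is a hypothetical helper. It does a plain text substitution, so the rest of the file's formatting is left untouched.

```python
from pathlib import Path


def fix_tokenizer_config(model_dir: str) -> bool:
    """Rename LLaMATokenizer to LlamaTokenizer in tokenizer_config.json.

    Returns True if the file was changed, False if it was already correct.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    text = config_path.read_text(encoding="utf-8")
    fixed = text.replace("LLaMATokenizer", "LlamaTokenizer")
    if fixed == text:
        return False  # nothing to rename
    config_path.write_text(fixed, encoding="utf-8")
    return True
```

Calling fix_tokenizer_config("./llama-7b-hf") once before launching torchrun is enough. Running it again is harmless and simply returns False.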

📝 Notes

The main version of this model is in the Hugging Face Transformers format. The other version (.pth) is provided purely for experimental use with llama.cpp and is not guaranteed to have conversational awareness.
This model behaves strangely when quantized to 4 bits, possibly because of its complexity. We recommend 8 bits as the lowest quantization level, but this is untested.
This model is slightly underfitted. We observed that training with fewer gradient accumulation steps improved response quality.
This model appears to have full conversational awareness. Provided you run the model in the configuration described in the Quick Start Guide, you should be able to hold detailed conversations with it without issues. Its memory is limited to 2048 tokens. Beyond that, it forgets earlier details and needs to be reminded.
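One way to stay inside the 2048-token window is to drop the oldest turns before each generation. The sketch below is hypothetical: trim_history is not part of text-generation-webui. It takes a count_tokens callable, which in practice should be the model's own tokenizer, and it reserves part of the budget for the reply. It ignores the few tokens spent on separators between turns.

```python
from typing import Callable


def trim_history(
    turns: list[str],
    count_tokens: Callable[[str], int],
    budget: int = 2048,
    reserve: int = 256,
    system_prompt: str = "",
) -> list[str]:
    """Keep the most recent turns that fit in the context window.

    turns: formatted turns ("User: ..." / "Assistant: ..."), oldest first.
    count_tokens: returns the token count of a string.
    budget: total context window in tokens (2048 for this model).
    reserve: tokens left free for the model's reply.
    system_prompt: instruction prompt that is always sent, so its cost is subtracted first.
    """
    limit = budget - reserve - count_tokens(system_prompt)
    kept: list[str] = []
    used = 0
    # Walk back from the newest turn and stop at the first one that no longer fits,
    # so the kept turns stay a contiguous, most-recent slice of the conversation.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > limit:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore oldest-first order
    return kept
```

For example, with a whitespace word count standing in for a real tokenizer, trim_history(["a b c", "d e", "f g h i"], lambda s: len(s.split()), budget=10, reserve=3) keeps ["d e", "f g h i"]. Word counts undercount real LLaMA tokens, so use the actual tokenizer in practice.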

🔧 Dataset

The dataset used for training this model is made from AlpacaDataCleaned and codealpaca. We combined these datasets for the following reasons:

Increased accuracy, since the original stanford_alpaca dataset had many errors
Better programming knowledge
More training data

We ran into an issue with the latest AlpacaDataCleaned dataset: about 90k lines in, one of the keys has a typo. The key is "instruction:" instead of "instruction". We have fixed this error in the provided megaset, but if you plan to pull directly from AlpacaDataCleaned, make sure to fix it yourself. Otherwise, the training script will fail with a KeyError.
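The sketch below is a pre-flight check of that kind. It assumes the Alpaca JSON format: a list of records with instruction, output and an optional input field. It is not the dataset_validator.py shipped in training_files, and repair_and_validate and repair_file are hypothetical names. It renames keys that carry a stray trailing colon, such as "instruction:". It then reports the indices of records that still lack instruction or output.

```python
import json

# Fields assumed to be looked up by name when the training prompts are built.
REQUIRED_KEYS = ("instruction", "output")


def repair_and_validate(records: list[dict]) -> list[int]:
    """Strip stray trailing colons and whitespace from keys (in place) and return
    the indices of records that are still missing a required key."""
    broken = []
    for index, record in enumerate(records):
        for key in list(record):  # snapshot the keys, since the dict is mutated below
            clean = key.rstrip(":").strip()
            if clean != key and clean not in record:
                record[clean] = record.pop(key)
        if any(required not in record for required in REQUIRED_KEYS):
            broken.append(index)
    return broken


def repair_file(src: str, dst: str) -> list[int]:
    """Load an Alpaca-style JSON list from src, repair it, and write the result to dst."""
    with open(src, encoding="utf-8") as fh:
        records = json.load(fh)
    broken = repair_and_validate(records)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, ensure_ascii=False)
    return broken
```

Writing to a separate dst file keeps the downloaded dataset intact. An empty return value means every record now has the required keys.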

👨‍💻 Credits

Credits go to Meta for creating the foundational LLaMA models and Stanford for the instructions on how to train. For the dataset, credits go to AlpacaDataCleaned and codealpaca. Credits also go to chavinlo for creating the original Alpaca 7B Native model, the inspiration behind this model.

Lastly, credits go to the homies that stayed up all night again and again: 8bit, π, chug, Taddy, yoyodapro, Symax, and most importantly: stablediffusion for the beautiful artwork