---
base_model: NousResearch/Meta-Llama-3-8B
tags:
- generated_from_trainer
model-index:
- name: llama3-8b-redmond-code290k
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config

axolotl version: `0.4.0`
```yaml
base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: b-mc2/sql-create-context
    type: context_qa.load_v2
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./artificialguybr/llama3-8b-redmond-code290k

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: artificialguybr/llama3-8b-redmond-code290k
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

# LLAMA 3 8B Redmond CODE 290K

Thanks to [Redmond.ai](https://redmond.ai) for the GPU support!

This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on the [ajibawa-2023/Code-290k-ShareGPT](https://huggingface.co/datasets/ajibawa-2023/Code-290k-ShareGPT) dataset.

## Model description

LLAMA 3 8B Redmond CODE 290K is a large language model fine-tuned to generate code and explanations in a wide range of programming languages, including Python, Java, JavaScript, Go, C++, Rust, Ruby, SQL, MySQL, R, Julia, Haskell, and more. Given a prompt or question, it produces a corresponding code snippet together with a detailed explanation.

The model was trained on approximately 290,000 conversation sets, each consisting of two conversational turns, in the Vicuna/ShareGPT format, which allows for efficient training and fine-tuning. It is intended for applications where code generation and explanation are needed, such as coding assistance, education, and knowledge sharing.

## Intended uses & limitations

Intended uses:
- Generating code and explanations in various programming languages
- Assisting with coding tasks and education
- Supporting knowledge sharing and documentation
- Integrating with other language models or tools to provide a more comprehensive coding experience

Limitations:
- The model may not perform well on very rare or niche programming languages
- The model may not generalize well to unseen coding styles or conventions
- The model may not be able to handle extremely complex code or edge cases
- The model may not be able to provide explanations for highly abstract or theoretical concepts
- The model may not be able to handle ambiguous or open-ended prompts

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2

### Training results

Soon

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
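## How to use

Below is a minimal inference sketch using the 🤗 Transformers library. The repository ID `artificialguybr/llama3-8b-redmond-code290k`, the Vicuna-style prompt template, and the generation parameters are assumptions made for illustration; adjust them to match the actual repository name and the prompt format used during fine-tuning.

```python
# Minimal inference sketch (repo ID and prompt format are assumed; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/llama3-8b-redmond-code290k"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training setup
    device_map="auto",
)

# The training data is in Vicuna/ShareGPT format, so a Vicuna-style prompt is assumed here.
prompt = "USER: Write a Python function that checks whether a string is a palindrome.\nASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```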