---
library_name: transformers
tags:
- synthetic
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- Iker/OpenHermes-2.5-Spanish
- projecte-aina/RAG_Multilingual
- Iker/Document-Translation-en-es
- Iker/InstructTranslation-EN-ES
- Helsinki-NLP/opus-100
- glaiveai/glaive-code-assistant-v3
- glaiveai/glaive-function-calling-v2
language:
- es
- en
pipeline_tag: text-generation
base_model: google/gemma-2b
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/614a1ebb8f82f1df64d55126/2i_CasoeJTgQPNoBIfA8E.jpeg)

# Neurona 2B Beta: A Spanish Language Model

> This is a preliminary version of the model card. The model is under development and this is not the final version. If you want to know more about this model, write to iker.garciaf@ehu.eus

Neurona 2B is a Spanish language model. This is the first iteration, an experiment to get the training scripts and infrastructure up and running.

Neurona 2B has been trained on the following datasets:

- [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
- [Iker/OpenHermes-2.5-Spanish](https://huggingface.co/datasets/Iker/OpenHermes-2.5-Spanish)
- [Iker/Document-Translation-en-es](https://huggingface.co/datasets/Iker/Document-Translation-en-es)
- [Iker/InstructTranslation-EN-ES](https://huggingface.co/datasets/Iker/InstructTranslation-EN-ES)
- [Helsinki-NLP/opus-100](https://huggingface.co/datasets/Helsinki-NLP/opus-100) (en-es, only a few examples to reach 1 million instructions)
- [projecte-aina/RAG_Multilingual](https://huggingface.co/datasets/projecte-aina/RAG_Multilingual) (es only, 3,701 examples)
- [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3)
- [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)

This mix of English and Spanish datasets lets the model acquire a range of capabilities, such as RAG, function calling, code assistance, question answering, and summarization, in both English and Spanish.
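# Usage

The following is a minimal inference sketch with `transformers`. Two details are assumptions rather than facts from this card: the repo id (`Iker/Neurona-2b-beta` is a placeholder, replace it with the actual model id) and that the released tokenizer ships the ChatML chat template used during training.

```python
# Minimal inference sketch. Assumptions: the repo id below is a placeholder, and the
# tokenizer includes the ChatML chat template configured during training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Iker/Neurona-2b-beta"  # hypothetical repo id, replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "Eres un asistente útil que responde en español."},
    {"role": "user", "content": "Resume en dos frases qué es la fotosíntesis."},
]

# apply_chat_template renders the ChatML turns (<|im_start|> ... <|im_end|>)
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```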
# Training

This model has been trained on 4x Nvidia A100 80GB GPUs using axolotl: [Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).

This is the configuration used:

```yaml
base_model: google/gemma-2b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

is_falcon_derived_model:
is_llama_derived_model:
is_qwen_derived_model:
is_mistral_derived_model:

load_in_8bit: false
load_in_4bit: false
strict: false
device_map: null

datasets:
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-Spanish_fix_gpt.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-English.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-function-calling-v2.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
        - tool
      output:
        - human
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-code-assistant-v3-small.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human

chat_template: chatml

dataset_prepared_path: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/dataset
shuffle_merged_datasets: true
val_set_size: 0.005
output_dir: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/

adapter:
lora_model_dir:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false

special_tokens:
  bos_token: "<|im_start|>"
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"

tokens:
  - "<|begin_of_text|>"
  - "<|end_of_text|>"
  - "<|im_start|>"
  - "<|im_end|>"
  - "<|start_header_id|>"
  - "<|end_header_id|>"
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""

neftune_noise_alpha: 5

wandb_project: Mortadelo&Filemon
wandb_entity: igarciaf
wandb_watch:
wandb_name: gemma2b
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00007

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.03
evals_per_epoch: 4
eval_table_size:
save_strategy: "no"
debug:
deepspeed: /ikerlariak/igarcia945/Mortadelo-Filemon/train_configs/deepspeed_zero3.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
seed: 33
```
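As configured above (`chat_template: chatml`, with `<|im_start|>` and `<|im_end|>` registered as the BOS/EOS special tokens), training conversations are rendered in ChatML. The snippet below only illustrates that layout; it is not part of the training code, and the roles and messages are made-up examples.

```python
# Illustration of the ChatML layout implied by the config above.
# Roles and messages are placeholders, not taken from the training data.
def to_chatml(messages: list[dict]) -> str:
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model produces the reply at inference time.
    prompt += "<|im_start|>assistant\n"
    return prompt


print(to_chatml([
    {"role": "system", "content": "Eres un asistente útil."},
    {"role": "user", "content": "¿Cuál es la capital de Francia?"},
]))
```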