---
base_model:
- TheHierophant/Underground-Cognitive-V0.3-test
library_name: transformers
tags:
- mergekit
- merge
---
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:
* [TheHierophant/Underground-Cognitive-V0.3-test](https://huggingface.co/TheHierophant/Underground-Cognitive-V0.3-test)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [0, 16]
        parameters:
          attention:
            - filter: o_proj
              value: 1.1
            - filter: q_proj
              value: 1.0
            - filter: v_proj
              value: 1.0
            - filter: down_proj
              value: 0.95
          weight: 0.25        # Lower weight on the early layers for efficient loading
          significance: 0.7   # Focus on essential patterns
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [16, 32]
        parameters:
          attention:
            - filter: o_proj
              value: 1.3
            - filter: q_proj
              value: 1.2
            - filter: v_proj
              value: 1.15
            - filter: down_proj
              value: 1.1
          weight: 0.35        # Moderate weight for the middle layers
          significance: 0.8   # Emphasize attention on relevant information
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [32, 48]
        parameters:
          attention:
            - filter: o_proj
              value: 1.7
            - filter: q_proj
              value: 1.6
            - filter: v_proj
              value: 1.5
            - filter: down_proj
              value: 1.4
          weight: 0.4         # Higher weight to exploit the capacity of the deep layers
          significance: 0.9   # Focus importance on deep attention

base_model_config:
  attention_bias: false
  attention_dropout: 0.05     # Keep dropout low to avoid overfitting
  hidden_act: "silu"          # SiLU for a smoother, more efficient activation
  hidden_size: 4096
  initializer_range: 0.02
  intermediate_size: 14336
  max_position_embeddings: 4096
  num_attention_heads: 32
  num_hidden_layers: 48
  num_key_value_heads: 8
  pretraining_tp: 1
  rms_norm_eps: 1e-05
  rope_scaling: 1.3           # Simplified adjustment for position handling
  rope_theta: 10000.0
  tie_word_embeddings: true
  vocab_size: 32000
  use_cache: true

dtype: bfloat16
merge_method: passthrough
```
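
### Usage

Once produced, the merged model loads like any other `transformers` causal LM. Below is a minimal sketch; the repo id is a placeholder for wherever this merge is published, not a confirmed Hub id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the local output directory of the merge
# or the actual Hub repo where this model is hosted.
model_id = "TheHierophant/merged-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype declared in the merge config
    device_map="auto",
)

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```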