merge
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the passthrough merge method.
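As a rough illustration of what a passthrough layer-stacking merge does, the sketch below builds the mapping from output layers to source layers. The slice boundaries are taken from the configuration listed on this card; the helper name `stack_slices` is hypothetical and is not part of mergekit, and the sketch deliberately ignores any per-tensor parameters.

```python
from typing import List, Tuple

# Source model and slice boundaries copied from this card's configuration.
SOURCE_MODEL = "TheHierophant/Underground-Cognitive-V0.3-test"
SLICES: List[Tuple[str, int, int]] = [
    (SOURCE_MODEL, 0, 16),
    (SOURCE_MODEL, 16, 32),
    (SOURCE_MODEL, 32, 48),
]


def stack_slices(slices: List[Tuple[str, int, int]]) -> List[Tuple[int, str, int]]:
    """Hypothetical helper: list (output_layer, source_model, source_layer).

    Each slice contributes its layers in order, and the output layer index
    simply counts up as layers are appended.
    """
    plan = []
    for model, start, end in slices:
        for source_layer in range(start, end):
            plan.append((len(plan), model, source_layer))
    return plan


plan = stack_slices(SLICES)
print(len(plan))          # number of layers in the stacked model
print(plan[0], plan[-1])  # first and last layer assignments
```

With these three slices the plan covers every source layer once, in order, so the stacked model has as many layers as the three ranges together.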
Models Merged
The following models were included in the merge:
- TheHierophant/Underground-Cognitive-V0.3-test
Configuration
The following YAML configuration was used to produce this model:
slices:
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [0, 16]
        parameters:
          attention:
            - filter: o_proj
              value: 1.1
            - filter: q_proj
              value: 1.0
            - filter: v_proj
              value: 1.0
            - filter: down_proj
              value: 0.95
          weight: 0.25  # Lower weight in the early layers for efficient loading
          significance: 0.7  # Focus on essential patterns
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [16, 32]
        parameters:
          attention:
            - filter: o_proj
              value: 1.3
            - filter: q_proj
              value: 1.2
            - filter: v_proj
              value: 1.15
            - filter: down_proj
              value: 1.1
          weight: 0.35  # Moderate weight for the secondary layers
          significance: 0.8  # Emphasize attention on relevant information
  - sources:
      - model: TheHierophant/Underground-Cognitive-V0.3-test
        layer_range: [32, 48]
        parameters:
          attention:
            - filter: o_proj
              value: 1.7
            - filter: q_proj
              value: 1.6
            - filter: v_proj
              value: 1.5
            - filter: down_proj
              value: 1.4
          weight: 0.4  # Increased weight to leverage the power of the deep layers
          significance: 0.9  # Focus importance on deep attention
base_model_config:
  attention_bias: false
  attention_dropout: 0.05  # Keep dropout low to avoid overfitting
  hidden_act: "silu"  # SiLU for a smoother, more efficient activation
  hidden_size: 4096
  initializer_range: 0.02
  intermediate_size: 14336
  max_position_embeddings: 4096
  num_attention_heads: 32
  num_hidden_layers: 48
  num_key_value_heads: 8
  pretraining_tp: 1
  rms_norm_eps: 1e-05
  rope_scaling: 1.3  # Simplified adjustment for position handling
  rope_theta: 10000.0
  tie_word_embeddings: true
  vocab_size: 32000
  use_cache: true
dtype: bfloat16
merge_method: passthrough
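The numbers in the configuration above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of mergekit: the helper `check_config` is hypothetical, the values are copied by hand from the configuration, and the derived quantities assume the usual grouped-query attention layout in which `hidden_size` is split evenly across `num_attention_heads`.

```python
from typing import Dict, List, Tuple

# Values copied by hand from the configuration above.
layer_ranges: List[Tuple[int, int]] = [(0, 16), (16, 32), (32, 48)]
base_model_config: Dict[str, int] = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
}


def check_config(ranges: List[Tuple[int, int]], cfg: Dict[str, int]) -> Dict[str, object]:
    """Hypothetical helper: derive a few quantities and flag inconsistencies."""
    total_layers = sum(end - start for start, end in ranges)
    if total_layers != cfg["num_hidden_layers"]:
        raise ValueError("slices do not add up to num_hidden_layers")
    if cfg["hidden_size"] % cfg["num_attention_heads"] != 0:
        raise ValueError("hidden_size is not divisible by num_attention_heads")
    if cfg["num_attention_heads"] % cfg["num_key_value_heads"] != 0:
        raise ValueError("attention heads are not divisible by key/value heads")
    # Contiguity is reported rather than enforced: stacking merges may
    # legitimately repeat or skip layers.
    contiguous = all(a[1] == b[0] for a, b in zip(ranges, ranges[1:]))
    return {
        "total_layers": total_layers,
        "head_dim": cfg["hidden_size"] // cfg["num_attention_heads"],
        "query_heads_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "contiguous": contiguous,
    }


summary = check_config(layer_ranges, base_model_config)
print(summary)
```

The three 16-layer slices account for all 48 hidden layers, and under the stated assumption each key/value head is shared by four query heads of dimension 128.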
Model tree for TheHierophant/Underground-Mind-10.7B-V1-finetuned
Base model
ClaudioItaly/Underground