---
base_model:
  - happzy2633/qwen2.5-7b-ins-v3
  - bunnycore/Qwen2.5-7B-Matrix
  - bunnycore/Qwen2.5-7B-HyperMix
library_name: transformers
tags:
  - mergekit
  - merge
  - reasoning
---

# Qwen2.5-7B-Anvita

Anvita is a reasoning-oriented model named after a Sanskrit word meaning "connected" or "understood." The name reflects the model's purpose: to connect ideas and understand complex inputs.

Built using the DARE TIES merge method, it combines the pre-trained models bunnycore/Qwen2.5-7B-Matrix, bunnycore/Qwen2.5-7B-HyperMix, and happzy2633/qwen2.5-7b-ins-v3, and is optimized for reasoning, conversation, and text generation.

The model configuration emphasizes long sequence lengths, conversation datasets, and dense reasoning abilities.
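
The merge can be pictured with a toy sketch of DARE-TIES: each fine-tuned model's delta from the base is randomly pruned and rescaled (DARE), and components are then kept only where they agree with an elected majority sign before being summed (TIES). The function below is an illustrative NumPy approximation under those assumptions, not mergekit's actual implementation, and all names in it are made up.

```python
import numpy as np

def dare_ties_merge(base, deltas, weights, density, rng):
    """Toy per-tensor DARE-TIES merge (illustration only)."""
    # DARE: drop each delta entry with probability (1 - density),
    # rescaling the survivors by 1/density to keep expectations equal.
    pruned = []
    for d in deltas:
        mask = rng.random(d.shape) < density
        pruned.append(np.where(mask, d / density, 0.0))

    # Apply per-model merge weights.
    weighted = [w * p for w, p in zip(weights, pruned)]

    # TIES: elect a sign per parameter from the weighted sum, then keep
    # only the contributions that agree with the elected sign.
    elected = np.sign(sum(weighted))
    merged_delta = sum(
        np.where(np.sign(wd) == elected, wd, 0.0) for wd in weighted
    )
    return base + merged_delta
```

With `density=1.0` nothing is dropped and the sketch reduces to sign-filtered weighted averaging, which is the TIES half of the method.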

**Note:** for the best reasoning quality from this model, run it in BF16 and use XTC sampling.
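
For BF16, pass `torch_dtype=torch.bfloat16` when loading with transformers. XTC ("Exclude Top Choices") sampling is offered by backends such as llama.cpp and text-generation-webui rather than by transformers itself. As a rough sketch of the idea (a hypothetical standalone function, not any backend's code): when two or more tokens exceed a probability threshold, the sampler sometimes removes all of them except the least likely, steering generation away from the most obvious continuations.

```python
import numpy as np

def xtc_filter(probs, threshold=0.1, xtc_probability=0.5, rng=None):
    """Toy XTC step: occasionally exclude the top choices, keeping the
    least probable of the above-threshold tokens, then renormalize."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    above = np.flatnonzero(probs >= threshold)
    # XTC only triggers when at least two tokens clear the threshold,
    # and then only with probability xtc_probability.
    if len(above) < 2 or rng.random() >= xtc_probability:
        return probs
    keep = above[np.argmin(probs[above])]  # least likely "top choice"
    mask = np.ones_like(probs, dtype=bool)
    mask[above] = False
    mask[keep] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()
```

For example, with probabilities `[0.5, 0.3, 0.15, 0.05]`, a threshold of 0.2, and the exclusion forced on, the 0.5 token is dropped and the remaining mass is renormalized around the 0.3 token.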

## Configuration

The following YAML configuration was used to produce this model:


```yaml
models:
  - model: bunnycore/Qwen2.5-7B-Matrix
    parameters:
      weight: [0.25, 0.35, 0.45, 0.35, 0.25]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
  - model: bunnycore/Qwen2.5-7B-HyperMix
  - model: happzy2633/qwen2.5-7b-ins-v3
    parameters:
      weight: [0.55, 0.45, 0.35, 0.45, 0.55]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
merge_method: dare_ties
base_model: bunnycore/Qwen2.5-7B-HyperMix
parameters:
  int8_mask: true
dtype: bfloat16
```
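
If the YAML above is saved to a file (here a hypothetical `anvita.yml`), the merge can in principle be reproduced with mergekit's CLI. This is a sketch only, assuming a recent mergekit install and enough disk space and VRAM for three 7B checkpoints:

```shell
# Install mergekit, then run the merge described by the config above.
# "anvita.yml" and the output path are placeholder names.
pip install mergekit
mergekit-yaml anvita.yml ./Qwen2.5-7B-Anvita --cuda
```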