jeffmeloy committed on
Commit f971ac5 · verified · 1 Parent(s): bf1c79a

Update README.md

Files changed (1)
  1. README.md +38 -44
README.md CHANGED
@@ -12,47 +12,41 @@ library_name: transformers
 
 ## Model Description
 
- Model created by analyzing and selecting optimal layers based on dimensional utilization efficiency. The process follows these steps:
-
- Layer Analysis
- - Downloads base and fine-tuned models from Hugging Face Hub
- - Calculates Normalized Effective Rank (NER) for each layer
- - NER measures how effectively each layer utilizes its dimensions through entropy analysis of singular value distributions
-
- Layer Selection
- - Identifies common layer structures across models
- - Ranks layers based on their NER scores
- - Selects highest-performing layers from each model
- - Creates a mapping of optimal layer sources
-
- Model Composition
- - Creates a new model starting from the base architecture
- - Systematically replaces layers with their highest-performing counterparts
- - Preserves model architecture while optimizing layer-wise performance
- - Maintains compatibility with original tokenizer and configuration
-
- Output Generation
- - Saves the composite model with complete weights and configuration
- - Generates detailed merge reports documenting layer sources
- - Copies necessary tokenizer files from base model
-
- NER measures how effectively a neural network layer utilizes its available dimensions through entropy analysis of its singular value distribution. The calculation proceeds as follows:
-
- 1. **Singular Value Decomposition**
- - Input: Weight matrix A ∈ R^(m×n)
- - Compute singular values σᵢ where σᵢ ≥ 0
- - Filter values above numerical threshold (>1e-12)
-
- 2. **Distribution Normalization**
- - Sum all singular values: S = Σσᵢ
- - Create probability distribution: pᵢ = σᵢ/S
-
- 3. **Entropy Calculation**
- - Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ))
- - Calculate maximum possible entropy: H_max = log₂(n)
- where n is the number of singular values
-
- 4. **Normalization**
- - Final NER score = H/H_max
- - Results in value between 0 and 1
- - Higher scores indicate more uniform dimensional utilization
+ Model created by analyzing and selecting the optimal layers from other Qwen2.5-7B models based on their dimensional utilization efficiency, measured by Normalized Effective Rank (NER). NER is computed as follows (see the sketch after these steps):
+
+ Singular Value Decomposition:
+ - Input: Weight matrix A ∈ R^(m×n) # m = number of output features, n = number of input features
+ - Compute singular values σᵢ where σᵢ ≥ 0 # σᵢ represents the importance of each dimension
+ - Filter values above numerical threshold (>1e-12) # removes numerical noise from computation
+
+ Distribution Normalization:
+ - Sum all singular values: S = Σσᵢ # S acts as normalization factor
+ - Create probability distribution: pᵢ = σᵢ/S # converts singular values to probabilities summing to 1
+
+ Entropy Calculation:
+ - Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ)) # measures information content of distribution
+ - Calculate maximum possible entropy: H_max = log₂(n), where n is the number of singular values # maximum entropy occurs when all dimensions contribute equally
+
+ Normalization:
+ - Final NER score = H/H_max # normalizes score to [0,1] range
+ - Results in value between 0 and 1 # 0 = single dimension dominance, 1 = perfect dimensional utilization
+ - Higher scores indicate more uniform dimensional utilization
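+
+ A minimal sketch of this computation in PyTorch (illustrative only; `compute_ner` is an assumed helper name, not code from this repository):
+
+ ```python
+ import torch
+
+ def compute_ner(weight: torch.Tensor, eps: float = 1e-12) -> float:
+     """Normalized Effective Rank of a 2-D weight matrix."""
+     sigma = torch.linalg.svdvals(weight.float())        # singular values σᵢ ≥ 0
+     sigma = sigma[sigma > eps]                          # filter numerical noise
+     p = sigma / sigma.sum()                             # pᵢ = σᵢ / S
+     h = -(p * torch.log2(p)).sum()                      # Shannon entropy H
+     h_max = torch.log2(torch.tensor(float(p.numel())))  # H_max = log₂(n)
+     return (h / h_max).item() if p.numel() > 1 else 0.0 # NER in [0, 1]
+ ```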
+
+ ## Creating Composite Model
+
+ Layer Analysis:
+ - Download base and fine-tuned models from Hugging Face Hub # fetches models using Hugging Face API
+ - Calculate Normalized Effective Rank (NER) for each layer within each model # each model processed independently, as sketched below
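+
+ A sketch of this step using the standard `transformers` API (the model IDs are placeholders, and `compute_ner` is the helper sketched above):
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ model_ids = ["Qwen/Qwen2.5-7B", "your-org/qwen2.5-7b-finetune"]  # placeholders
+ ner_scores = {}  # {model_id: {layer_name: NER score}}
+ for mid in model_ids:
+     model = AutoModelForCausalLM.from_pretrained(mid, torch_dtype="auto")
+     ner_scores[mid] = {name: compute_ner(w)
+                        for name, w in model.state_dict().items()
+                        if w.ndim == 2}  # NER applies to 2-D weight matrices
+ ```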
+
+ Layer Selection:
+ - Identify common layer structures across models
+ - For each layer, select the model whose copy has the highest NER score # yields the mapping of optimal layer sources used below
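+
+ Continuing the sketch, selection reduces to a per-layer argmax over the pool (illustrative):
+
+ ```python
+ # Layers present in every model in the pool
+ common_layers = set.intersection(*(set(s) for s in ner_scores.values()))
+ # Map each layer to the model whose copy has the highest NER
+ best_source = {layer: max(ner_scores, key=lambda mid: ner_scores[mid][layer])
+                for layer in common_layers}
+ ```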
+
+ Model Composition:
+ - Incrementally build the composite model by taking, for each layer, the highest-NER version from the model pool (sketched below)
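+
+ A sketch of the composition step under the same assumptions (in practice the models loaded during analysis would be reused rather than reloaded):
+
+ ```python
+ models = {mid: AutoModelForCausalLM.from_pretrained(mid, torch_dtype="auto")
+           for mid in model_ids}
+ composite = models[model_ids[0]]                  # start from the base architecture
+ state = composite.state_dict()
+ for layer, source_id in best_source.items():
+     state[layer] = models[source_id].state_dict()[layer].clone()
+ composite.load_state_dict(state)                  # architecture unchanged, layers swapped
+ ```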
+
+ Output Generation:
+ - Save merge reports documenting layer sources
+ - Copy config and tokenizer files from base model
+ - Save the composite model with complete weights # model ready to use (saving sketched below)
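+
+ Finally, a sketch of the output step (`save_pretrained` writes weights plus config; the merge-report format here is illustrative):
+
+ ```python
+ import json
+ from transformers import AutoTokenizer
+
+ composite.save_pretrained("composite-model")  # full weights + config.json
+ AutoTokenizer.from_pretrained(model_ids[0]).save_pretrained("composite-model")
+ with open("composite-model/merge_report.json", "w") as f:
+     json.dump(best_source, f, indent=2)       # which model supplied each layer
+ ```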