ssmits committed
Commit 52add26 (1 parent: 6f30d65)

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
 
 Qwen2.5-95B-Instruct is a [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).
 
-The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of Falcon2-multilingual. Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that certain layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly structure the LLM for optimal Next Token Prediction, allowing for a more efficient and effective language model architecture.
+The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that certain layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly structure the LLM for optimal Next Token Prediction, allowing for a more efficient and effective language model architecture.
 
 - [alpindale/goliath-120b](https://huggingface.co/alpindale/goliath-120b)
 - [cognitivecomputations/MegaDolphin-120b](https://huggingface.co/cognitivecomputations/MegaDolphin-120b)
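
For readers unfamiliar with the layer similarity analysis mentioned in the changed paragraph, here is a minimal sketch of the idea, assuming a Hugging Face `transformers` causal LM. The model name and prompt are placeholders chosen for illustration; this is not the exact procedure used for ssmits/Falcon2-5.5B-multilingual or for this merge.

```python
# Minimal sketch of layer similarity analysis: compare the hidden states
# produced by consecutive transformer layers. The model and prompt below
# are placeholders, not the actual setup used for Falcon2-5.5B-multilingual.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer(
    "The quick brown fox jumps over the lazy dog.", return_tensors="pt"
)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: (embedding output, layer 1, ..., layer N).
hidden = out.hidden_states
for i in range(1, len(hidden) - 1):
    # Cosine similarity between consecutive layer outputs, averaged over
    # token positions. Similarity near 1 means the layer barely changes
    # the representation; runs of such layers are natural candidates when
    # picking layer ranges for a passthrough self-merge.
    sim = torch.nn.functional.cosine_similarity(
        hidden[i], hidden[i + 1], dim=-1
    ).mean().item()
    print(f"layers {i:2d} -> {i + 1:2d}: mean cosine similarity = {sim:.4f}")
```

In a MergeKit passthrough merge, such high-similarity regions are typically the ones repeated across overlapping `layer_range` slices; the exact ranges used for Qwen2.5-95B-Instruct are not shown in this diff.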