ssmits committed
Commit 9c0e7df
1 Parent(s): 52add26

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
```diff
@@ -15,7 +15,7 @@ tags:
 
 Qwen2.5-95B-Instruct is a [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).
 
-The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that certain layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly structure the LLM for optimal Next Token Prediction, allowing for a more efficient and effective language model architecture.
+The layer ranges chosen for this merge were inspired by a rough estimate of the layer similarity analysis of [ssmits/Falcon2-5.5B-multilingual](https://huggingface.co/ssmits/Falcon2-5.5B-multilingual). Layer similarity analysis involves examining the outputs of different layers in a neural network to determine how similar or different they are. This technique can help identify which layers contribute most significantly to the model's performance. In the context of the Falcon-11B model, layer similarity analysis across multiple languages revealed that the first half of the layers were more important for maintaining performance. Additionally, this analysis can be used to more rigidly slice and add extra layers for optimal Next Token Prediction, allowing for possibly a model architecture that's more creative and powerful.
 
 - [alpindale/goliath-120b](https://huggingface.co/alpindale/goliath-120b)
 - [cognitivecomputations/MegaDolphin-120b](https://huggingface.co/cognitivecomputations/MegaDolphin-120b)
```
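The README calls the model a self-merge of Qwen2.5-72B-Instruct but does not list the layer ranges it uses. As a rough illustration of what stacking layer slices of one base model means, the Python sketch below expands a list of slices into the sequence of base-model layers the merged model would run and reports which layers end up duplicated. The slice ranges are hypothetical, the helper names are illustrative, and treating each range as a half-open [start, end) interval is an assumption modeled on Python slicing; only the 80-layer count reflects Qwen2.5-72B-Instruct.

```python
# Minimal sketch: expand the layer slices of a self-merge into the merged layer stack.
# Slice ranges are hypothetical; [start, end) is ASSUMED half-open, like Python slicing.
from collections import Counter
from typing import List, Tuple


def stack_slices(slices: List[Tuple[int, int]], num_base_layers: int) -> List[int]:
    """Return the base-model layer index used at each position of the merged model."""
    merged: List[int] = []
    for start, end in slices:
        if not 0 <= start < end <= num_base_layers:
            raise ValueError(f"invalid slice [{start}, {end}) for {num_base_layers} layers")
        merged.extend(range(start, end))
    return merged


def duplicated_layers(merged: List[int]) -> List[int]:
    """Sorted base-layer indices that appear more than once in the merged stack."""
    return sorted(idx for idx, count in Counter(merged).items() if count > 1)


if __name__ == "__main__":
    BASE_LAYERS = 80  # decoder layers in Qwen2.5-72B-Instruct
    hypothetical_slices = [(0, 40), (20, 60), (40, 80)]  # illustrative only
    merged = stack_slices(hypothetical_slices, BASE_LAYERS)
    print(len(merged))                     # 120
    print(len(duplicated_layers(merged)))  # 40
```

With these hypothetical slices the stack grows from 80 to 120 layers, and base layers 20 through 59 each appear twice; a real configuration would substitute its own ranges.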
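The README does not say which similarity metric the analysis used. One common formulation, assumed here, compares the hidden state entering a block of n consecutive layers with the hidden state leaving it, using angular distance averaged over tokens; a small distance suggests the block changes the representation little. The sketch below works on a NumPy array of per-layer hidden states. For a Hugging Face transformers model, such states can be collected by calling the model with `output_hidden_states=True`, which returns the embedding output plus one tensor per decoder layer. All function names and the synthetic data are hypothetical.

```python
# Minimal sketch of a layer similarity analysis (ASSUMED metric: angular distance).
# Function names and the synthetic demo data are hypothetical illustrations.
import numpy as np


def angular_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-token angular distance in [0, 1] between two (tokens, hidden) arrays."""
    a_unit = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=-1, keepdims=True)
    # Clip to guard against tiny floating-point overshoots outside [-1, 1].
    cos = np.clip(np.sum(a_unit * b_unit, axis=-1), -1.0, 1.0)
    return np.arccos(cos) / np.pi


def block_distances(hidden_states: np.ndarray, n: int) -> np.ndarray:
    """Mean distance between hidden_states[l] and hidden_states[l + n] for every l.

    hidden_states has shape (num_layers + 1, tokens, hidden): index 0 is the
    embedding output and index l + 1 is the output of decoder layer l.
    """
    num_states = hidden_states.shape[0]
    if not 1 <= n < num_states:
        raise ValueError("n must satisfy 1 <= n <= num_layers")
    return np.array([
        angular_distance(hidden_states[l], hidden_states[l + n]).mean()
        for l in range(num_states - n)
    ])


def most_redundant_block(hidden_states: np.ndarray, n: int) -> tuple:
    """Half-open range [start, start + n) of decoder layers whose combined effect
    changes the hidden state least, plus that block's mean distance."""
    distances = block_distances(hidden_states, n)
    start = int(np.argmin(distances))
    return start, start + n, float(distances[start])


def make_synthetic_states(num_layers=8, tokens=16, hidden=32, quiet_layers=(3, 4), seed=0):
    """Hypothetical stand-in for real activations: layers in quiet_layers barely
    change their input; every other layer outputs an unrelated random vector."""
    rng = np.random.default_rng(seed)
    states = [rng.normal(size=(tokens, hidden))]
    for layer in range(num_layers):
        if layer in quiet_layers:
            states.append(states[-1] + 1e-3 * rng.normal(size=(tokens, hidden)))
        else:
            states.append(rng.normal(size=(tokens, hidden)))
    return np.stack(states)


if __name__ == "__main__":
    hs = make_synthetic_states()
    start, end, dist = most_redundant_block(hs, n=2)
    print(start, end)  # 3 5
```

In the synthetic demo, decoder layers 3 and 4 were built to be near-identity, so the search returns the range [3, 5). On a real model, a low distance marks a block whose layers change the representation little, which is the kind of signal the README refers to; the exact rule used to pick this merge's layer ranges is not stated.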