---
base_model:
- IntervitensInc/Mistral-Nemo-Base-2407-chatml
library_name: transformers
tags:
- mergekit
- merge

---
# MN-12B-solracht-EXPERIMENTAL-011425

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### __LOWER YOUR EXPECTATIONS.__

This is an experimental release of MN-12B-Mag-Mell, to test the NuSLERP feature in Mergekit. **The expectation is that this model behaves exactly like Mag Mell R1.**

In testing, it does not produce literally the same outputs, even though it is in theory a replication of legacy SLERP behavior using NuSLERP hyperparameters. After pondering this while the model was uploading, the likely reason for the difference is that DARE pruned a different random set of parameters on each run.
To reiterate: **The expectation is that this has ==the exact same problems== that Mag Mell does.** I'm posting this *so that people can tell me whether or not this is the case.*
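
As a rough illustration of why two DARE runs can diverge, here is a minimal sketch of DARE-style random pruning. It is not mergekit's implementation: `dare_prune`, the seeds, and the toy `delta` list are all hypothetical, and it assumes `density` means the fraction of delta entries kept, with survivors rescaled by `1 / density`.

```python
import random


def dare_prune(delta, density, seed):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density.

    Hypothetical helper for illustration only, not mergekit's code.
    """
    rng = random.Random(seed)
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Toy task vector (fine-tuned weights minus base weights); values are made up.
delta = [0.01 * (i + 1) for i in range(1000)]

# Two merges with different random states, plus a repeat of the first.
run_a = dare_prune(delta, density=0.5, seed=0)
run_b = dare_prune(delta, density=0.5, seed=1)
run_a_again = dare_prune(delta, density=0.5, seed=0)

kept_a = {i for i, v in enumerate(run_a) if v != 0.0}
kept_b = {i for i, v in enumerate(run_b) if v != 0.0}

print("same surviving parameters across seeds:", kept_a == kept_b)
print("same result when the seed is repeated:", run_a == run_a_again)
```

If the random state is not fixed between merges, each run keeps a different subset of each delta, so the merged weights (and therefore the outputs) would not be expected to match exactly.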

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [IntervitensInc/Mistral-Nemo-Base-2407-chatml](https://huggingface.co/IntervitensInc/Mistral-Nemo-Base-2407-chatml) as the base model.
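
For intuition about the TIES half of the method, the sketch below shows sign election on already-pruned deltas. It is a simplified toy, not mergekit's code: `elect_and_merge` and the delta values are hypothetical, the weights are the ones used in this card's configuration, and it assumes the merged update is normalized by the total weight of the entries that agree with the elected sign.

```python
def elect_and_merge(base, deltas, weights):
    """Per parameter: elect a sign from the weighted sum of deltas, then
    average only the contributions that agree with that sign.

    Hypothetical helper for illustration only, not mergekit's code.
    """
    merged = []
    for j, b in enumerate(base):
        contribs = [w * d[j] for d, w in zip(deltas, weights)]
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agreeing = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        weight_sum = sum(w for _, w in agreeing)
        update = sum(c for c, _ in agreeing) / weight_sum if weight_sum else 0.0
        merged.append(b + update)
    return merged


# Toy two-parameter "model"; delta values are made up.
base = [0.0, 0.0]
deltas = [
    [0.2, 0.3],   # stands in for earth-r0
    [0.4, -0.1],  # stands in for water-r0
    [0.0, 0.1],   # stands in for wind-r0
]
weights = [0.5, 1.0, 0.7]

merged = elect_and_merge(base, deltas, weights)
print(merged)
```

In the second parameter the negative delta loses the sign election, so only the two positive contributions are averaged into the result.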

### Models Merged

The following models were included in the merge:
* output/wind-r0
* output/water-r0
* output/earth-r0

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
 - model: output/earth-r0
   parameters:
     density: 0.7
     weight: 0.5
 - model: output/water-r0
   parameters:
     density: 0.9
     weight: 1
 - model: output/wind-r0
   parameters:
     density: 0.5
     weight: 0.7
merge_method: dare_ties
base_model: IntervitensInc/Mistral-Nemo-Base-2407-chatml
tokenizer_source: base

```