---
base_model:
- Lambent/threebird-scribe-alpha0.3-7B
- Lambent/bigbird-scribe-7B
- Lambent/aetherbird-scribe-7B
- Lambent/songbird-scribe-7B
- Lambent/codebird-scribe-7B
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
datasets:
- Lambent/storytellers-32k
- Doctor-Shotgun/no-robots-sharegpt
- BEE-spoke-data/sarcasm-scrolls
- TheSkullery/Aether-Lite-V1.6
- vishnupriyavr/spotify-million-song-dataset
- TheSkullery/Gryphe-Opus-WritingPrompts-merged
- bigcode/the-stack-smol-xs
- bjoernp/Vezora_Tested-22k-Python-Alpaca-sharegpt-filtered
- thesven/code_bagel_35k
- practical-dreamer/RPGPT_PublicDomain-ShareGPT
- Undi95/Capybara-ShareGPT
---
# fourbirdstock

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

EQ-Bench results for this merge:

| Tasks  |Version|Filter|n-shot|     Metric      |   | Value  |   |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
|eq_bench|    2.1|none  |     0|eqbench          |↑  | 78.7955|±  |1.4668|
|        |       |none  |     0|percent_parseable|↑  |100.0000|±  |0.0000|
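
The table above is lm-evaluation-harness output. A minimal sketch of reproducing it via the harness's Python API follows; the repo id is a placeholder for this model's actual path, and the `eq_bench` task name assumes a recent harness version:

```python
# Sketch: re-run EQ-Bench via lm-evaluation-harness's Python API.
# "Lambent/fourbirdstock" is a placeholder repo id; substitute the real one.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=Lambent/fourbirdstock,dtype=float16",
    tasks=["eq_bench"],
    num_fewshot=0,
)
print(results["results"]["eq_bench"])
```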


Alpha 0.3 stock-merged three separate tunes, trained on overlapping datasets, for long-context writing, multi-turn conversation, and RP, with a touch of poetry and code.
From there, each of the four branches (the scribe models listed below) was separately task-tuned on two datasets.
Various ways of combining those via merge were tested; this one scored highest on EQ-Bench, used as an indicator.

My understanding of the Model Stock merge method is that it significantly reduces task adaptation, but also significantly limits the forgetting caused by training.
My hope is that the adaptation, especially over two stages, is still enough to carry forward the ancestor models' gains in longer contexts and multi-turn conversation, and to add some individual style while retaining a fair amount of their capability.

This model's refusals are... not nonexistent, but certainly don't rely on them.
To my knowledge it has no particular refusal behavior for NSFW content as such, but I haven't exactly exhaustively tested which OSHA violations it will aid and abet.

### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Lambent/threebird-scribe-alpha0.3-7B](https://huggingface.co/Lambent/threebird-scribe-alpha0.3-7B) as a base.
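
For intuition, here is a rough per-tensor sketch of the interpolation Model Stock performs, following the paper's formula. This is an illustration, not mergekit's actual code, which also handles options like the `filter_wise` setting in the config below:

```python
# Rough sketch of Model Stock's per-tensor merge (Jang et al., 2024).
# Each fine-tuned weight defines a "task vector" (its delta from the base);
# the more those vectors agree in direction, the more the merged weight
# moves from the base toward the fine-tuned average.
import torch
import torch.nn.functional as F

def model_stock_merge(base: torch.Tensor, tuned: list[torch.Tensor]) -> torch.Tensor:
    n = len(tuned)
    deltas = [(w - base).flatten() for w in tuned]
    # Average pairwise cosine similarity between task vectors.
    cos = torch.stack([
        F.cosine_similarity(deltas[i], deltas[j], dim=0)
        for i in range(n) for j in range(i + 1, n)
    ]).mean()
    t = n * cos / (1 + (n - 1) * cos)  # interpolation ratio from the paper
    w_avg = torch.stack(tuned).mean(dim=0)
    return t * w_avg + (1 - t) * base  # t -> 1 as the task vectors align
```

This is why the method both dampens task adaptation (t < 1 pulls every weight back toward the base) and limits forgetting (the base anchors the merge).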

### Models Merged

The following models were included in the merge:
* [Lambent/bigbird-scribe-7B](https://huggingface.co/Lambent/bigbird-scribe-7B)
* [Lambent/aetherbird-scribe-7B](https://huggingface.co/Lambent/aetherbird-scribe-7B)
* [Lambent/songbird-scribe-7B](https://huggingface.co/Lambent/songbird-scribe-7B)
* [Lambent/codebird-scribe-7B](https://huggingface.co/Lambent/codebird-scribe-7B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Lambent/codebird-scribe-7B
  - model: Lambent/songbird-scribe-7B
  - model: Lambent/aetherbird-scribe-7B
  - model: Lambent/bigbird-scribe-7B
base_model: Lambent/threebird-scribe-alpha0.3-7B
merge_method: model_stock
parameters:
  filter_wise: false
tokenizer_source: union
dtype: float16
```
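
A quick generation sketch with transformers; the repo id is a placeholder, so substitute this model's actual Hugging Face path:

```python
# Minimal generation example; "Lambent/fourbirdstock" is a placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lambent/fourbirdstock"  # placeholder; use the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Write a short scene: four birds argue over who gets to narrate."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```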