---
license: llama3.1
language:
- en
library_name: transformers
tags:
- mergekit
- merge
base_model:
- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/KxaiZ7rDKkYlix99O9j5H.png)
**Cathallama-70B-128K** **originally from: [gbueno86/Cathallama-70B](https://huggingface.co/gbueno86/Cathallama-70B)**
=====================================
Awesome model, my new daily driver.
Edit: I am seeing many generated tokens that point to unknown Unicode code points, which did not show up during testing of this model, so I have stopped using it and am working on a new version.
**Notable Performance**
* 9 percentage point increase in overall MMLU-PRO success rate over Llama 3.1 70B Instruct at Q4_0 (51.0% vs. 42.0%)
* Strong performance in MMLU-PRO categories overall
* Great performance during manual testing
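
The overall MMLU-PRO figures reported further down in this card (51.0% for Cathallama, 42.0% for Llama 3.1 70B Instruct, both at Q4_0) can be compared in two ways. The short sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Overall MMLU-PRO success rates as reported in this card (Q4_0 quantizations).
cathallama = 51.0
llama_31 = 42.0

# Absolute difference, measured in percentage points.
absolute_gain = cathallama - llama_31

# Relative difference, measured as a percentage of the baseline score.
relative_gain = absolute_gain / llama_31 * 100

print(f"absolute: {absolute_gain:.1f} points")
print(f"relative: {relative_gain:.1f}%")
```

The gap is 9 percentage points, which corresponds to roughly a 21% relative improvement over the baseline score.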
**Creation workflow**
=====================
**Models merged**
* meta-llama/Meta-Llama-3.1-70B-Instruct
* turboderp/Cat-Llama-3-70B-instruct
* Nexusflow/Athene-70B
```mermaid
flowchart TD
A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
B -->| | E[Merge]
D -->| | E[Merge]
E[Merge] -->|Result| F[Cathallama]
```
![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/bBcB194tAtsZjPUnI1pDQ.png)
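
This card does not state which mergekit method or weights were used, so the following is only a toy illustration of the two-stage shape of the flowchart, not the actual recipe. It assumes plain element-wise averaging with equal weights, and uses tiny made-up vectors in place of real model weights.

```python
def average(a, b):
    """Element-wise mean of two equally shaped weight vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]

# Hypothetical stand-ins for model weights; real checkpoints hold billions of values.
athene = [1.0, 0.0]
cat = [0.0, 1.0]
llama_31 = [4.0, 4.0]

stage_a = average(athene, llama_31)  # Athene merged with Llama 3.1
stage_b = average(cat, llama_31)     # Cat merged with Llama 3.1
final = average(stage_a, stage_b)    # the two intermediate merges combined

print(final)
```

Under this equal-weight assumption, the shared Llama 3.1 model contributes half of the final result and each of the other two models contributes a quarter, because Llama 3.1 takes part in both first-stage merges. Other merge methods would distribute influence differently.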
**Testing**
=====================
**Hyperparameters**
---------------
* **Temperature**: 0.0 for automated, 0.9 for manual
* **Penalize repeat sequence**: 1.05
* **Consider N tokens for penalize**: 256
* **Penalize repetition of newlines**
* **Top-K sampling**: 40
* **Top-P sampling**: 0.95
* **Min-P sampling**: 0.05
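
To reproduce these settings programmatically, they can be collected into a single request body. The sketch below assumes the field names of the llama.cpp server `/completion` endpoint as they existed in builds from around this period (`repeat_penalty`, `repeat_last_n`, `penalize_nl`, and so on); the card does not say whether the server or the CLI was used, so check the names against your own build. The helper `sampling_payload` is hypothetical.

```python
import json

def sampling_payload(prompt, automated=True):
    """Build a request body from the sampling settings listed in this card."""
    return {
        "prompt": prompt,
        # 0.0 for automated benchmarks, 0.9 for manual testing.
        "temperature": 0.0 if automated else 0.9,
        "repeat_penalty": 1.05,   # penalize repeat sequence
        "repeat_last_n": 256,     # consider N tokens for penalize
        "penalize_nl": True,      # penalize repetition of newlines (assumed field name)
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
    }

print(json.dumps(sampling_payload("Hello", automated=False), indent=2))
```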
**llama.cpp Version**
------------------
* b3527-2-g2d5dd7bb
* -fa -ngl -1 -ctk f16 --no-mmap
**Tested Files**
------------------
* Cathallama-70B.Q4_0.gguf
* Nexusflow_Athene-70B.Q4_0.gguf
* turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
* Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
**Tests**
--------------
**Manual testing**
| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- | --- |
| **Common Sense** | Ball on cup | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | OK |
| | Big duck small horse | <span style="color: red;">KO</span> | OK | <span style="color: red;">KO</span> | OK |
| | Killers | OK | OK | <span style="color: red;">KO</span> | OK |
| | Strawberry r's | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| | 9.11 or 9.9 bigger | <span style="color: red;">KO</span> | OK | OK | <span style="color: red;">KO</span> |
| | Dragon or lens | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| | Shirts | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| | Sisters | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| | Jane faster | OK | OK | OK | OK |
| **Programming** | JSON | OK | OK | OK | OK |
| | Python snake game | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| **Math** | Door window combination | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
| **Smoke** | Poem | OK | OK | OK | OK |
| | Story | OK | OK | <span style="color: red;">KO</span> | OK |
*Note: See [sample_generations.txt](https://huggingface.co/gbueno86/Cathallama-70B/blob/main/sample_generations.txt) on the main folder of the repo for the raw generations.*
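
For a quick summary of the manual test table, the OK/KO results can be tallied per model. The sketch below transcribes the 14 rows of the table in order (Ball on cup through Story); the shortened model names used as dictionary keys are just labels chosen here.

```python
# Results transcribed row by row from the manual testing table above.
results = {
    "Cathallama-70B":        "OK KO OK OK KO KO OK OK OK OK OK OK OK OK",
    "Nexusflow_Athene-70B":  "KO OK OK KO OK KO OK KO OK OK KO OK OK OK",
    "turboderp_Cat-Llama-3": "KO KO KO KO OK KO KO KO OK OK KO KO OK KO",
    "Meta-Llama-3.1-70B":    "OK OK OK KO KO KO KO KO OK OK KO KO OK OK",
}

# Count the passed test cases for each model.
passed = {model: row.split().count("OK") for model, row in results.items()}
total = len(results["Cathallama-70B"].split())

for model, count in passed.items():
    print(f"{model}: {count}/{total}")
```

With only 14 hand-picked test cases, these tallies are a rough sanity check rather than a benchmark.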
**MMLU-PRO**
| Model | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | **51.0%** |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
| Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |
| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
| --- | --- | --- | --- | --- |
| Business | **50.0%** | 45.0% | 20.0% | 40.0% |
| Law | **40.0%** | 30.0% | 30.0% | 35.0% |
| Psychology | **85.0%** | 80.0% | 70.0% | 75.0% |
| Biology | 80.0% | 70.0% | **85.0%** | 80.0% |
| Chemistry | **55.0%** | 40.0% | 35.0% | 35.0% |
| History | **65.0%** | 60.0% | 55.0% | **65.0%** |
| Other | **55.0%** | 50.0% | 45.0% | 50.0% |
| Health | **75.0%** | 40.0% | 60.0% | 65.0% |
| Economics | **80.0%** | 75.0% | 65.0% | 70.0% |
| Math | **45.0%** | 35.0% | 15.0% | 40.0% |
| Physics | **50.0%** | 45.0% | 45.0% | 45.0% |
| Computer Science | **60.0%** | 55.0% | 55.0% | **60.0%** |
| Philosophy | 55.0% | **60.0%** | 45.0% | 50.0% |
| Engineering | 35.0% | **40.0%** | 25.0% | 35.0% |
*Note: MMLU-PRO Overall tested with 100 questions. Categories tested with 20 questions from each category.*
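
Because the sample sizes are small, it helps to keep the statistical resolution of these numbers in mind. The sketch below uses the standard binomial standard-error formula, which assumes the questions are independent draws; the helper `standard_error` is a name introduced here for illustration.

```python
import math

def standard_error(success_rate, n_questions):
    """Binomial standard error of a success rate, in percentage points."""
    p = success_rate / 100
    return math.sqrt(p * (1 - p) / n_questions) * 100

# Overall run: 100 questions, Cathallama scored 51.0%.
overall_se = standard_error(51.0, 100)

# Per-category runs: 20 questions, so each question moves the score by this much.
category_step = 100 / 20

print(f"overall standard error: {overall_se:.1f} points")
print(f"one question per category: {category_step:.1f} points")
```

On the overall score the standard error is about 5 percentage points, and in the category table a single question is worth 5 points, so small differences between models there should be read with caution.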
**PubmedQA**
| Model Name | Success % |
| --- | --- |
| Cathallama-70B.Q4_0.gguf | 73.00% |
| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | **76.00%** |
| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |
**Request**
--------------
If you are hiring in the EU or can sponsor a visa, PM me :D
PS. Thank you mradermacher for the GGUFs!