---
library_name: transformers
tags:
- llama-factory
- merge
license: llama3
language:
- en
---
|
|
|
# Model Card for penny5-dolphin-einstein-llama3-dare-ties-chatml
|
|
|
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/TciPHbHULFVgClbNaw0hY.webp)
|
|
|
This is a fine-tune of a merged model. The merge was made with the [DARE](https://arxiv.org/abs/2311.03099)-[TIES](https://arxiv.org/abs/2306.01708) method, using [cognitivecomputations/dolphin-2.9-llama3-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b) as the base.

The following models were included in the merge:

* [Weyaxi/Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
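
For intuition, the sketch below is a toy rendering of the DARE step, not the exact implementation used to produce this model. DARE drops a random fraction of each task vector (the difference between a fine-tuned model and the base) and rescales the survivors so the expected update is unchanged; TIES then resolves sign conflicts among the sparsified deltas before they are added back to the base.

```python
import torch

def dare_sparsify(finetuned: torch.Tensor, base: torch.Tensor, density: float = 0.5) -> torch.Tensor:
    """DARE (Yu et al., 2023): Drop delta parameters And REscale.

    Keeps each element of the task vector (finetuned - base) with
    probability `density`, then rescales by 1/density so the expected
    delta is unchanged.
    """
    delta = finetuned - base
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Toy usage: fold one sparsified task vector back into a base weight.
base_w = torch.randn(4, 4)
tuned_w = base_w + 0.1 * torch.randn(4, 4)
merged_w = base_w + dare_sparsify(tuned_w, base_w, density=0.5)
```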
|
|
|
|
|
|
|
|
|
## Model Details

A Q8_0 GGUF quantization is available here: [penny5-dolphin-einstein-llama3-dare-ties-chatml.gguf](https://huggingface.co/giannisan/penny5-dolphin-einstein-llama3-dare-ties-chatml.gguf)
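
As a usage sketch for the GGUF file, assuming [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and a locally downloaded file (the path, context size, and ChatML chat format below are assumptions, not confirmed settings):

```python
from llama_cpp import Llama

# Hypothetical local path to the downloaded Q8_0 GGUF file.
llm = Llama(
    model_path="penny5-dolphin-einstein-llama3-dare-ties-chatml.Q8_0.gguf",
    n_ctx=4096,            # context window; adjust to taste
    chat_format="chatml",  # the model name suggests a ChatML prompt format
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain the DARE-TIES merge method in one paragraph."}]
)
print(response["choices"][0]["message"]["content"])
```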
|
|
|
### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub.
|
|
|
- **Developed by:** [Gianni Sanrochman](https://x.com/Giannisanii)
- **Funded by:** Merildo Sanrochman
- **Model type:** [LLaMA-3](https://ai.meta.com/blog/meta-llama-3)
- **Language(s) (NLP):** English
- **License:** [llama3](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE)
- **Finetuned from model:** [giannisan/dolphin-einstein-llama3-dare-ties](https://huggingface.co/giannisan/dolphin-einstein-llama3-dare-ties), using the PENNY dataset
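
A minimal loading sketch with 🤗 transformers, assuming the repository id inferred from the links above and a ChatML chat template shipped with the tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "giannisan/penny5-dolphin-einstein-llama3-dare-ties-chatml"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# apply_chat_template uses the chat template bundled with the tokenizer
# (expected, but not guaranteed, to be ChatML for this model).
messages = [{"role": "user", "content": "Hello! What can you do?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```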
|
|
|
|
|
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]
|
|
|
## Evaluation

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 66.72 |
| ARC (25-shot)        | 61.01 |
| HellaSwag (10-shot)  | 82.50 |
| MMLU (5-shot)        | 64.48 |
| TruthfulQA (0-shot)  | 50.73 |
| Winogrande (5-shot)  | 74.11 |
| GSM8K (5-shot)       | 67.48 |

Full results are available [here](https://huggingface.co/datasets/open-llm-leaderboard/details_giannisan__penny5-dolphin-einstein-llama3-dare-ties-chatml/blob/main/results_2024-05-30T05-14-11.958453.json).
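
These scores come from the Open LLM Leaderboard. A rough local reproduction sketch with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) might look as follows; task versions and few-shot settings in a local harness may not match the leaderboard exactly, so treat the numbers above as authoritative:

```python
# pip install lm-eval
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=giannisan/penny5-dolphin-einstein-llama3-dare-ties-chatml,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande", "gsm8k"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```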
|
|
|
|
|
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700); a back-of-the-envelope sketch follows the list below.

- **Hardware Type:** NVIDIA A100
- **Hours used:** 2
- **Cloud Provider:** RunPod
- **Compute Region:** Europe
- **Carbon Emitted:** [More Information Needed]
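
As a back-of-the-envelope sketch with assumed figures (roughly 400 W A100 board power, an illustrative 300 gCO2eq/kWh European grid intensity, and a 1.2 PUE; none of these are measured values for this run):

```python
# Rough CO2 estimate in the spirit of Lacoste et al. (2019).
hours = 2.0                # from the list above
gpu_power_kw = 0.4         # assumed A100 board power (~400 W)
pue = 1.2                  # assumed datacenter power usage effectiveness
grid_gco2_per_kwh = 300.0  # illustrative European grid carbon intensity

energy_kwh = hours * gpu_power_kw * pue
emissions_g = energy_kwh * grid_gco2_per_kwh
print(f"~{energy_kwh:.2f} kWh, ~{emissions_g:.0f} g CO2eq")  # ~0.96 kWh, ~288 g
```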
|
|
|
|
|
## Model Card Authors [optional]

Gianni Sanrochman
|
|
|
## Model Card Contact
|
|
|
[More Information Needed] |