---
language:
- en
license: mit
library_name: adapter-transformers
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/truthy-dpo-v0.1
- argilla/distilabel-math-preference-dpo
- argilla/distilabel-capybara-dpo-7k-binarized
base_model: Technoculture/MT7Bi-sft
model-index:
- name: MedMerge-6-7b-alpha-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 54.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 75.6
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 52.65
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 43.94
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 71.03
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 26.16
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Technoculture/MedMerge-6-7b-alpha-dpo
      name: Open LLM Leaderboard
---

# Technoculture/MedMerge-6-7b-alpha-dpo

# Open LLM Leaderboard

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63486df1f8f01fcc4b23e97d/ZhdVcETriQf5WFiDhXb5q.png)

| Model Name              | ARC      | HellaSwag | MMLU   | TruthfulQA | Winogrande | GSM8K    |
| ----------------------- | -------- | --------- | ------ | ---------- | ---------- | -------- |
| Orca-2-7b               | **78.4** | 76.1      | 53.7   | **52.4**   | **74.2**   | **47.2** |
| Llama-2-7b              | 43.2     | **77.1**  | 44.4   | 38.7       | 69.5       | 16       |
| MT7Bi-sft               | 54.1     | 75.11     | -      | 43.08      | 72.14      | 15.54    |
| MedMerge-6-7b           | 29.52    | 41.04     | -      | 37.53      | 59.35      | 0.91     |
| MedMerge-6-7b-alpha-dpo | 54.27    | 75.6      | 52.65  | 43.94      | 71.03      | 26.16    |

## Training Details

- **GPU:** NVIDIA A100 Tensor Core GPU
- **Total Batches:** 4266
- **Epochs:** 3
- **Duration:** 3 hours, 57 minutes
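As a rough sanity check, the figures above imply a per-batch time and, combined with the 11.38k-row total of the DPO dataset mixture reported in this card, an approximate batch size. The sketch below assumes that "Total Batches" counts steps across all 3 epochs and that every row of the mixture is seen once per epoch; the card confirms neither, so treat the derived values as estimates.

```python
# Figures copied from the Training Details list and the dataset mixture total.
TOTAL_BATCHES = 4266
EPOCHS = 3
DURATION_S = 3 * 3600 + 57 * 60  # 3 hours, 57 minutes
MIXTURE_ROWS = 11_380            # "Total Size: 11.38k"

# Assumption: TOTAL_BATCHES spans all epochs, not a single epoch.
batches_per_epoch = TOTAL_BATCHES / EPOCHS
seconds_per_batch = DURATION_S / TOTAL_BATCHES

# Assumption: every mixture row is consumed exactly once per epoch.
rows_per_batch = MIXTURE_ROWS / batches_per_epoch

print(f"{batches_per_epoch:.0f} batches per epoch")
print(f"{seconds_per_batch:.2f} s per batch")
print(f"{rows_per_batch:.2f} rows per batch")
```

Under these assumptions the run works out to roughly 3.3 seconds per batch and about 8 rows per batch.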


## DPO Training Dataset Mixture
| Dataset Name                                 | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
| -------------------------------------------- | -------------------- | ----- | ----------------------- |
| argilla/distilabel-math-preference-dpo       | 2.4k                 | 1.0   | 2.4k                    |
| argilla/distilabel-intel-orca-dpo-pairs      | 12.9k                | 0.5   | 6.45k                   |
| jondurbin/truthy-dpo-v0.1                    | 1.04k                | 1.0   | 1.04k                   |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k                 | 0.2   | 1.5k                    |

Total Size: 11.38k
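
The "Size After Ratio" column is the original row count multiplied by the sampling ratio. A minimal sketch of that arithmetic follows, using the rounded sizes from the table; the `sampled_rows` helper is hypothetical and is not taken from the training notebook.

```python
# (dataset name, original rows, sampling ratio), copied from the table above.
# Row counts are the table's rounded values, not exact dataset sizes.
MIXTURE = [
    ("argilla/distilabel-math-preference-dpo", 2_400, 1.0),
    ("argilla/distilabel-intel-orca-dpo-pairs", 12_900, 0.5),
    ("jondurbin/truthy-dpo-v0.1", 1_040, 1.0),
    ("argilla/distilabel-capybara-dpo-7k-binarized", 7_500, 0.2),
]


def sampled_rows(original_rows: int, ratio: float) -> int:
    """Hypothetical helper: number of rows kept after applying the ratio."""
    return round(original_rows * ratio)


sizes = {name: sampled_rows(rows, ratio) for name, rows, ratio in MIXTURE}
total = sum(sizes.values())

for name, kept in sizes.items():
    print(f"{name}: {kept}")
print(f"total: {total}")
```

Summing the rounded table values gives 11,390 rows, slightly above the stated total of 11.38k; the gap is presumably a rounding effect, since the table lists sizes to at most three significant figures.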

## Training Loss Plot
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/wEkGQGRVK000d0q6FkXE9.png)

## Training Loss Smoothed Plot
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/CDk_JCsteIwGAG_DyHRDE.png)

### For full details of this DPO training run, see our notebook.
<a target="_blank" href="https://colab.research.google.com/github/dkshjn/Technoculture/blob/main/MedMerge_6_7b_alpha_dpo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Technoculture__MedMerge-6-7b-alpha-dpo).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |53.94|
|AI2 Reasoning Challenge (25-Shot)|54.27|
|HellaSwag (10-Shot)              |75.60|
|MMLU (5-Shot)                    |52.65|
|TruthfulQA (0-shot)              |43.94|
|Winogrande (5-shot)              |71.03|
|GSM8k (5-shot)                   |26.16|
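
The "Avg." row can be reproduced from the six benchmark scores. The sketch below assumes it is the unweighted arithmetic mean of those scores, which matches the value in the table.

```python
# Benchmark scores copied from the results table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 54.27,
    "HellaSwag (10-Shot)": 75.60,
    "MMLU (5-Shot)": 52.65,
    "TruthfulQA (0-shot)": 43.94,
    "Winogrande (5-shot)": 71.03,
    "GSM8k (5-shot)": 26.16,
}

# Assumption: "Avg." is the plain mean, with every benchmark weighted equally.
average = sum(SCORES.values()) / len(SCORES)
print(f"Avg. = {average:.2f}")
```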