license: llama2
language:
  - en
tags:
  - mistral
  - merge
library_name: transformers
pipeline_tag: text-generation
mergekit:
  - Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
  - uukuguy/speechless-mistral-six-in-one-7b
datasets:
  - stingning/ultrachat
  - garage-bAInd/Open-Platypus
  - Open-Orca/OpenOrca
  - TIGER-Lab/MathInstruct
  - OpenAssistant/oasst_top1_2023-08-25
  - teknium/openhermes
  - meta-math/MetaMathQA
  - Open-Orca/SlimOrca

SynthIQ

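This is SynthIQ, rated 92.23/100 by GPT-4 across varied complex prompts. I used mergekit to merge Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp and uukuguy/speechless-mistral-six-in-one-7b with the slerp method.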

Metrics from the Open LLM Leaderboard:

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp | 71.26 | 67.92 | 86.32 | 65.47 | 56.45 | 79.72 | 71.72 |
| sethuiyer/SynthIQ-7b | 69.37 | 65.87 | 85.82 | 64.75 | 57 | 78.69 | 64.06 |
| uukuguy/speechless-mistral-six-in-one-7b | 60.76 | 62.97 | 84.6 | 63.29 | 57.77 | 77.51 | 18.42 |
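
As a quick sanity check on the table, the Average column appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the numbers above; the names `SCORES`, `REPORTED_AVERAGE`, and `mean_score` are illustrative only, and the 0.01 tolerance is an assumption to allow for rounding in the reported values.

```python
# Benchmark scores copied from the table, in the order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
SCORES = {
    "Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp":
        [67.92, 86.32, 65.47, 56.45, 79.72, 71.72],
    "sethuiyer/SynthIQ-7b":
        [65.87, 85.82, 64.75, 57.0, 78.69, 64.06],
    "uukuguy/speechless-mistral-six-in-one-7b":
        [62.97, 84.6, 63.29, 57.77, 77.51, 18.42],
}

# Average column as reported in the table
REPORTED_AVERAGE = {
    "Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp": 71.26,
    "sethuiyer/SynthIQ-7b": 69.37,
    "uukuguy/speechless-mistral-six-in-one-7b": 60.76,
}


def mean_score(scores):
    """Unweighted mean of a list of benchmark scores."""
    return sum(scores) / len(scores)


for model, scores in SCORES.items():
    recomputed = mean_score(scores)
    # Allow a small tolerance for rounding in the reported average
    matches = abs(recomputed - REPORTED_AVERAGE[model]) <= 0.01
    print(f"{model}: recomputed={recomputed:.3f} matches_reported={matches}")
```

The largest gap between SynthIQ and its stronger parent is on GSM8K, which accounts for most of the difference in the averages.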

YAML Config


slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
        layer_range: [0, 32]
      - model: uukuguy/speechless-mistral-six-in-one-7b
        layer_range: [0, 32]

merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1

parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union

dtype: bfloat16
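
To illustrate what the config above asks for, here is a minimal Python sketch of the two ideas involved: spherical linear interpolation (slerp) between two weight vectors, and expanding a short list of `t` anchor values into one value per layer. This is not mergekit's code. The helper names `slerp` and `layer_t`, the piecewise-linear expansion, and the 0.9995 colinearity threshold are assumptions made for illustration, and the sketch does not say which model mergekit places at `t = 0` or `t = 1`.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two equal-length vectors.

    Returns v0 at t=0 and v1 at t=1 (illustrative convention).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # Direction is undefined for a zero vector: fall back to lerp
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the two directions
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        # Nearly colinear: linear interpolation is numerically safer
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, num_layers):
    """Expand anchor values into one t per layer (assumed piecewise-linear)."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for i in range(num_layers):
        # Position of layer i on the anchor axis, from 0 to len(anchors) - 1
        pos = i / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1.0 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values


# Anchor lists copied from the config; 32 layers per layer_range: [0, 32]
attn_t = layer_t([0, 0.5, 0.3, 0.7, 1], 32)
mlp_t = layer_t([1, 0.5, 0.7, 0.3, 0], 32)
```

Under this reading, the `mlp` anchors are the mirror image of the `self_attn` anchors, so at every depth the two schedules sum to 1: where attention weights lean toward one parent, MLP weights lean toward the other. Tensors matching neither filter use the fallback `t = 0.5`.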

Prompt template: ChatML

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
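
A small helper can fill in the template above. This is a sketch only: the function name `build_chatml_prompt` is hypothetical, and it assumes a single system message followed by a single user turn, exactly as in the template.

```python
def build_chatml_prompt(system_message, prompt):
    """Format one system message and one user turn as a ChatML prompt.

    The string ends with the opening assistant tag so that the model
    continues with its reply.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


example = build_chatml_prompt(
    "You are a helpful assistant.",
    "Explain model merging in one sentence.",
)
print(example)
```

The resulting string can be passed to the tokenizer as ordinary text. Stopping generation at `<|im_end|>` is a common choice with ChatML, though whether this model's tokenizer config does so by default is not stated here.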

SynthIQ's strengths can be succinctly summarized as follows:

  1. Advanced Natural Language Processing: SynthIQ excels in understanding and generating natural language, making it highly effective for conversational AI applications.

  2. Strong Commonsense Reasoning: It demonstrates a solid grasp of everyday scenarios and contexts, essential for practical and real-world applications.

  3. Creative and Engaging Content Generation: SynthIQ has the capability to produce creative content, useful in fields like marketing, creative writing, and social media engagement.

  4. Adaptive User Interaction: It can effectively adapt to various user personas, providing personalized experiences and recommendations.

  5. Multitasking Across Languages and Subjects: SynthIQ is adept at handling tasks across different languages and subjects, showcasing its versatility in global and multifaceted settings.

  6. Analytical and Problem-Solving Skills: The model shows proficiency in analytical reasoning and problem-solving, applicable in data-driven decision-making and complex scenario analysis.

  7. Cultural and Contextual Awareness: SynthIQ's awareness of different cultural and social contexts makes it suitable for applications requiring cultural sensitivity.

  8. Empathetic and Human-Like Interactions: The model can engage in empathetic and human-like dialogues, ideal for applications in mental health support, customer service, and education.

The license is the Llama 2 license, because uukuguy/speechless-mistral-six-in-one-7b is released under the Llama 2 license.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 69.37 |
| ARC (25-shot) | 65.87 |
| HellaSwag (10-shot) | 85.82 |
| MMLU (5-shot) | 64.75 |
| TruthfulQA (0-shot) | 57.0 |
| Winogrande (5-shot) | 78.69 |
| GSM8K (5-shot) | 64.06 |