Nous-Capybara-34B, Tess-M-v1.4, Airoboros-3_1-yi-34b-200k, PlatYi-34B-200K-Q, Pallas-0.4, Yi-34B-200K-AEZAKMI-v2, and a tiny bit of SUS-Chat-34B merged with a new, experimental implementation of "dare ties" via mergekit.

See the main model card: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5

The merge was then quantized with the brand-new exl2 quantization in exllamav2 0.0.11, using 300K tokens from a sci-fi story, a fantasy story, and a Vicuna-format chat as profiling data, at a high context size. This should result in excellent writing performance for the model size.

This 2.67bpw quantization can fit a long context on a 16GB GPU at usable quality.
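As a rough back-of-the-envelope check on that claim, the sketch below estimates the size of the quantized weights alone. The parameter count of about 34e9 is an assumption based on the model name, and the estimate ignores the 6-bit output head, the KV cache, and activation memory, so treat the result as a lower bound rather than a measurement.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: roughly 34e9 parameters for a Yi-34B model.
N_PARAMS = 34e9

weights_gib = quantized_weight_gib(N_PARAMS, 2.67)
# Whatever is left over has to hold the context cache and activations.
headroom_gib = 16 - weights_gib
print(f"weights: ~{weights_gib:.1f} GiB, left on a 16 GiB card: ~{headroom_gib:.1f} GiB")
```

The leftover memory is what the context cache has to fit into, which is why a lower bpw leaves room for a longer context.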


Prompt template: Orca-Vicuna

SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:

It might recognize ChatML, or maybe Llama-chat from Airoboros.

Sometimes the model "spells out" the stop token as </s> like Capybara, so you may need to add </s> as an additional stopping condition.
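A minimal sketch of how the template and the extra stop condition could be handled on the client side. The function names `build_prompt` and `truncate_at_stop` are hypothetical helpers made up for illustration, not part of exllamav2 or any UI.

```python
STOP_TEXT = "</s>"  # the model sometimes emits this as literal text


def build_prompt(system_message: str, prompt: str) -> str:
    """Fill in the Orca-Vicuna template shown above."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"


def truncate_at_stop(text: str, stop: str = STOP_TEXT) -> str:
    """Cut generated text at the first spelled-out stop string, if any."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


print(build_prompt("You are a helpful storyteller.", "Continue the story."))
print(truncate_at_stop("The ship landed at dawn.</s>USER: and then?"))
```

Most backends accept a list of stop strings directly, in which case adding `</s>` there is simpler than post-processing.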


Running

Since this is a Yi model, try running a lower temperature with 0.05-0.1 MinP, a little repetition penalty, and no other samplers. Yi tends to run "hot" by default.
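To illustrate what MinP does, here is a small pure-Python sketch of the idea as it is commonly described: tokens whose probability falls below `min_p` times the top token's probability are discarded, and the rest are renormalized. The temperature of 0.8 is a placeholder, not a recommended value, and applying temperature before the filter is an assumption; real backends may order the samplers differently.

```python
import math


def min_p_filter(logits, temperature=0.8, min_p=0.05):
    """Return renormalized probabilities after temperature and a MinP cutoff."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Drop every token below min_p times the most likely token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


print(min_p_filter([2.0, 1.0, -5.0], temperature=1.0, min_p=0.1))
```

Because the cutoff scales with the top token's probability, it prunes more aggressively when the model is confident and less when the distribution is flat.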

24GB GPUs can run Yi-34B-200K models at 45K-75K context with exllamav2 and performant UIs like exui. I go into more detail in this post.


Commands

First pass:

python convert.py --in_dir /home/alpha/FastModels/Yi-34B-200K-DARE-merge-v5 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/v5.json --cal_dataset /home/alpha/Documents/smol.parquet -ml 32768 -mr 9 -ss 4096 -b 4.0 -hb 6 -nr

Second pass:

python convert.py --in_dir /home/alpha/FastModels/Yi-34B-200K-DARE-merge-v5 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/v5.json --cal_dataset /home/alpha/Documents/medium.parquet -l 12288 -r 29 -ml 32768 -mr 9  -ss 4096 -b 2.67 -hb 6 -cf /home/alpha/FastModels/Yi-34B-200K-DARE-merge-v5-exl2-2.67bpw-fiction -nr

Merged in mergekit with the following config, and the tokenizer from chargoddard's Yi-Llama:

models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    # Less weight than previous merge since Pallas is a finetune of Tess
    parameters:
      weight: 0.14
      density: 0.62
  - model: /home/alpha/FastModels/Mihaiii_Pallas-0.4
    parameters:
      weight: 0.14
      density: 0.62
  - model: /home/alpha/Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: 0.14
      density: 0.52
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.22
      density: 0.62
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200k-Q-FastChat
    parameters:
      weight: 0.14
      density: 0.52
  #- model: /home/alpha/Storage/Models/Raw/ehartford_dolphin-2.2-yi-34b-200k
  #  Dolphin 200K seems to be broken according to multiple leaderboards and perplexity tests?
  #  parameters:
  #    weight: 0.15
  #    density: 0.6
  - model: /home/alpha/Models/Raw/adamo1139_Yi-34B-200K-AEZAKMI-v2
    parameters:
      weight: 0.14
      density: 0.52
  - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B/
    # Very low density and low weight since it's a Yi 4K finetune, to try and preserve long context performance while "keeping" some of SUS
    parameters:
      weight: 0.08
      density: 0.08
merge_method: dare_ties
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:
  int8_mask: true
dtype: bfloat16
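For readers unfamiliar with the `density` and `weight` parameters in the config above, the toy sketch below shows the DARE "drop and rescale" idea on a four-element list. It is only an illustration under simplifying assumptions: it is not mergekit's implementation, it leaves out the TIES sign-election step entirely, and the numbers and the helper names `task_delta` and `dare_sparsify` are made up.

```python
import random


def task_delta(finetuned, base):
    """Difference between a finetuned model's weights and the base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def dare_sparsify(delta, density, rng):
    """Keep each delta entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Made-up toy tensors standing in for one weight matrix.
base = [0.10, -0.20, 0.30, 0.40]
finetuned = [0.15, -0.10, 0.30, 0.20]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
delta = task_delta(finetuned, base)
sparse = dare_sparsify(delta, density=0.62, rng=rng)

# A single model's contribution, scaled by its merge weight, added back to the base.
merged = [b + 0.14 * s for b, s in zip(base, sparse)]
print(sparse)
print(merged)
```

Rescaling by `1/density` keeps the expected value of each delta unchanged, which is why a very low density such as 0.08 for SUS-Chat keeps only a sparse slice of that finetune's changes.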

Credits:

https://github.com/cg123/mergekit/tree/dare

https://huggingface.co/NousResearch/Nous-Capybara-34B/

https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k

https://huggingface.co/migtissera/Tess-M-v1.4

https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat

https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2

https://huggingface.co/Mihaiii/Pallas-0.4

https://huggingface.co/SUSTech/SUS-Chat-34B

https://huggingface.co/chargoddard/Yi-34B-200K-Llama

https://huggingface.co/01-ai/Yi-34B-200K
