---
license: apache-2.0
library_name: transformers
tags:
- dpo
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
base_model: sethuiyer/Chikuma_10.7B
pipeline_tag: text-generation
model-index:
- name: distilabled_Chikuma_10.7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 66.38
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.14
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.7
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 59.2
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.4
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 58.38
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/distilabled_Chikuma_10.7B
      name: Open LLM Leaderboard
---

# Chikuma_10.7B - V2 (Enhanced with DPO) [For Experiments]

Chikuma

This model is the **DPO fine-tuned version** of [Chikuma_10.7B](https://huggingface.co/sethuiyer/Chikuma_10.7B), which was a depth-upscaled merge of:

* [sethuiyer/SynthIQ-7b](https://huggingface.co/sethuiyer/SynthIQ-7b)
* [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)

The name "Chikuma" is inspired by the [Chikuma River](https://en.wikipedia.org/wiki/Shinano_River), the longest river in Japan, known for its continuous flow and meandering path. The name metaphorically reflects the model's depth, fluidity, and adaptability in processing and understanding language.

# Dataset used for Fine-Tuning

Dataset: `argilla/distilabel-intel-orca-dpo-pairs`

After filtering, roughly ~3,000 samples remained, all of high quality according to `chosen_score`. The following filters were applied to the original dataset:

```python
from datasets import load_dataset

# Load the preference pairs and keep only decisive, high-scoring samples
# that do not overlap with the GSM8k training split
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.filter(
    lambda r: r["status"] != "tie"
    and r["chosen_score"] >= 8
    and not r["in_gsm8k_train"]
)
```

# Chat Template

The chat template for Chikuma_10.7B - V2 is a modified version of ChatML, optimized for improved interaction and engagement:

```
<|im_start|>GPT4 Correct system: {system} Always use <|end_of_turn|> when you want to end the answer.
<|im_end|>
<|im_start|>GPT4 Correct user: {user}<|im_end|>
<|im_start|>GPT4 Correct Assistant: {assistant}<|im_end|>
```

## Nous Benchmark Evaluation

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------------------------------|---------|---------|------------|----------|---------|
| SynthIQ-7b | 42.67 | 73.71 | 56.51 | 44.59 | 54.37 |
| openchat/openchat-3.5-0106 | **44.17** | 73.72 | 52.53 | 44.40 | 53.71 |
| Chikuma_10.7B | 42.41 | 73.41 | 56.69 | 43.50 | 54.00 |
| **Chikuma_10.7B_v2** | 42.77 | **73.81** | **58.83** | **44.83** | **55.06** |

# OpenLLM Leaderboard

| Benchmark Name | Performance |
|----------------|-------------|
| ARC | 66.38 |
| HellaSwag | 85.00 |
| MMLU | 65.27 |
| TruthfulQA | 58.83 |
| Winogrande | 78.77 |
| GSM8K | 63.68 |
| **Average** | **69.65** |

### Training Environment

- Hardware: Single A100 80GB GPU on RunPod, used for approximately 1.5 hours.
- Training Script: Available as a [Google Colab Notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing). Special thanks to [mlabonne](https://huggingface.co/mlabonne) for providing the template.
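As a quick sanity check, the chat template described above can also be filled by hand with plain string formatting; the system and user strings below are placeholders, not part of the model card:

```python
# Minimal sketch: fill the modified-ChatML template manually.
# The system/user messages here are illustrative placeholders.
template = (
    "<|im_start|>GPT4 Correct system: {system} "
    "Always use <|end_of_turn|> when you want to end the answer.<|im_end|>\n"
    "<|im_start|>GPT4 Correct user: {user}<|im_end|>\n"
    "<|im_start|>GPT4 Correct Assistant: "
)
prompt = template.format(
    system="You are a helpful assistant.",
    user="Name the longest river in Japan.",
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template` builds this string for you, as shown in the Usage section below.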
## Usage

```python
import transformers
from transformers import AutoTokenizer

new_model = "sethuiyer/distilabled_Chikuma_10.7B"
tokenizer = AutoTokenizer.from_pretrained(new_model)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    device="cuda"
)

# Format prompt and generate text
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "Who invented LLMs?"}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
sequences = pipeline(prompt, max_new_tokens=512)
print(sequences[0]['generated_text'])
```

## Acknowledgements

A heartfelt appreciation goes to the vibrant open-source community, particularly:

* The Intel team, for publishing a great open dataset and showing how well it works in the first place.
* Teknium and NousResearch, for their awesome work and models.
* Maxime, for sharing such great resources.
* Argilla, for publishing argilla/distilabel-intel-orca-dpo-pairs.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sethuiyer__distilabled_Chikuma_10.7B).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |68.87|
|AI2 Reasoning Challenge (25-Shot)|66.38|
|HellaSwag (10-Shot)              |85.14|
|MMLU (5-Shot)                    |64.70|
|TruthfulQA (0-shot)              |59.20|
|Winogrande (5-shot)              |79.40|
|GSM8k (5-shot)                   |58.38|
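The reported average is simply the unweighted mean of the six benchmark scores in the table above:

```python
# Unweighted mean of the six Open LLM Leaderboard scores
scores = [66.38, 85.14, 64.70, 59.20, 79.40, 58.38]
average = round(sum(scores) / len(scores), 2)
print(average)  # 68.87
```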