---
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
library_name: transformers
pipeline_tag: text-generation
---

# Chikuma_10.7B - V2

This model is a DPO fine-tune of [Chikuma_10.7B](https://huggingface.co/sethuiyer/Chikuma_10.7B) on the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset.

## Dataset

Dataset: `argilla/distilabel-intel-orca-dpo-pairs`

The filtered dataset has roughly 3,000 samples. That is small, but the samples are high quality according to `chosen_score`. The following filters were applied to the original dataset:

```python
from datasets import load_dataset

# Load the original DPO preference pairs
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Drop ties, keep only pairs whose chosen answer scored 8 or higher,
# and exclude prompts that appear in the GSM8K training set
dataset = dataset.filter(
    lambda r: r["status"] != "tie"
    and r["chosen_score"] >= 8
    and not r["in_gsm8k_train"]
)
```

## Chat Template

I decided to go with a slight modification of ChatML:

```
<|im_start|>GPT4 Correct system:
{system} Always use <|end_of_turn|> when you want to end the answer. <|im_end|>
<|im_start|>GPT4 Correct user:
{user}<|im_end|>
<|im_start|>GPT4 Correct Assistant:
{assistant}<|im_end|>
```

## Training Hardware

I used 1x A100 80GB on RunPod for about 1.5 hours.

## Usage

```python
# Format the prompt and generate text with a transformers pipeline
import torch
import transformers
from transformers import AutoTokenizer

# Placeholder: set this to the Hub ID or local path of this model
new_model = "path/to/Chikuma_10.7B-v2"

tokenizer = AutoTokenizer.from_pretrained(new_model)

# Create the pipeline (half precision reduces GPU memory use for a 10.7B model)
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device="cuda",
)

# Build the prompt with the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot. Always use <|end_of_turn|> when you want to end the answer."},
    {"role": "user", "content": "What is a large language model?"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=512,
)
print(sequences[0]["generated_text"])
```

## Things in the Pipeline

1. Manual testing and evaluation against GPT-4 in text-generation-webui on 45 complex sample prompts.
2. Nous Benchmark
3. GGUF format
4. Ollama model (if the benchmark results are good)

## Acknowledgements

I'd like to thank the amazing open community, in particular:

* The Intel team for publishing a great open dataset and showing how well it worked in the first place.
* Teknium and NousResearch for their awesome work and models.
* Maxime for sharing such great resources.
* Argilla for publishing `argilla/distilabel-intel-orca-dpo-pairs`.
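
## Formatting the Prompt Manually

When debugging prompts outside `transformers`, it can help to see the modified ChatML template from the Chat Template section rendered as a plain string. The sketch below is a minimal, dependency-free illustration under stated assumptions: the helper name `format_chikuma_prompt` is hypothetical, and the newline placement is inferred from the template as written, so the tokenizer's bundled chat template (`tokenizer.apply_chat_template`) remains the authoritative way to build prompts.

```python
from typing import Optional

# Instruction the template appends after the system message
END_OF_TURN_HINT = "Always use <|end_of_turn|> when you want to end the answer."


def format_chikuma_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Hypothetical helper: render one turn in the card's modified ChatML format.

    Newline placement is an assumption based on the template shown in this card.
    If `assistant` is None, the string ends with the assistant header so the
    model can generate the reply (similar to add_generation_prompt=True).
    """
    parts = [
        f"<|im_start|>GPT4 Correct system:\n{system} {END_OF_TURN_HINT} <|im_end|>\n",
        f"<|im_start|>GPT4 Correct user:\n{user}<|im_end|>\n",
        "<|im_start|>GPT4 Correct Assistant:\n",
    ]
    if assistant is not None:
        # Close the assistant turn when formatting a complete example
        parts.append(f"{assistant}<|im_end|>")
    return "".join(parts)


if __name__ == "__main__":
    print(format_chikuma_prompt("You are a helpful assistant chatbot.",
                                "What is a large language model?"))
```

Pass only the base system instruction: the helper appends the `<|end_of_turn|>` hint itself, as the template does, so including it in `system` as well would duplicate it.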