+---
+base_model: ce-lery/japanese-mistral-300m-instruction
+inference: false
+model-index:
+- name: checkpoints-finetuning
+  results: []
+model_creator: ce-lery
+model_name: japanese-mistral-300m-instruction
+pipeline_tag: text-generation
+quantized_by: afrideva
+tags:
+- generated_from_trainer
+- gguf
+- ggml
+- quantized
+- q2_k
+- q3_k_m
+- q4_k_m
+- q5_k_m
+- q6_k
+- q8_0
+---
+# ce-lery/japanese-mistral-300m-instruction-GGUF
+Quantized GGUF model files for [japanese-mistral-300m-instruction](https://huggingface.co/ce-lery/japanese-mistral-300m-instruction) from [ce-lery](https://huggingface.co/ce-lery)
+| Name | Quant method | Size |
+| ---- | ---- | ---- |
+| [japanese-mistral-300m-instruction.fp16.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.fp16.gguf) | fp16 | 712.33 MB  |
+| [japanese-mistral-300m-instruction.q2_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q2_k.gguf) | q2_k | 176.84 MB  |
+| [japanese-mistral-300m-instruction.q3_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q3_k_m.gguf) | q3_k_m | 195.04 MB  |
+| [japanese-mistral-300m-instruction.q4_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q4_k_m.gguf) | q4_k_m | 234.80 MB  |
+| [japanese-mistral-300m-instruction.q5_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q5_k_m.gguf) | q5_k_m | 266.47 MB  |
+| [japanese-mistral-300m-instruction.q6_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q6_k.gguf) | q6_k | 307.38 MB  |
+| [japanese-mistral-300m-instruction.q8_0.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-instruction-GGUF/resolve/main/japanese-mistral-300m-instruction.q8_0.gguf) | q8_0 | 379.17 MB  |
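To compare the quantized files at a glance, the sizes in the table can be expressed relative to the fp16 file. This is only a sketch of the arithmetic on the listed file sizes; it says nothing about output quality, which this card does not report.

```python
# File sizes in MB, copied from the table above.
sizes_mb = {
    "fp16": 712.33,
    "q2_k": 176.84,
    "q3_k_m": 195.04,
    "q4_k_m": 234.80,
    "q5_k_m": 266.47,
    "q6_k": 307.38,
    "q8_0": 379.17,
}

# Express every file as a fraction of the unquantized fp16 file.
baseline = sizes_mb["fp16"]
relative = {name: size / baseline for name, size in sizes_mb.items()}

for name, ratio in relative.items():
    print(f"{name:>7}: {sizes_mb[name]:7.2f} MB  ({ratio:.1%} of fp16)")
```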
+## Original Model Card:
+# japanese-mistral-300m-instruction
+## Overview
+Welcome to my model card!
+This model features:
+- Suppression of unknown-token generation through byte fallback in the SentencePiece tokenizer, which is converted to the Hugging Face Tokenizers format
+- Pretraining on the Wikipedia and CC-100 datasets
+- The [Mistral 300M](https://huggingface.co/ce-lery/japanese-mistral-300m-base/blob/main/config.json) architecture
+- Fine-tuning of [ce-lery/japanese-mistral-300m-base](https://huggingface.co/ce-lery/japanese-mistral-300m-base) on [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
+Yukkuri shite ittene!
+## How to use the model
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import os
+MODEL_NAME = "ce-lery/japanese-mistral-300m-instruction"
+torch.set_float32_matmul_precision('high')
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+print(device)
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, trust_remote_code=True).to(device)
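+MAX_ASSISTANT_LENGTH = 100  # upper bound on newly generated tokens
+MAX_INPUT_LENGTH = 128  # used below as a cap on the total length (prompt + response)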
+INPUT_PROMPT = r'<s>\n以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。\n[SEP]\n指示:\n{instruction}\n[SEP]\n入力:\n{input}\n[SEP]\n応答:\n'
+NO_INPUT_PROMPT = r'<s>\n以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n[SEP]\n指示:\n{instruction}\n[SEP]\n応答:\n'
+def prepare_input(instruction, input_text):
+    if input_text != "":
+        prompt = INPUT_PROMPT.format(instruction=instruction, input=input_text)
+    else:
+        prompt = NO_INPUT_PROMPT.format(instruction=instruction)
+    return prompt
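+def format_output(output):
+    # str.lstrip/rstrip remove any of the given characters, not the exact substring,
+    # so strip the BOS/EOS markers with removeprefix/removesuffix (Python 3.9+).
+    output = output.removeprefix("<s>").removesuffix("</s>")
+    output = output.replace("[SEP]", "").replace("\\n", "\n")
+    return output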
+def generate_response(instruction, input_text):
+    prompt = prepare_input(instruction, input_text)
+    token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
+    n = len(token_ids[0])
+    # print(n)
+    with torch.no_grad():
+        output_ids = model.generate(
+            token_ids.to(model.device),
+            min_length=n,
+            max_length=min(MAX_INPUT_LENGTH, n + MAX_ASSISTANT_LENGTH),
+            top_p=0.95,
+            top_k=50,
+            temperature=0.4,
+            do_sample=True,
+            no_repeat_ngram_size=2,
+            num_beams=3,
+            pad_token_id=tokenizer.pad_token_id,
+            bos_token_id=tokenizer.bos_token_id,
+            eos_token_id=tokenizer.eos_token_id,
+            bad_words_ids=[[tokenizer.unk_token_id]]
+        )
+    output = tokenizer.decode(output_ids.tolist()[0])
+    formatted_output_all = format_output(output)
+    response = f"Assistant:{formatted_output_all.split('応答:')[-1].strip()}"
+    return formatted_output_all, response
+instruction = "あなたは何でも正確に答えられるAIです。"
+questions = [
+    "日本で一番高い山は？",
+    "日本で一番広い湖は？",
+    "世界で一番高い山は？",
+    "世界で一番広い湖は？",
+    "冗談を言ってください。",
+]
+# Generate and print a response for each question
+for question in questions:
+    formatted_output_all, response = generate_response(instruction, question)
+    print(response)
+```
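A note on post-processing decoded text: `str.lstrip` and `str.rstrip` treat their argument as a set of characters rather than an exact substring, so stripping `"</s>"` that way can also eat trailing characters of the response itself. The sketch below contrasts that with exact prefix/suffix removal. `strip_special_tokens` is a hypothetical helper written for this illustration, and the `decoded` string is a made-up sample, not real model output.

```python
def strip_special_tokens(text, bos="<s>", eos="</s>"):
    """Remove one leading BOS and one trailing EOS marker, if present (hypothetical helper)."""
    if text.startswith(bos):
        text = text[len(bos):]
    if text.endswith(eos):
        text = text[:-len(eos)]
    return text


# Made-up decoded string whose response ends in the letter "s".
decoded = "<s>応答:\nyes</s>"

# Character-set stripping: the final "s" of "yes" is removed as well.
naive = decoded.lstrip("<s>").rstrip("</s>")

# Exact prefix/suffix removal keeps the response intact.
exact = strip_special_tokens(decoded)

print(repr(naive))
print(repr(exact))
```

The slicing version works on any Python 3 release; on Python 3.9 or later, `str.removeprefix` and `str.removesuffix` do the same job.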
+## Recipe
+If you want to reconstruct this model, refer to [this GitHub repository](https://github.com/ce-lery/japanese-mistral-300m-recipe).
+It contains the recipe for building this model, for example:
+- Preprocessing with SentencePiece
+- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed
+- Fine-tuning with databricks-dolly-15k-ja
+If you find a mistake or an error, please open an issue.
+Pull requests are very welcome!
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-06
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 64
+- total_train_batch_size: 256
+- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 200
+- mixed_precision_training: Native AMP
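As a rough check on how these values relate, the sketch below recomputes the effective batch size and evaluates a warmup-plus-cosine learning-rate curve. Two things here are assumptions rather than facts from this card: the formula `total = per-device batch * accumulation steps * processes`, and the total step count, which is extrapolated from the training log (step 2200 at epoch 193.14) to 200 epochs. The schedule shape (linear warmup, then half-cosine decay to zero) is also assumed.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-06
train_batch_size = 4
gradient_accumulation_steps = 64
total_train_batch_size = 256
warmup_steps = 1000

# Assumed formula: total = per-device batch * accumulation steps * number of processes.
implied_processes = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)
print("implied number of processes:", implied_processes)


def cosine_lr(step, total_steps, peak_lr=learning_rate, warmup=warmup_steps):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 2280 optimizer steps in total; the card does not state this.
ASSUMED_TOTAL_STEPS = 2280
for step in (0, 500, 1000, 1640, 2280):
    print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the warmup covers a little under half of the run, so the peak learning rate of 5e-06 is only reached around step 1000.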
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 3.595         | 3.51   | 40   | 3.5299          |
+| 3.4769        | 7.02   | 80   | 3.3722          |
+| 3.3037        | 10.53  | 120  | 3.1871          |
+| 3.1255        | 14.05  | 160  | 3.0088          |
+| 2.9615        | 17.56  | 200  | 2.8684          |
+| 2.8468        | 21.07  | 240  | 2.7808          |
+| 2.7699        | 24.58  | 280  | 2.7205          |
+| 2.7139        | 28.09  | 320  | 2.6793          |
+| 2.6712        | 31.6   | 360  | 2.6509          |
+| 2.6356        | 35.12  | 400  | 2.6294          |
+| 2.6048        | 38.63  | 440  | 2.6120          |
+| 2.5823        | 42.14  | 480  | 2.5974          |
+| 2.5536        | 45.65  | 520  | 2.5849          |
+| 2.5293        | 49.16  | 560  | 2.5740          |
+| 2.5058        | 52.67  | 600  | 2.5644          |
+| 2.482         | 56.19  | 640  | 2.5556          |
+| 2.4575        | 59.7   | 680  | 2.5477          |
+| 2.4339        | 63.21  | 720  | 2.5405          |
+| 2.4073        | 66.72  | 760  | 2.5350          |
+| 2.3845        | 70.23  | 800  | 2.5303          |
+| 2.3606        | 73.74  | 840  | 2.5253          |
+| 2.329         | 77.26  | 880  | 2.5215          |
+| 2.3071        | 80.77  | 920  | 2.5185          |
+| 2.2768        | 84.28  | 960  | 2.5155          |
+| 2.2479        | 87.79  | 1000 | 2.5144          |
+| 2.2181        | 91.3   | 1040 | 2.5151          |
+| 2.1901        | 94.81  | 1080 | 2.5139          |
+| 2.1571        | 98.33  | 1120 | 2.5148          |
+| 2.1308        | 101.84 | 1160 | 2.5166          |
+| 2.1032        | 105.35 | 1200 | 2.5193          |
+| 2.0761        | 108.86 | 1240 | 2.5204          |
+| 2.0495        | 112.37 | 1280 | 2.5269          |
+| 2.0231        | 115.88 | 1320 | 2.5285          |
+| 2.0021        | 119.4  | 1360 | 2.5328          |
+| 1.9793        | 122.91 | 1400 | 2.5383          |
+| 1.9575        | 126.42 | 1440 | 2.5442          |
+| 1.9368        | 129.93 | 1480 | 2.5488          |
+| 1.9216        | 133.44 | 1520 | 2.5534          |
+| 1.902         | 136.95 | 1560 | 2.5584          |
+| 1.8885        | 140.47 | 1600 | 2.5609          |
+| 1.8728        | 143.98 | 1640 | 2.5657          |
+| 1.8605        | 147.49 | 1680 | 2.5697          |
+| 1.8476        | 151.0  | 1720 | 2.5741          |
+| 1.8402        | 154.51 | 1760 | 2.5770          |
+| 1.8274        | 158.02 | 1800 | 2.5803          |
+| 1.8218        | 161.54 | 1840 | 2.5829          |
+| 1.8144        | 165.05 | 1880 | 2.5847          |
+| 1.8097        | 168.56 | 1920 | 2.5867          |
+| 1.8076        | 172.07 | 1960 | 2.5883          |
+| 1.8014        | 175.58 | 2000 | 2.5892          |
+| 1.8001        | 179.09 | 2040 | 2.5899          |
+| 1.7987        | 182.61 | 2080 | 2.5903          |
+| 1.7971        | 186.12 | 2120 | 2.5906          |
+| 1.7979        | 189.63 | 2160 | 2.5907          |
+| 1.7975        | 193.14 | 2200 | 2.5907          |
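Reading the table, validation loss reaches its lowest value (2.5139) at step 1080 and then rises slowly while training loss keeps falling, a pattern usually associated with overfitting. The sketch below finds that point from a subset of the rows and converts the loss to perplexity, assuming the reported loss is a mean per-token cross-entropy in nats. Which checkpoint was actually published is not stated in this card.

```python
import math

# (step, training loss, validation loss): a subset of rows copied from the table above.
rows = [
    (40, 3.595, 3.5299),
    (520, 2.5536, 2.5849),
    (1000, 2.2479, 2.5144),
    (1040, 2.2181, 2.5151),
    (1080, 2.1901, 2.5139),
    (1120, 2.1571, 2.5148),
    (1600, 1.8885, 2.5609),
    (2200, 1.7975, 2.5907),
]

# Pick the row with the lowest validation loss.
best_step, best_train, best_val = min(rows, key=lambda r: r[2])
print(f"lowest validation loss {best_val} at step {best_step}")

# Assumption: the loss is a mean cross-entropy in nats, so exp(loss) is perplexity.
print(f"validation perplexity there: {math.exp(best_val):.2f}")

# Gap between validation and training loss at the last logged step.
final_step, final_train, final_val = rows[-1]
gap = final_val - final_train
print(f"train/validation gap at step {final_step}: {gap:.4f}")
```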
+### Framework versions
+- Transformers 4.35.2
+- Pytorch 2.1.1+cu121
+- Datasets 2.14.5
+- Tokenizers 0.14.1