---
base_model: ce-lery/japanese-mistral-300m-base
inference: false
model-index:
- name: checkpoints-mistral-300M-FA2
  results: []
model_creator: ce-lery
model_name: japanese-mistral-300m-base
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- generated_from_trainer
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---

# ce-lery/japanese-mistral-300m-base-GGUF

Quantized GGUF model files for [japanese-mistral-300m-base](https://huggingface.co/ce-lery/japanese-mistral-300m-base) from [ce-lery](https://huggingface.co/ce-lery)

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [japanese-mistral-300m-base.fp16.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.fp16.gguf) | fp16 | 712.33 MB |
| [japanese-mistral-300m-base.q2_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q2_k.gguf) | q2_k | 176.84 MB |
| [japanese-mistral-300m-base.q3_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q3_k_m.gguf) | q3_k_m | 195.04 MB |
| [japanese-mistral-300m-base.q4_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q4_k_m.gguf) | q4_k_m | 234.80 MB |
| [japanese-mistral-300m-base.q5_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q5_k_m.gguf) | q5_k_m | 266.47 MB |
| [japanese-mistral-300m-base.q6_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q6_k.gguf) | q6_k | 307.38 MB |
| [japanese-mistral-300m-base.q8_0.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q8_0.gguf) | q8_0 | 379.17 MB |

## Original Model Card:

# japanese-mistral-300m-base

## Overview

Welcome to my model card! This model's features are:

- Suppression of unknown-word generation, by using byte fallback in the SentencePiece tokenizer and converting it to the Hugging Face Tokenizers format
- Pretrained on the Wikipedia and CC-100 datasets
- Use of a [Mistral 300M](https://huggingface.co/ce-lery/japanese-mistral-300m-base/blob/main/config.json) architecture

Yukkuri shite ittene! (Take it easy!)

## How to use the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

MODEL_NAME = "ce-lery/japanese-mistral-300m-base"
torch.set_float32_matmul_precision('high')

# Run on GPU if available, otherwise fall back to CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
).to(DEVICE)
# streamer = TextStreamer(tokenizer)

prompt = "大規模言語モデルとは、"  # "A large language model is, ..."

inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=256,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.9,
        # streamer=streamer,
        no_repeat_ngram_size=2,
        num_beams=3
    )

# Print the raw token ids, then the decoded text
print(outputs.tolist()[0])
outputs_txt = tokenizer.decode(outputs[0])
print(outputs_txt)
```
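
As an alternative to the `transformers` example above, the GGUF files from the table at the top of this card can be run with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The snippet below is a minimal sketch, assuming `llama-cpp-python` is installed and the `q4_k_m` file has been downloaded to the working directory:

```python
from llama_cpp import Llama

# Load the locally downloaded q4_k_m quantization (file name from the table above)
llm = Llama(model_path="japanese-mistral-300m-base.q4_k_m.gguf", n_ctx=2048)

# Same Japanese prompt as the transformers example: "A large language model is, ..."
output = llm(
    "大規模言語モデルとは、",
    max_tokens=256,
    temperature=0.9,
    top_p=0.95,
    top_k=50,
)
print(output["choices"][0]["text"])
```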
## Recipe

If you want to reproduce this model, please refer to [this GitHub repository](https://github.com/ce-lery/japanese-mistral-300m-recipe). I wrote the recipe for building this model there. For example:

- Preprocessing with SentencePiece
- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed
- Fine-tuning with databricks-dolly-15k-ja

If you find any mistakes or errors, please open an issue. Pull requests are very welcome!

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 64
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.2911        | 0.12  | 5000  | 4.2914          |
| 3.9709        | 0.24  | 10000 | 3.9900          |
| 3.8229        | 0.36  | 15000 | 3.8388          |
| 3.7197        | 0.47  | 20000 | 3.7454          |
| 3.652         | 0.59  | 25000 | 3.6739          |
| 3.597         | 0.71  | 30000 | 3.6177          |
| 3.5554        | 0.83  | 35000 | 3.5770          |
| 3.536         | 0.95  | 40000 | 3.5582          |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1
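
For reference, the hyperparameters listed above map roughly onto a Hugging Face `TrainingArguments` configuration like the sketch below. The field mapping and `output_dir` are assumptions, and the actual training script in the recipe repository additionally configures DeepSpeed and FlashAttention-2:

```python
from transformers import TrainingArguments

# Rough TrainingArguments equivalent of the hyperparameters above (assumed mapping)
training_args = TrainingArguments(
    output_dir="checkpoints-mistral-300M-FA2",  # assumed, taken from the model index name
    learning_rate=6e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=64,  # 4 per device x 64 steps gives the total batch size of 256
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-4,
    fp16=True,  # Native AMP mixed precision; fp16 rather than bf16 is assumed here
)
```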