afrideva committed 79cc64b (1 parent: 7d8d0f6)

Upload README.md with huggingface_hub

Files changed (1):

  1. README.md (+157, -0)

README.md ADDED
 
---
base_model: ce-lery/japanese-mistral-300m-base
inference: false
model-index:
- name: checkpoints-mistral-300M-FA2
  results: []
model_creator: ce-lery
model_name: japanese-mistral-300m-base
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- generated_from_trainer
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---
# ce-lery/japanese-mistral-300m-base-GGUF

Quantized GGUF model files for [japanese-mistral-300m-base](https://huggingface.co/ce-lery/japanese-mistral-300m-base) from [ce-lery](https://huggingface.co/ce-lery).

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [japanese-mistral-300m-base.fp16.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.fp16.gguf) | fp16 | 712.33 MB |
| [japanese-mistral-300m-base.q2_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q2_k.gguf) | q2_k | 176.84 MB |
| [japanese-mistral-300m-base.q3_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q3_k_m.gguf) | q3_k_m | 195.04 MB |
| [japanese-mistral-300m-base.q4_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q4_k_m.gguf) | q4_k_m | 234.80 MB |
| [japanese-mistral-300m-base.q5_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q5_k_m.gguf) | q5_k_m | 266.47 MB |
| [japanese-mistral-300m-base.q6_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q6_k.gguf) | q6_k | 307.38 MB |
| [japanese-mistral-300m-base.q8_0.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q8_0.gguf) | q8_0 | 379.17 MB |
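These files work with any GGUF-compatible runtime. The snippet below is a minimal sketch, assuming `llama-cpp-python` and `huggingface_hub` are installed; the choice of the q4_k_m file and the sampling settings are illustrative, not recommendations from the original author.

```python
# Minimal sketch: download one of the GGUF files above and run it with llama-cpp-python.
# The quantization level and sampling settings here are illustrative choices.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="afrideva/japanese-mistral-300m-base-GGUF",
    filename="japanese-mistral-300m-base.q4_k_m.gguf",
)

llm = Llama(model_path=model_path)
output = llm("大規模言語モデルとは、", max_tokens=128, temperature=0.9, top_p=0.95)
print(output["choices"][0]["text"])
```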
## Original Model Card:

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# japanese-mistral-300m-base

## Overview

Welcome to my model card!

This model's features are:

- Suppression of unknown-word generation, achieved by using byte fallback in the SentencePiece tokenizer and converting it to the Hugging Face Tokenizers format
- Pretrained on the Wikipedia and CC-100 datasets
- Uses the [Mistral 300M](https://huggingface.co/ce-lery/japanese-mistral-300m-base/blob/main/config.json) architecture

Take it easy!

## How to use the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

MODEL_NAME = "ce-lery/japanese-mistral-300m-base"
torch.set_float32_matmul_precision("high")

# Use the GPU when available, otherwise fall back to the CPU
if torch.cuda.is_available():
    print("cuda")
    DEVICE = "cuda"
else:
    print("cpu")
    DEVICE = "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
).to(DEVICE)

# streamer = TextStreamer(tokenizer)

prompt = "大規模言語モデルとは、"  # "A large language model is ..."

inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=256,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.9,
        # streamer=streamer,
        no_repeat_ngram_size=2,
        num_beams=3,
    )

print(outputs.tolist()[0])                  # generated token IDs
outputs_txt = tokenizer.decode(outputs[0])
print(outputs_txt)                          # decoded text
```


## Recipe

If you want to reproduce this model, you can refer to [this GitHub repository](https://github.com/ce-lery/japanese-mistral-300m-recipe).

It describes the recipe used to build this model, for example:

- Preprocessing with SentencePiece (a rough byte-fallback sketch follows this list)
- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed
- Fine-tuning with databricks-dolly-15k-ja

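As a rough illustration of the byte-fallback preprocessing mentioned here and in the Overview, a minimal SentencePiece training sketch might look like the following; the corpus path, vocabulary size, and coverage value are illustrative assumptions, not the settings used for this model (see the recipe repository for the actual configuration).

```python
# Minimal sketch of training a SentencePiece tokenizer with byte fallback.
# The corpus path, vocab_size, and character_coverage are illustrative assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus_ja.txt",      # one sentence per line
    model_prefix="spm_ja",      # writes spm_ja.model and spm_ja.vocab
    vocab_size=32000,
    model_type="unigram",
    byte_fallback=True,         # out-of-vocabulary characters decompose into byte tokens instead of <unk>
    character_coverage=0.9995,
)

sp = spm.SentencePieceProcessor(model_file="spm_ja.model")
print(sp.encode("大規模言語モデル", out_type=str))
```

Byte fallback is what suppresses unknown-word generation: any character outside the vocabulary remains representable as a sequence of byte tokens rather than collapsing to `<unk>`.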
If you find any mistakes or errors, please open an issue.
Pull requests are also very welcome!

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for a rough `TrainingArguments` equivalent):
- learning_rate: 0.0006
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 64
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1
- mixed_precision_training: Native AMP

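A minimal sketch of how these values might map onto `transformers.TrainingArguments`; the `output_dir` and the `fp16` flag are assumptions (the card only states "Native AMP"), and the DeepSpeed/FlashAttention-2/torch.compile setup from the recipe repository is omitted here.

```python
# Rough mapping of the listed hyperparameters onto transformers.TrainingArguments.
# output_dir and fp16 are assumptions; the actual run additionally used DeepSpeed,
# FlashAttention-2, and torch.compile (see the recipe repository above).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints-mistral-300M-FA2",  # name borrowed from the model-index entry
    learning_rate=6e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=64,  # 4 per device x 64 steps = 256 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=1,
    fp16=True,
)
```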
### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.2911        | 0.12  | 5000  | 4.2914          |
| 3.9709        | 0.24  | 10000 | 3.9900          |
| 3.8229        | 0.36  | 15000 | 3.8388          |
| 3.7197        | 0.47  | 20000 | 3.7454          |
| 3.652         | 0.59  | 25000 | 3.6739          |
| 3.597         | 0.71  | 30000 | 3.6177          |
| 3.5554        | 0.83  | 35000 | 3.5770          |
| 3.536         | 0.95  | 40000 | 3.5582          |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1