Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

gemma-2-baku-2b-it - bnb 4bits
- Model creator: https://huggingface.co/rinna/
- Original model: https://huggingface.co/rinna/gemma-2-baku-2b-it/
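
These weights are a bitsandbytes 4-bit quantization. As a minimal loading sketch (not necessarily the exact recipe used for this upload), the original checkpoint can be quantized on the fly with `transformers`' `BitsAndBytesConfig`; the NF4 settings below are illustrative assumptions, and `transformers`, `accelerate`, and `bitsandbytes` are assumed to be installed.

~~~~python
# Minimal sketch: load the model with bitsandbytes 4-bit (NF4) quantization.
# The quantization settings are assumptions, not the exact ones used for this upload.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rinna/gemma-2-baku-2b-it"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation="eager",  # recommended for Gemma 2; see the usage notes below
)
~~~~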

Original model description:
---
thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
license: gemma
language:
- ja
- en
tags:
- gemma2
- conversational
base_model:
- google/gemma-2-2b
- google/gemma-2-2b-it
- rinna/gemma-2-baku-2b
base_model_relation: merge
pipeline_tag: text-generation
library_name: transformers
---

# `Gemma 2 Baku 2B Instruct (rinna/gemma-2-baku-2b-it)`

![rinna-icon](./rinna.png)

# Overview

The model is an instruction-tuned variant of [rinna/gemma-2-baku-2b](https://huggingface.co/rinna/gemma-2-baku-2b), fine-tuned with Chat Vector and Odds Ratio Preference Optimization (ORPO). It adheres to the gemma-2 chat format.

| Size | Continual Pre-Training | Instruction-Tuning |
| :- | :- | :- |
| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |

* **Model architecture**

A 26-layer, 2304-hidden-size transformer-based language model. Please refer to the [Gemma 2 Model Card](https://www.kaggle.com/models/google/gemma-2/) for detailed information on the model's architecture.

* **Training**

**Model merging.** The base model was endowed with instruction-following capabilities through a chat vector addition process. The chat vector was derived by subtracting the parameter vectors of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) from [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), as follows.

~~~~text
rinna/gemma-2-baku-2b + 1.0 * (google/gemma-2-2b-it - google/gemma-2-2b)
~~~~

During this process, the embedding layer was excluded from the subtraction and addition of parameter vectors.
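
The merge formula above translates directly into a few lines of tensor arithmetic. The following is an unofficial sketch of chat vector addition over `state_dict` tensors; it assumes all three checkpoints share parameter names and shapes, and the embedding-layer name filter is an assumption, since the exact exclusion rule is not published.

~~~~python
# Unofficial sketch of chat vector addition (not rinna's actual merge script).
# Assumes enough CPU RAM to hold three 2B-parameter models in float32.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("rinna/gemma-2-baku-2b", torch_dtype=torch.float32)
inst = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it", torch_dtype=torch.float32)
orig = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", torch_dtype=torch.float32)

ratio = 1.0  # scaling factor applied to the chat vector
inst_sd, orig_sd = inst.state_dict(), orig.state_dict()

with torch.no_grad():
    for name, param in base.state_dict().items():
        # Assumption: skip the embedding layer, as described above. Gemma 2 ties
        # the input embeddings to the LM head, so this also covers the output side.
        if "embed_tokens" in name:
            continue
        chat_vector = inst_sd[name] - orig_sd[name]
        param.add_(ratio * chat_vector)

base.save_pretrained("gemma-2-baku-2b-chat-vector")  # hypothetical output path
~~~~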

**ORPO** was applied using a subset of the following dataset to further refine the performance of the merged model (see the sketch after the list).

- rinna's internal dataset
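
rinna's ORPO training code and data are not public, so the following is only a rough illustration of how ORPO fine-tuning is commonly run with TRL's `ORPOTrainer`. The toy dataset, hyperparameters, model path, and `prompt`/`chosen`/`rejected` column format are all assumptions, not rinna's actual setup; recent TRL releases rename the `tokenizer` argument to `processing_class`.

~~~~python
# Illustrative ORPO fine-tuning sketch with TRL (not rinna's actual recipe).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "gemma-2-baku-2b-chat-vector"  # hypothetical path to the merged model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference data: each row pairs a preferred and a rejected response
# (here: a factual answer about the philosopher Kitaro Nishida vs. "I don't know.").
train_dataset = Dataset.from_dict({
    "prompt": ["西田幾多郎とはどんな人物ですか?"],
    "chosen": ["西田幾多郎は日本の哲学者で、京都学派の創始者として知られています。"],
    "rejected": ["わかりません。"],
})

config = ORPOConfig(
    output_dir="gemma-2-baku-2b-orpo",
    beta=0.1,  # weight of the odds-ratio penalty (lambda in the ORPO paper)
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
~~~~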

* **Contributors**

- [Xinqi Chen](https://huggingface.co/Keely0419)
- [Toshiaki Wakatsuki](https://huggingface.co/t-w)
- [Kei Sawada](https://huggingface.co/keisawada)

---

# Benchmarking

Please refer to [rinna's LM benchmark page](https://rinnakk.github.io/research/benchmarks/lm/index.html).

---

# How to use the model

~~~~python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "rinna/gemma-2-baku-2b-it"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
    attn_implementation="eager",  # see the note on eager attention below
)

# Build a prompt in the gemma-2 chat format.
chat = [
    {"role": "user", "content": "西田幾多郎とはどんな人物ですか?"},  # "What kind of person is Kitaro Nishida?"
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
~~~~

It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
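
Since padded batches are exactly where the NaN issue arises, a batched-generation sketch may be useful. This is an unofficial illustration that assumes left padding (the usual choice for decoder-only generation) and reuses the `model` and `tokenizer` from the example above.

~~~~python
# Unofficial sketch: batched inference with padding under eager attention.
# Reuses the model and tokenizer loaded in the example above.
tokenizer.padding_side = "left"  # pad on the left so generation continues from real tokens

chats = [
    [{"role": "user", "content": "西田幾多郎とはどんな人物ですか?"}],
    [{"role": "user", "content": "こんにちは!"}],  # "Hello!"
]
prompts = [
    tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    for chat in chats
]

batch = tokenizer(prompts, add_special_tokens=False, padding=True, return_tensors="pt").to(model.device)
outputs = model.generate(**batch, max_new_tokens=256)

# Strip the (padded) prompt portion before decoding each response.
for output in outputs:
    print(tokenizer.decode(output[batch["input_ids"].shape[-1]:], skip_special_tokens=True))
~~~~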

---

# Tokenization
The model uses the original [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) tokenizer.

---

# How to cite
```bibtex
@misc{rinna-gemma-2-baku-2b-it,
    title = {rinna/gemma-2-baku-2b-it},
    author = {Chen, Xinqi and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/gemma-2-baku-2b-it}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
---

# References
```bibtex
@article{gemma-2-2024,
    title = {Gemma 2},
    url = {https://www.kaggle.com/models/google/gemma-2},
    publisher = {Kaggle},
    author = {Gemma Team},
    year = {2024}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

@article{hong2024orpo,
    title = {ORPO: Monolithic Preference Optimization without Reference Model},
    author = {Hong, Jiwoo and Lee, Noah and Thorne, James},
    year = {2024},
    url = {https://arxiv.org/abs/2403.07691}
}
```
---

# License
[Gemma Terms of Use](https://ai.google.dev/gemma/terms)