Update README.md
README.md (CHANGED)
~~~~diff
@@ -28,7 +28,7 @@ The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wiki
 
 | Size | Continual Pre-Training | Instruction-Tuning |
 | :- | :- | :- |
-| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-
+| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |
 
 * **Library**
 
@@ -71,7 +71,7 @@ model_id = "rinna/gemma-2-baku-2b"
 pipeline = transformers.pipeline(
     "text-generation",
     model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
+    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
     device_map="auto"
 )
 output = pipeline(
@@ -82,6 +82,9 @@ output = pipeline(
 print(output[0]["generated_text"])
 ~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
~~~~
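To make the new recommendation concrete, here is a minimal sketch of the scenario it addresses: batched generation over prompts of unequal length, which forces padding, under bfloat16. The prompts, the `max_new_tokens` value, and the use of `AutoModelForCausalLM` instead of the pipeline API are illustrative assumptions, not part of the README.

~~~python
# Minimal sketch (not from the README): batched generation with padded
# inputs under bfloat16, using eager attention as the commit recommends.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Decoder-only models should be left-padded for generation.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # "eager" sidesteps the NaNs that torch.scaled_dot_product_attention
    # can currently produce for padded sequences in bfloat16.
    attn_implementation="eager",
    device_map="auto",
)

# Prompts of different lengths force padding in the batch (illustrative prompts).
prompts = ["西田幾多郎は、", "夏目漱石は、"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=32)

for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)
~~~

With the default SDPA backend the same padded batch can come back as NaNs, so pinning `attn_implementation="eager"` (as the diff does via `model_kwargs` for the pipeline) is the conservative choice while the behavior described above persists.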