Update README.md
README.md (CHANGED)
~~~~diff
@@ -28,7 +28,7 @@ The name `baku` comes from the Japanese word [`獏/ばく/Baku`](https://ja.wiki
 
 | Size | Continual Pre-Training | Instruction-Tuning |
 | :- | :- | :- |
-| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-
+| 2B | Gemma 2 Baku 2B [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b) | Gemma 2 Baku 2B Instruct [[HF]](https://huggingface.co/rinna/gemma-2-baku-2b-it) |
 
 * **Library**
 
@@ -71,7 +71,7 @@ model_id = "rinna/gemma-2-baku-2b"
 pipeline = transformers.pipeline(
     "text-generation",
     model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
+    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "eager"},
     device_map="auto"
 )
 output = pipeline(
@@ -82,6 +82,9 @@ output = pipeline(
 print(output[0]["generated_text"])
 ~~~
 
+It is recommended to use eager attention when conducting batch inference under bfloat16 precision.
+Currently, Gemma 2 yields NaN values for input sequences with padding when the default attention mechanism (torch.scaled_dot_product_attention) is employed in conjunction with bfloat16.
+
 ---
 
 # Tokenization
~~~~
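To make the new recommendation concrete, here is a minimal sketch of the scenario it addresses: batched generation over prompts of unequal length, which forces padding, under bfloat16. The prompts, the `max_new_tokens` value, and the use of `AutoModelForCausalLM` instead of the pipeline API are illustrative assumptions, not part of the README.

~~~python
# Minimal sketch (not from the README): batched generation with padded
# inputs under bfloat16, using eager attention as the commit recommends.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Decoder-only models should be left-padded for generation.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # "eager" sidesteps the NaNs that torch.scaled_dot_product_attention
    # can currently produce for padded sequences in bfloat16.
    attn_implementation="eager",
    device_map="auto",
)

# Prompts of different lengths force padding in the batch (illustrative prompts).
prompts = ["西田幾多郎は、", "夏目漱石は、"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=32)

for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
    print(text)
~~~

With the default SDPA backend the same padded batch can come back as NaNs, so pinning `attn_implementation="eager"` (as the diff does via `model_kwargs` for the pipeline) is the conservative choice while the behavior described above persists.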