Erland committed
Commit 16b0f23 · verified · 1 parent: 47ead00

Update README.md with weight comparison and hardware info

Files changed (1): README.md (+4 −4)
README.md CHANGED
@@ -5,12 +5,12 @@ tags:
 - flax
 - text-generation
 - transformers
-- meta-llama/Llama-3.2-3B
+- meta-llama/Llama-3.2-3B # Add the specific model name as a tag
 ---
 
 # meta-llama/Llama-3.2-3B - JAX/Flax
 
-This repository contains the JAX/Flax version of the meta-llama/Llama-3.2-3B model, originally a PyTorch model from {original_model_org}. This conversion enables efficient inference and training on TPUs and GPUs using the JAX/Flax framework.
+This repository contains the JAX/Flax version of the meta-llama/Llama-3.2-3B model, originally a PyTorch model from meta-llama. This conversion enables efficient inference and training on TPUs and GPUs using the JAX/Flax framework.
 
 ## Model Description
 
@@ -27,7 +27,7 @@ This model was converted from the original PyTorch implementation to JAX/Flax. T
 
 ### Important Note about `max_position_embeddings`
 
-During the conversion process, it was necessary to modify the `max_position_embeddings` parameter in the model's configuration. The original value of {original_max_pos_embed} led to out-of-memory (OOM) errors on the hardware used for conversion. To resolve this, `max_position_embeddings` was adjusted to {new_max_pos_embed}.
+During the conversion process, it was necessary to modify the `max_position_embeddings` parameter in the model's configuration. The original value of 131072 led to out-of-memory (OOM) errors on the hardware used for conversion. To resolve this, `max_position_embeddings` was adjusted to 16384.
 
 **Implications of this change:**
 
@@ -313,7 +313,7 @@ The conversion process was performed on the following hardware configuration:
 * **Transformers version:** 4.47.0
 * **GPU:** NVIDIA A100-SXM4-40GB
 
-This conversion took approximately 100.74 seconds to complete.
+This conversion took approximately 81.05 seconds to complete.
 
 ## Usage
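The `max_position_embeddings` change described in the diff amounts to patching one field of the model's configuration before conversion. A minimal sketch of that kind of adjustment is below; this is not the repository's actual conversion script, and the helper name, config dict, and validation are illustrative — only the two values (131072 → 16384) come from the README text.

```python
import json

def shrink_context_length(config: dict, new_len: int) -> dict:
    """Return a copy of a model config with a reduced maximum sequence length.

    Hypothetical helper mirroring the change described in the diff: lowering
    `max_position_embeddings` before conversion so the converted model's
    position-dependent buffers fit in memory on the conversion hardware.
    """
    original = config["max_position_embeddings"]
    if new_len >= original:
        raise ValueError("new length must be smaller than the original")
    updated = dict(config)
    updated["max_position_embeddings"] = new_len
    return updated

# Values taken from the README: original context length 131072, reduced to 16384.
original_config = {"model_type": "llama", "max_position_embeddings": 131072}
patched = shrink_context_length(original_config, 16384)
print(json.dumps(patched))
```

In practice the same edit could be made by loading the repository's `config.json`, changing the field, and saving it back before running the PyTorch-to-Flax conversion.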