keras/xlm_roberta_base_multi

Text Classification

KerasHub

Keras

Divyasreepat committed on Oct 29, 2024

Commit 865e53f (verified) · 1 Parent(s): 7fb7629

Update README.md with new model card content

Files changed (1)

README.md +133 -13

@@ -1,16 +1,136 @@
 ---
 library_name: keras-hub
 ---
-This is a [`XLMRoberta` model](https://keras.io/api/keras_hub/models/xlm_roberta) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-Model config:
-* **name:** xlm_roberta_backbone
-* **trainable:** True
-* **vocabulary_size:** 250002
-* **num_layers:** 12
-* **num_heads:** 12
-* **hidden_dim:** 768
-* **intermediate_dim:** 3072
-* **dropout:** 0.1
-* **max_sequence_length:** 512
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

+### Model Overview
+An XLM-RoBERTa encoder network.
+This class implements a bi-directional Transformer-based encoder as
+described in ["Unsupervised Cross-lingual Representation Learning at Scale"](https://arxiv.org/abs/1911.02116).
+It includes the embedding lookups and transformer layers, but it does not
+include the masked language modeling head used during pretraining.
+The default constructor gives a fully customizable, randomly initialized
+RoBERTa encoder with any number of layers, heads, and embedding dimensions.
+To load preset architectures and weights, use the `from_preset()`
+constructor.
+Disclaimer: Pre-trained models are provided on an "as is" basis, without
+warranties or conditions of any kind. The underlying model is provided by a
+third party and subject to a separate license, available
+[here](https://github.com/facebookresearch/fairseq).
+__Arguments__
+- __vocabulary_size__: int. The size of the token vocabulary.
+- __num_layers__: int. The number of transformer layers.
+- __num_heads__: int. The number of attention heads for each transformer.
+    The hidden size must be divisible by the number of attention heads.
+- __hidden_dim__: int. The size of the transformer encoding layer.
+- __intermediate_dim__: int. The output dimension of the first Dense layer in
+    a two-layer feedforward network for each transformer.
+- __dropout__: float. Dropout probability for the Transformer encoder.
+- __max_sequence_length__: int. The maximum sequence length this encoder can
+    consume. The input sequence length must not exceed
+    `max_sequence_length`. This value also determines the variable
+    shape of the positional embeddings.
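To make the relationships between these arguments concrete, here is a minimal sketch in plain Python. `EncoderConfig` and its methods are hypothetical helpers written for illustration, not part of KerasHub; the numeric values come from the config listed in the previous version of this model card. It only does shape arithmetic and builds no model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container mirroring the backbone arguments listed above."""

    vocabulary_size: int
    num_layers: int
    num_heads: int
    hidden_dim: int
    intermediate_dim: int
    max_sequence_length: int

    def head_dim(self) -> int:
        # The hidden size is split evenly across the attention heads.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_heads")
        return self.hidden_dim // self.num_heads

    def embedding_shapes(self) -> dict:
        # One row per token id and one row per position, each hidden_dim wide.
        return {
            "token_embeddings": (self.vocabulary_size, self.hidden_dim),
            "position_embeddings": (self.max_sequence_length, self.hidden_dim),
        }

    def feedforward_shapes(self) -> dict:
        # Two Dense layers: expand to intermediate_dim, then project back.
        return {
            "dense_1": (self.hidden_dim, self.intermediate_dim),
            "dense_2": (self.intermediate_dim, self.hidden_dim),
        }


# Values taken from the config in the previous version of this model card.
base = EncoderConfig(
    vocabulary_size=250002,
    num_layers=12,
    num_heads=12,
    hidden_dim=768,
    intermediate_dim=3072,
    max_sequence_length=512,
)
print(base.head_dim())
print(base.embedding_shapes())
print(base.feedforward_shapes())
```

A configuration whose `hidden_dim` is not a multiple of `num_heads` raises a `ValueError` in this sketch, which mirrors the divisibility requirement stated for `num_heads`.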
+### Example Usage
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Raw string data.
+```python
+features = ["The quick brown fox jumped.", "نسيت الواجب"]
+labels = [0, 3]
+# Pretrained classifier.
+classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
+    "xlm_roberta_base_multi",
+    num_classes=4,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+classifier.predict(x=features, batch_size=2)
+# Re-compile (e.g., with a new learning rate).
+classifier.compile(
+    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+    optimizer=keras.optimizers.Adam(5e-5),
+    jit_compile=True,
+)
+# Access backbone programmatically (e.g., to change `trainable`).
+classifier.backbone.trainable = False
+# Fit again.
+classifier.fit(x=features, y=labels, batch_size=2)
+```
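The loss above is configured with `from_logits=True`, which suggests (though the card does not state it) that `predict()` returns unnormalized logits of shape `(batch_size, num_classes)`. Under that assumption, the sketch below shows one way to turn a row of logits into probabilities and a class index. The helper names and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(logits):
    """Index of the largest logit, which is also the most probable class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for two inputs with num_classes=4 (made-up values).
example_logits = [[2.0, 0.5, -1.0, 0.1], [0.0, 0.0, 0.0, 3.0]]
for row in example_logits:
    probs = softmax(row)
    print(predicted_class(row), [round(p, 3) for p in probs])
```

In practice the rows would come from `classifier.predict(...)`, converted with `.tolist()` or handled with the equivalent NumPy operations.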
+Preprocessed integer data.
+```python
+features = {
+    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+labels = [0, 3]
+# Pretrained classifier without preprocessing.
+classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
+    "xlm_roberta_base_multi",
+    num_classes=4,
+    preprocessor=None,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+```
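The `token_ids` and `padding_mask` inputs above are fixed-length arrays in which the mask holds 1 for real tokens and 0 for padding. As a rough sketch of how such a pair could be built from variable-length sequences, the hypothetical `pad_batch` helper below right-pads lists of token ids in plain Python. The token ids are made up, and `pad_id=1` is an assumption that should be checked against the tokenizer actually in use.

```python
def pad_batch(sequences, length, pad_id):
    """Right-pad token id lists to `length` and build the matching mask."""
    token_ids, padding_mask = [], []
    for seq in sequences:
        if len(seq) > length:
            raise ValueError("sequence longer than target length")
        n_pad = length - len(seq)
        token_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        padding_mask.append([1] * len(seq) + [0] * n_pad)
    return {"token_ids": token_ids, "padding_mask": padding_mask}


# Made-up token ids; pad_id=1 is an assumption, check the tokenizer in use.
batch = pad_batch([[0, 581, 2], [0, 2]], length=4, pad_id=1)
print(batch["token_ids"])
print(batch["padding_mask"])
```

Wrapping each list in `np.array(..., dtype="int32")` would give arrays shaped like the `features` dictionary in the example above.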
+### Example Usage with Hugging Face URI
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Raw string data.
+```python
+features = ["The quick brown fox jumped.", "نسيت الواجب"]
+labels = [0, 3]
+# Pretrained classifier.
+classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
+    "hf://keras/xlm_roberta_base_multi",
+    num_classes=4,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+classifier.predict(x=features, batch_size=2)
+# Re-compile (e.g., with a new learning rate).
+classifier.compile(
+    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+    optimizer=keras.optimizers.Adam(5e-5),
+    jit_compile=True,
+)
+# Access backbone programmatically (e.g., to change `trainable`).
+classifier.backbone.trainable = False
+# Fit again.
+classifier.fit(x=features, y=labels, batch_size=2)
+```
+Preprocessed integer data.
+```python
+features = {
+    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+labels = [0, 3]
+# Pretrained classifier without preprocessing.
+classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
+    "hf://keras/xlm_roberta_base_multi",
+    num_classes=4,
+    preprocessor=None,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+```