Divyasreepat committed commit 865e53f · verified · 1 Parent(s): 7fb7629

Update README.md with new model card content

Files changed (1): README.md (+133 -13)

README.md CHANGED
Removed (the previous auto-generated card):

---
library_name: keras-hub
---
This is an [`XLMRoberta` model](https://keras.io/api/keras_hub/models/xlm_roberta) uploaded using the KerasHub library; it can be used with the JAX, TensorFlow, and PyTorch backends.

Model config:

* **name:** xlm_roberta_backbone
* **trainable:** True
* **vocabulary_size:** 250002
* **num_layers:** 12
* **num_heads:** 12
* **hidden_dim:** 768
* **intermediate_dim:** 3072
* **dropout:** 0.1
* **max_sequence_length:** 512

This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
Added (the new model card):

---
library_name: keras-hub
---
### Model Overview
An XLM-RoBERTa encoder network.

This class implements a bi-directional Transformer-based encoder as
described in ["Unsupervised Cross-lingual Representation Learning at Scale"](https://arxiv.org/abs/1911.02116).
It includes the embedding lookups and transformer layers, but it does not
include the masked language modeling head used during pretraining.

The default constructor gives a fully customizable, randomly initialized
RoBERTa encoder with any number of layers, heads, and embedding dimensions.
To load preset architectures and weights, use the `from_preset()`
constructor.
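As a hedged illustration of the preset path, the pretrained encoder can be loaded directly; the preset name below is assumed from the classifier examples later in this card.

```python
import keras_hub

# A minimal sketch: load the pretrained encoder architecture and weights.
# The preset name is assumed from the classifier examples below.
backbone = keras_hub.models.XLMRobertaBackbone.from_preset("xlm_roberta_base_multi")
backbone.summary()
```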
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq).

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer layers.
- __num_heads__: int. The number of attention heads for each transformer.
  The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The size of the transformer encoding layer.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
  a two-layer feedforward network for each transformer.
- __dropout__: float. Dropout probability for the Transformer encoder.
- __max_sequence_length__: int. The maximum sequence length this encoder can
  consume. The sequence length of the input must be no longer than
  `max_sequence_length` (512 by default). This determines the variable
  shape for positional embeddings.
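Taken together, here is a minimal sketch of a small, randomly initialized encoder built directly from these arguments; the sizes are illustrative values, not a recommended configuration.

```python
import keras_hub
import numpy as np

# A hedged sketch: a small, randomly initialized XLM-RoBERTa encoder
# constructed from the arguments listed above. Sizes are illustrative only.
backbone = keras_hub.models.XLMRobertaBackbone(
    vocabulary_size=250002,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    dropout=0.1,
    max_sequence_length=128,
)
# Run it on dummy tokenized input; the output is the sequence of hidden states.
outputs = backbone(
    {
        "token_ids": np.ones((1, 12), dtype="int32"),
        "padding_mask": np.ones((1, 12), dtype="int32"),
    }
)
```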
### Example Usage
```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "نسيت الواجب"]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
    "xlm_roberta_base_multi",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
    "xlm_roberta_base_multi",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```
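For reference, a hedged sketch of how a `token_ids` / `padding_mask` dictionary like the one above can be produced from raw strings with the preset's preprocessor; the class name `XLMRobertaTextClassifierPreprocessor` is an assumption and may differ between KerasHub versions.

```python
# A hedged sketch (class name assumed): build the same kind of preprocessed
# inputs from raw strings using the preset's preprocessor.
preprocessor = keras_hub.models.XLMRobertaTextClassifierPreprocessor.from_preset(
    "xlm_roberta_base_multi",
    sequence_length=12,
)
features = preprocessor(["The quick brown fox jumped.", "نسيت الواجب"])
# `features["token_ids"]` and `features["padding_mask"]` have shape (2, 12).
```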

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "نسيت الواجب"]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
    "hf://keras/xlm_roberta_base_multi",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.XLMRobertaClassifier.from_preset(
    "hf://keras/xlm_roberta_base_multi",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```
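
The same `hf://` handle works with other `from_preset()` constructors as well; a minimal sketch, assuming the encoder and tokenizer classes that accompany this model in KerasHub.

```python
# A minimal sketch: load the bare encoder and its tokenizer from the same
# Hugging Face URI (class availability assumed, not stated in this card).
backbone = keras_hub.models.XLMRobertaBackbone.from_preset(
    "hf://keras/xlm_roberta_base_multi"
)
tokenizer = keras_hub.models.XLMRobertaTokenizer.from_preset(
    "hf://keras/xlm_roberta_base_multi"
)
```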