Add examples, remove warning
README.md
CHANGED
@@ -20,9 +20,13 @@ widget:
 - text: Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
 to Paris.
 example_title: Amelia Earhart
-- text: Leonardo
+- text: Leonardo da Vinci painted the Mona Lisa based on Italian noblewoman
 Lisa del Giocondo.
 example_title: Leonardo da Vinci
+- text: Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones)
+was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at
+Battleship Memorial Park and open to the public.
+example_title: Under Siege
 base_model: roberta-large
 model-index:
 - name: SpanMarker w. roberta-large on finegrained, supervised FewNERD by Tom Aarsen
@@ -150,7 +154,7 @@ from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
 model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")
 # Run inference
-entities = model.predict("Most of the Steven Seagal movie ``
+entities = model.predict("Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones) was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at Battleship Memorial Park and open to the public.")
 ```
 
 ### Downstream Use
@@ -178,11 +182,6 @@ trainer.save_model("tomaarsen/span-marker-roberta-large-fewnerd-fine-super-finet
 ```
 </details>
 
-### ⚠️ Tokenizer Warning
-The [roberta-large](https://huggingface.co/roberta-large) tokenizer distinguishes between punctuation directly attached to a word and punctuation separated from a word by a space. For example, `Paris.` and `Paris .` are tokenized into different tokens. During training, this model is only exposed to the latter style, i.e. all words are separated by a space. Consequently, the model may perform worse when the inference text is in the former style.
-
-In short, it is recommended to preprocess your inference text such that all words and punctuation are separated by a space. Some potential approaches to convert regular text into this format are NLTK [`word_tokenize`](https://www.nltk.org/api/nltk.tokenize.word_tokenize.html) or spaCy [`Doc`](https://spacy.io/api/doc#iter) and join the resulting words with a space.
-
 ## Training Details
 
 ### Training Set Metrics
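The removed tokenizer warning recommended separating every word and punctuation mark with a space before inference (e.g. `Paris.` becoming `Paris .`). As a minimal sketch of that preprocessing, assuming only the Python standard library, the snippet below uses a regular expression as a rough stand-in for NLTK's `word_tokenize` or spaCy. This is a swapped-in technique, not the tools the warning names, and `pretokenize` is a hypothetical helper, not part of SpanMarker.

```python
import re

# Hypothetical helper, not part of SpanMarker: a regex stand-in for
# NLTK word_tokenize / spaCy. "\w+" keeps runs of word characters together
# (so "5B" stays one token); "[^\w\s]" emits each punctuation mark on its own.
# It does not handle contractions, abbreviations, or paired quotes the way
# a real tokenizer does.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def pretokenize(text: str) -> str:
    """Return `text` with every word and punctuation mark separated by one space."""
    return " ".join(_TOKEN_RE.findall(text))


if __name__ == "__main__":
    sample = "Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris."
    # Prints the sentence with the final period detached: "... to Paris ."
    print(pretokenize(sample))
```

The returned string is the space-separated style the warning described as matching the training data; it would then be passed to `model.predict` in place of the raw text.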