1-800-BAD-CODE committed on
Commit
2f391ad
1 Parent(s): 09527b1

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -157,7 +157,6 @@ Languages were chosen based on whether the News Crawl corpus contained enough re
 # Limitations
 This model was trained on news data, and may not perform well on conversational or informal data.
 
-This is also a base-sized model with many languages and many tasks, so capacity may be limited.
 
 This model predicts punctuation only once per subword.
 This implies that some acronyms, e.g., 'U.S.', cannot properly be punctuation.
@@ -167,4 +166,9 @@ This concession was accepted on two grounds:
 Since the expected use-case of this model is the output of an ASR system, it is presumed that such
 pronunciations would be transcribed as separate tokens, e.g, 'u s' vs. 'us' (though this depends on the model's pre-processing).
 
+Further, this model is unlikely to be of production quality.
+Though trained to convergence, it was trained with "only" 1M lines per language, and the dev sets may have been noisy due to the nature of web-scraped news data.
+This is also a base-sized model with many languages and many tasks, so capacity may be limited.
+
+
 # Evaluation