Commit f99eca7 by LoneWolfgang · verified · 1 Parent(s): b77c5f8

Update README.md

Files changed (1)
  1. README.md +2 -14
README.md CHANGED
@@ -19,7 +19,7 @@ This model is reccomended for Japanese SNS tasks, like [sentiment analysis](http

  ## Training Data

- The Twitter API was used to collect Japnaese Tweets from June 2022 to April 2023.
+ The Twitter API was used to collect Japanese tweets from June 2022 to April 2023.

  N-gram based deduplication was used to reduce spam content and improve the diversity of the training corpus.
  The refined training corpus was 28 million tweets.
@@ -29,16 +29,4 @@ The refined training corpus was 28 million tweets.
  The vocabulary was prepared using the [WordPieceTrainer](https://huggingface.co/docs/tokenizers/api/trainers) with the Twitter training corpus.
  It shares 60% of its vocabulary with Japanese BERT.

- The vocuabulary includes colloquialisms, neologisms, emoji and kaomoji expressions that are common on Twitter.
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** Jordan Wolfgang Klein, as Master's candiate at the University Malta.
- - **Model type:** BERT
- - **Language(s) (NLP):** Japanese
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:**
+ The vocabulary includes colloquialisms, neologisms, emoji and kaomoji expressions that are common on Twitter.
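
The Training Data section touched by this commit mentions n-gram based deduplication but does not say how it was done. The sketch below is one plausible reading, not the author's pipeline: it drops a tweet when its character trigram set has a high Jaccard similarity to a tweet already kept. The function names, the choice of character trigrams, and the 0.8 threshold are all assumptions made for illustration.

```python
from __future__ import annotations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a whitespace-normalized text."""
    normalized = " ".join(text.split())
    if len(normalized) < n:
        # Texts shorter than n are represented by themselves (or nothing if empty).
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(tweets: list[str], n: int = 3, threshold: float = 0.8) -> list[str]:
    """Keep each tweet unless it is a near-duplicate of one already kept.

    Hypothetical parameters: `n` and `threshold` are illustrative values,
    not settings taken from the README.
    """
    kept: list[str] = []
    kept_ngrams: list[set[str]] = []
    for tweet in tweets:
        grams = char_ngrams(tweet, n)
        if any(jaccard(grams, seen) >= threshold for seen in kept_ngrams):
            continue  # near-duplicate of an earlier tweet, likely spam or a repost
        kept.append(tweet)
        kept_ngrams.append(grams)
    return kept


if __name__ == "__main__":
    sample = [
        "今日はいい天気ですね!",
        "今日はいい天気ですね!!",
        "新しいカフェに行ってきた",
    ]
    for tweet in deduplicate(sample):
        print(tweet)
```

This pairwise comparison is quadratic in the number of tweets, so it only suits small samples. A corpus on the scale of 28 million tweets would need a hashing scheme such as MinHash with locality-sensitive hashing, which is likewise an assumption here and not something the README states.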