LoneWolfgang
committed on
Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ This model is recommended for Japanese SNS tasks, like [sentiment analysis](http
 
 ## Training Data
 
-The Twitter API was used to collect
+The Twitter API was used to collect Japanese tweets from June 2022 to April 2023.
 
 N-gram based deduplication was used to reduce spam content and improve the diversity of the training corpus.
 The refined training corpus was 28 million tweets.
@@ -29,16 +29,4 @@ The refined training corpus was 28 million tweets.
 The vocabulary was prepared using the [WordPieceTrainer](https://huggingface.co/docs/tokenizers/api/trainers) with the Twitter training corpus.
 It shares 60% of its vocabulary with Japanese BERT.
 
-The
-
-### Model Description
-
-<!-- Provide a longer summary of what this model is. -->
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** Jordan Wolfgang Klein, as a Master's candidate at the University of Malta.
-- **Model type:** BERT
-- **Language(s) (NLP):** Japanese
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:**
+The vocabulary includes colloquialisms, neologisms, emoji and kaomoji expressions that are common on Twitter.
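
The card does not spell out how the n-gram deduplication was applied, so the sketch below is only illustrative: it assumes character 3-grams and a Jaccard-similarity threshold for flagging near-duplicates, neither of which is documented for this model.

```python
# Illustrative near-duplicate filter: character 3-grams + Jaccard similarity.
# The n-gram size, threshold, and pairwise strategy are assumptions, not the
# documented pipeline.

def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams in a tweet."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def deduplicate(tweets, threshold: float = 0.8):
    """Keep a tweet only if it is not near-identical to any tweet already kept."""
    kept, kept_grams = [], []
    for tweet in tweets:
        grams = char_ngrams(tweet)
        if all(jaccard(grams, seen) < threshold for seen in kept_grams):
            kept.append(tweet)
            kept_grams.append(grams)
    return kept

# Spam-like near-duplicates collapse to a single copy.
print(deduplicate(["今日もいい天気☀", "今日もいい天気☀!", "限定セール実施中"]))
```

At the scale of 28 million tweets, an approximate scheme such as MinHash/LSH would normally replace the pairwise comparison shown here.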
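For reference, a vocabulary like the one described above could be trained with the linked [WordPieceTrainer](https://huggingface.co/docs/tokenizers/api/trainers). The vocabulary size, special tokens, pre-tokenizer, and file name below are placeholders rather than values taken from this model, and Japanese corpora are often pre-segmented with a morphological analyzer (e.g., MeCab) before WordPiece training, a detail the card does not cover.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordPieceTrainer

# Build an empty WordPiece tokenizer and train its vocabulary on the corpus.
tokenizer = Tokenizer(WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()  # Japanese text usually needs morphological pre-segmentation first

trainer = WordPieceTrainer(
    vocab_size=32_000,  # placeholder; the actual vocabulary size is not stated here
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# "tweets.txt" is a hypothetical file with one deduplicated tweet per line.
tokenizer.train(files=["tweets.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```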
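The 60% overlap figure can be checked roughly by comparing the two tokenizers' vocabularies. The repo ids below are placeholders: neither this model's Hub id nor the specific Japanese BERT checkpoint used for the comparison is stated here.

```python
from transformers import AutoTokenizer

# Placeholder repo ids: substitute this model's Hub id and the Japanese BERT
# checkpoint the 60% figure refers to.
twitter_bert = AutoTokenizer.from_pretrained("your-namespace/bert-for-japanese-twitter")
japanese_bert = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-v2")

twitter_vocab = set(twitter_bert.get_vocab())
japanese_vocab = set(japanese_bert.get_vocab())

shared = len(twitter_vocab & japanese_vocab) / len(twitter_vocab)
print(f"Shared vocabulary: {shared:.0%}")
```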