Commit 46e98ec
Parent(s): 9e1bf78
Create README.md (#1)
Co-authored-by: Amjad <amjadalsulami@users.noreply.huggingface.co>

README.md ADDED
@@ -0,0 +1,57 @@
---
language: en
tags:
- EBK-BERT
license: apache-2.0
datasets:
- Araevent(November)
- Araevent(July)
---

# EBK-BERT

Event Knowledge-Based BERT (EBK-BERT) leverages knowledge extracted from event-related sentences to mask words that are significant to the event detection task. This approach aims to produce a language model that improves the performance of the downstream event detection task, which is later trained during the fine-tuning process.

## Model description

The BERT-base configuration is adopted, which has 12 encoder blocks, 768 hidden dimensions, 12 attention heads, a 512 maximum sequence length, and a total of 110M parameters.
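As a rough illustration of this configuration (a sketch only; the class choice below is an assumption, not the authors' released training code), the equivalent `transformers` setup looks like:

```python
from transformers import BertConfig, BertForMaskedLM

# BERT-base configuration described above: 12 encoder blocks,
# 768 hidden dimensions, 12 attention heads, 512 maximum sequence length.
config = BertConfig(
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    max_position_embeddings=512,
)
model = BertForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 110M
```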

## Pre-training Data
The pre-training data consists of news articles from the 1.5 billion words corpus by (El-Khair, 2016). Due to computation limitations, we only use articles from Alittihad, Riyadh, Almasryalyoum, and Alqabas, which amount to 10GB of text and about 8M sentences after splitting the articles into sentences of approximately 100 words to accommodate the 128 maximum sequence length used when training the model. The average number of tokens per sentence is 105.
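A minimal sketch of this kind of splitting (the function below is illustrative, not the authors' preprocessing code, and the exact word-boundary rule is an assumption):

```python
def split_article(article: str, max_words: int = 100) -> list[str]:
    """Split an article into chunks of at most `max_words` words so that
    each chunk fits the 128-token maximum sequence length used in pre-training."""
    words = article.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Example: a 250-word article becomes 3 chunks (100 + 100 + 50 words).
chunks = split_article("word " * 250)
print(len(chunks))
```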

### Pretraining
As previous studies have shown, contextual representation models that are pre-trained using the MLM training task benefit from masking the most significant words, using whole word masking. To select the most significant words we use the odds ratio. Only words with an odds ratio greater than 2 are considered for masking, which means the included words are at least twice as likely to appear in one event type (Personnel, Transaction, Contact, Nature, Movement, Life, Justice, Conflict, Business) as in the others.
|
38 |
+
|
39 |
+
Google Cloud GPU is used for pre-training the model. The selected hyperparameters are: learning rate=1e − 4,
|
40 |
+
batch size =16, maxi- mum sequence length = 128 and average se- quence length = 104. In total, we pre-trained
|
41 |
+
our models for 500, 000 steps, completing 1 epoch. Pre-training a single model took approximately 2.25 days.
|
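For orientation, a `transformers` training setup with these hyperparameters might look like the sketch below. It uses the standard random-masking MLM collator and a placeholder tokenizer/checkpoint; the knowledge-based whole-word masking described above would replace the collator, and the dataset is not included here.

```python
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # placeholder
model = BertForMaskedLM.from_pretrained("bert-base-uncased")        # placeholder

# Hyperparameters from above: learning rate 1e-4, batch size 16, 500,000 steps.
args = TrainingArguments(
    output_dir="ebk-bert-pretraining",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    max_steps=500_000,
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# train_dataset should contain sentences tokenized to the 128-token maximum length.
# trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=train_dataset)
# trainer.train()
```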

## Fine-tuning data

Tweets are collected from well-known Arabic news accounts: Al-Arabiya, Sabq, CNN Arabic, and BBC Arabic. These accounts belong to television channels and online newspapers that use Twitter to broadcast news related to real-world events. The first collection process tracks tweets from the news accounts over a 20-day period, between November 2, 2021 and November 22, 2021, and we call this dataset AraEvent(November).

## Evaluation results

When fine-tuned on the downstream event detection task, this model achieves the following results:

![Event classification accuracy results for AraEvent(November) based on an average of 10 runs per event type and a confidence interval of 95%](https://raw.githubusercontent.com/AmjadAlsulami/images/main/Screen%20Shot%202022-12-08%20at%2012.00.59%20PM.png)
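For reference, such fine-tuning could be set up along these lines (the repository ID and the number of labels below are placeholders, not confirmed values):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "path/to/EBK-BERT"  # placeholder for this model's Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=9,  # assumption: one label per event type listed above
)
# Fine-tune `model` on AraEvent(November) with a standard Trainer loop,
# then average accuracy over 10 runs per event type as in the figure above.
```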

## Gradio Demo
Will be released soon.
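Until then, a minimal Gradio wrapper around a fine-tuned classifier could look like the sketch below (the model ID is a placeholder; the official demo may differ):

```python
import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="path/to/EBK-BERT-finetuned")  # placeholder ID

demo = gr.Interface(
    fn=lambda text: classifier(text)[0],           # top predicted event label and score
    inputs=gr.Textbox(label="Arabic news tweet"),
    outputs=gr.JSON(label="Predicted event type"),
)
demo.launch()
```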