|
--- |
|
language: ar
|
tags: |
|
- EBK-BERT |
|
license: apache-2.0 |
|
datasets: |
|
- AraEvent(November)

- AraEvent(July)
|
--- |
|
|
|
# EBK-BERT
|
|
|
Event Knowledge-Based BERT (EBK-BERT) leverages knowledge extracted from event-related sentences to mask words that
are significant to the event detection task. This approach aims to produce a language model that improves
performance on the downstream event detection task, which is trained later during fine-tuning.
|
|
|
|
|
## Model description |
|
|
|
The BERT-base configuration is adopted, which has 12 encoder blocks, 768 hidden dimensions, 12 attention heads,
a maximum sequence length of 512, and a total of 110M parameters.
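
For illustration, this configuration can be expressed with the Hugging Face `transformers` library as follows (a minimal sketch of the architecture settings only, not the released checkpoint or its Arabic vocabulary):

```python
from transformers import BertConfig, BertForMaskedLM

# BERT-base architecture as described above (sketch; vocabulary size left at the default)
config = BertConfig(
    num_hidden_layers=12,         # 12 encoder blocks
    hidden_size=768,              # 768 hidden dimensions
    num_attention_heads=12,       # 12 attention heads
    max_position_embeddings=512,  # 512 maximum sequence length
)
model = BertForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # roughly 110M with the default vocabulary
```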
|
|
|
|
|
|
|
## Pre-training Data |
|
The pre-training data consists of news articles from the 1.5-billion-word corpus by (El-Khair, 2016).
Due to computation limitations, we only use articles from Alittihad, Riyadh, Almasryalyoum, and Alqabas,
which amount to 10GB of text and about 8M sentences after splitting the articles into segments of approximately
100 words to accommodate the 128 maximum sequence length used when training the model.
The average number of tokens per sentence is 105.
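
As a rough illustration of the splitting step above, a minimal sketch follows (the authors' exact segmentation rules are not specified here; the whitespace tokenization and fixed 100-word chunking are assumptions):

```python
def split_article(article: str, words_per_segment: int = 100) -> list[str]:
    """Split an article into ~100-word segments so that each segment fits
    within the 128-token maximum sequence length (sketch only)."""
    words = article.split()
    return [
        " ".join(words[i:i + words_per_segment])
        for i in range(0, len(words), words_per_segment)
    ]
```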
|
|
|
### Pretraining |
|
As previous studies have shown, contextual representation models pre-trained with the MLM training task benefit
from masking the most significant words, using whole word masking. Here, significance is measured with respect to
the event types considered: Personnel, Transaction, Contact, Nature, Movement, Life, Justice, Conflict, and Business.
|
To select the most significant words we use the odds ratio. Only words with an odds ratio greater than 2 are considered
for masking, which means the included words are at least twice as likely to appear in one event type as in the others.
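
A minimal sketch of this selection step is shown below; the per-event-type counting, smoothing constant, and one-vs-rest comparison are assumptions, since the exact computation is not spelled out here:

```python
from collections import Counter

def significant_words(docs_by_event: dict[str, list[str]],
                      threshold: float = 2.0, eps: float = 1e-9) -> set[str]:
    """Return words whose odds of occurring in one event type are more than
    `threshold` times their odds in the remaining event types (sketch only)."""
    counts = {etype: Counter(w for doc in docs for w in doc.split())
              for etype, docs in docs_by_event.items()}
    totals = {etype: sum(c.values()) for etype, c in counts.items()}
    vocab = set().union(*(c.keys() for c in counts.values()))

    selected = set()
    for word in vocab:
        for etype in counts:
            # Probability of the word inside this event type vs. all other types.
            p_in = counts[etype][word] / max(totals[etype], 1)
            rest_count = sum(counts[o][word] for o in counts if o != etype)
            rest_total = sum(totals[o] for o in counts if o != etype)
            p_out = rest_count / max(rest_total, 1)
            odds_ratio = ((p_in + eps) / (1 - p_in + eps)) / ((p_out + eps) / (1 - p_out + eps))
            if odds_ratio > threshold:
                selected.add(word)
                break
    return selected
```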
|
|
|
A Google Cloud GPU is used for pre-training the model. The selected hyperparameters are: learning rate = 1e-4,
batch size = 16, maximum sequence length = 128, and average sequence length = 104. In total, we pre-trained
our models for 500,000 steps, completing 1 epoch. Pre-training a single model took approximately 2.25 days.
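
For reference, a comparable setup with the Hugging Face `Trainer` is sketched below. It uses the library's standard `DataCollatorForWholeWordMask` with a placeholder tokenizer and corpus; the event-knowledge-based masking described above would replace the collator's random whole-word selection:

```python
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForWholeWordMask, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # placeholder tokenizer
model = BertForMaskedLM(BertConfig())                               # randomly initialised BERT-base

# Placeholder corpus: in the setup above this would be the ~8M 100-word segments.
texts = ["example pre-training sentence one .", "example pre-training sentence two ."]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="ebk-bert-pretraining",
    learning_rate=1e-4,              # learning rate = 1e-4
    per_device_train_batch_size=16,  # batch size = 16
    max_steps=500_000,               # 500,000 steps (~2.25 days on a Google Cloud GPU)
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset, data_collator=collator)
trainer.train()
```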
|
|
|
## Fine-tuning data |
|
|
|
Tweets are collected from well-known Arabic news accounts: Al-Arabiya, Sabq,
CNN Arabic, and BBC Arabic. These accounts belong to television channels and online
newspapers that use Twitter to broadcast news related to real-world events.
The first collection process tracks tweets from the news accounts over a 20-day period,
between November 2, 2021, and November 22, 2021; we call this dataset AraEvent(November).
|
|
|
## Evaluation results |
|
|
|
When fine-tuned on the downstream event detection task, this model achieves the following results:
|
![Event classification accuracy results for AraEvent(November) based on an average of 10 runs per event type and a confidence interval of 95%](https://raw.githubusercontent.com/AmjadAlsulami/images/main/Screen%20Shot%202022-12-08%20at%2012.00.59%20PM.png) |
|
|
|
## Gradio Demo |
|
Will be released soon.