
Ara-dialect-BERT

We took a pretrained model and further trained it on the HARD-Arabic-Dataset; the weights were initialized from the CAMeL-Lab "bert-base-camelbert-msa-eighth" model.
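
For reference, a minimal sketch of how such continued masked-language-model pretraining can be set up with the transformers and datasets libraries. This is not the exact training script: the Hub id of the base checkpoint, the data file name, and the hyperparameters are assumptions.

from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Assumed Hub id for the CAMeL-Lab base checkpoint named above.
base = "CAMeL-Lab/bert-base-arabic-camelbert-msa-eighth"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# "hard_reviews.txt" is a placeholder for a plain-text dump of the HARD reviews.
dataset = load_dataset("text", data_files={"train": "hard_reviews.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masked-language-modeling objective (15% masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ara_dialect_bert",
        num_train_epochs=3,
        per_device_train_batch_size=32,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()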

Usage

The model weights can be loaded with the transformers library by Hugging Face:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("MutazYoune/Ara_DialectBERT")
model = AutoModel.from_pretrained("MutazYoune/Ara_DialectBERT")
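
Note that AutoModel returns the encoder without the masked-language-modeling head. To get masked-token predictions directly (as in the fill-mask example below), load the model with the MLM head instead:

from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("MutazYoune/Ara_DialectBERT")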

Example using pipeline:

from transformers import pipeline
fill_mask = pipeline(
    "fill-mask",
    model="MutazYoune/Ara_DialectBERT",
    tokenizer="MutazYoune/Ara_DialectBERT"
)
fill_mask("ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† [MASK] ุจุนูŠุฏ")
{'sequence': 'ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† ุงู„ู…ูˆู‚ุน ุจุนูŠุฏ', 'score': 0.28233852982521057, 'token': 3221, 'token_str': 'ุงู„ู…ูˆู‚ุน'}
{'sequence': 'ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† ู…ูˆู‚ุนู‡ ุจุนูŠุฏ', 'score': 0.24436227977275848, 'token': 19218, 'token_str': 'ู…ูˆู‚ุนู‡'}
{'sequence': 'ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† ุงู„ู…ูƒุงู† ุจุนูŠุฏ', 'score': 0.15372352302074432, 'token': 5401, 'token_str': 'ุงู„ู…ูƒุงู†'}
{'sequence': 'ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† ุงู„ูู†ุฏู‚ ุจุนูŠุฏ', 'score': 0.029026474803686142, 'token': 11133, 'token_str': 'ุงู„ูู†ุฏู‚'}
{'sequence': 'ุงู„ูู†ุฏู‚ ุฌู…ูŠู„ ูˆ ู„ูƒู† ู…ูƒุงู†ู‡ ุจุนูŠุฏ', 'score': 0.024554792791604996, 'token': 10701, 'token_str': 'ู…ูƒุงู†ู‡'}