---
tags:
- arabic
- text-generation
- language-model
license: odc-by
datasets:
- lightonai/ArabicWeb24
---
# Model summary
This model was trained on V1 of the ArabicWeb dataset for 25B tokens, using the [AraGPT-2](https://huggingface.co/aubmindlab/aragpt2-base) tokenizer. It has 900 million parameters, a context length of 1024 tokens, and uses the Mamba2 architecture.
* License: odc-by
* Languages: Arabic
## Model Description
The ArabicWeb Ablation Model V1 was trained on a diverse corpus of Arabic text, including news articles, art and entertainment pieces, and encyclopedia entries, which makes it suitable for a variety of Arabic text generation tasks. For more details, see the accompanying blog post.
- **Model Type**: Language Model
- **Architecture**: Mamba2
- **Training Data**: ArabicWeb24 dataset
- **Training Objective**: Text generation
## Usage
This model was primarily trained to assess the quality of the ArabicWeb dataset and is designed for text generation in Arabic. Please note that this is an ablation model that was not instruction-tuned. The primary intended use case is to compare its performance with other models trained under the same configuration but with different versions of datasets.
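As a quick-start reference, here is a minimal generation sketch. It assumes the checkpoint is stored in a format loadable by the `mamba_ssm` package and pairs it with the AraGPT-2 tokenizer named above; the repository id below is a placeholder, not this model's confirmed path.

```python
# Minimal sketch, not an official loader: assumes the checkpoint is readable by
# mamba_ssm's MambaLMHeadModel and that the AraGPT-2 tokenizer matches training.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

MODEL_ID = "lightonai/..."  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/aragpt2-base")
model = MambaLMHeadModel.from_pretrained(MODEL_ID, device="cuda", dtype=torch.bfloat16)

prompt = "التاريخ العربي"  # any Arabic prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Keep prompt + generation within the 1024-token context used at training time.
out = model.generate(
    input_ids=input_ids,
    max_length=256,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Since this is a base ablation model rather than an instruction-tuned one, expect raw continuations of the prompt rather than answers to instructions.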
## Training
### Model
* Architecture: Mamba2 (a configuration sketch follows this list)
* Pretraining tokens: 25B
* Scheduler: Cosine
* d_model: 2304
* d_intermediate: 0
* n_layer: 18
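For readers who want these hyperparameters in code, below is a minimal sketch of a matching `mamba_ssm` configuration. Only `d_model`, `d_intermediate`, and `n_layer` come from this card; the vocabulary size is an assumption based on the AraGPT-2 tokenizer (64,000 entries), and the remaining settings fall back to `mamba_ssm` defaults.

```python
# Minimal sketch, assuming the mamba_ssm reference implementation; values not
# stated in this card (vocab size, state size, head dim) are assumptions.
from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

config = MambaConfig(
    d_model=2304,
    d_intermediate=0,             # no interleaved MLP blocks
    n_layer=18,
    vocab_size=64000,             # assumption: AraGPT-2 tokenizer vocabulary
    ssm_cfg={"layer": "Mamba2"},  # select the Mamba2 mixer
)
model = MambaLMHeadModel(config)

# Sanity check against the stated ~900M parameter figure.
print(sum(p.numel() for p in model.parameters()))
```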
### Hardware
* Platform: HPE Cray node
* Hardware: 8 NVIDIA H100 GPUs
* Cloud Provider: Orange Cloud Avenue