---
language:
- fr
tags:
- music
- rap
- lyrics
- bert
library_name: transformers
---
# WatiBERT: Fine-Tuned BERT Model for French Rap Lyrics

## Overview

__WatiBERT__ is a __BERT__ model fine-tuned on __French rap lyrics__ sourced from __Genius__. The training dataset was __323MB__ in size, corresponding to __85M tokens__ after tokenization.

This model is designed to understand and analyze the __semantic relationships__ within the context of __French rap__, providing a valuable tool for research in __French slang__, and __music lyrics analysis__.

## Model Details

The model is based on the __FlauBERT Large Cased__ architecture and has been fine-tuned with the following hyperparameters:

| Parameter            | Value        |
|----------------------|--------------|
| Epochs               | 5            |
| Train Batch Size     | 16           |
| Learning Rate        | 2e-5         |
| Weight Decay         | 0.01         |
| Warmup Ratio         | 0.1          |
| Dropout              | 0.1          |
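
As a rough illustration of what these values imply, the sketch below derives the total number of optimizer steps and warmup steps from the table. The example count is hypothetical (the card only gives the corpus size in tokens), and the linear warmup followed by linear decay is an assumption, since the card does not name the scheduler that was used.

```python
import math

# Values taken from the hyperparameter table above.
EPOCHS = 5
TRAIN_BATCH_SIZE = 16
LEARNING_RATE = 2e-5
WARMUP_RATIO = 0.1


def training_schedule(num_examples, epochs=EPOCHS,
                      batch_size=TRAIN_BATCH_SIZE, warmup_ratio=WARMUP_RATIO):
    """Return (total_steps, warmup_steps), assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


def learning_rate_at(step, total_steps, warmup_steps, peak_lr=LEARNING_RATE):
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical number of training examples, chosen only to keep the arithmetic simple.
total, warmup = training_schedule(num_examples=160_000)
print(total, warmup)  # 50000 5000
print(learning_rate_at(warmup, total, warmup))  # peak learning rate, reached when warmup ends
```

With a warmup ratio of 0.1, the learning rate only reaches 2e-5 after the first tenth of training, which is a common way to avoid large early updates to pretrained weights.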

## Versions

The model was trained using __AWS SageMaker__ on a single __ml.p3.2xlarge__ instance with the following software versions:

| Requirement          | Version      |
|----------------------|--------------|
| Transformers Library | 4.6          |
| PyTorch              | 1.7          |
| Python               | 3.6          |

## Installation

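Install the required Python libraries. The examples below also use PyTorch, and the FlauBERT tokenizer depends on `sacremoses`:

```bash
pip install transformers torch sacremoses
```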

## Loading the Model

To load the WatiBERT model, use the following Python code:

```python
from transformers import FlaubertTokenizer, FlaubertWithLMHeadModel

# Load the tokenizer and model
tokenizer = FlaubertTokenizer.from_pretrained("rapminerz/WatiBERT-large-cased")
model = FlaubertWithLMHeadModel.from_pretrained("rapminerz/WatiBERT-large-cased")
```

## Using the Model

Since BERT models are masked language models, you can try the model out by asking it to fill in a masked word:

```python
import torch

def fill_mask(sentence, topk):
    """Return the top-k predicted tokens for the first masked position in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    # Column positions of the mask token in the tokenized input
    mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    logits = outputs.logits
    # Highest-scoring vocabulary ids at the first masked position
    top_tokens_ids = logits[0, mask_token_index, :].topk(topk, dim=1).indices[0]
    top_tokens = [tokenizer.decode(token_id) for token_id in top_tokens_ids]
    return top_tokens

# <special1> marks the word to predict
sentence = "La <special1> est morte hier, ils faisaient pas le poids (gang)"
print(fill_mask(sentence, 1))
# ['concurrence']

sentence = "On s'en souviendra comme le coup de tête de <special1>..."
print(fill_mask(sentence, 1))
# ['Zidane']

sentence = "Et quand je serai en haut j'achêterai une <special1> à ma daronne !"
print(fill_mask(sentence, 1))
# ['villa']

sentence = "Tout ce qui m'importe c'est faire du <special1> !"
print(fill_mask(sentence, 5))
# ['chiffre', 'cash', 'fric', 'sale', 'blé']
```
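
The top-k step ranks candidates by raw logit. To see how confident a prediction is, the logits can be turned into probabilities with a softmax before ranking. The sketch below shows that step in plain Python on a toy vocabulary; the tokens and logit values are invented for illustration and are not model output, and `top_k_with_probs` is a hypothetical helper, not part of `transformers`.

```python
import math


def top_k_with_probs(logits, vocab, k):
    """Return the k highest-scoring tokens paired with their softmax probabilities."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort (token, probability) pairs from most to least likely.
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy vocabulary and logits, invented for illustration only.
vocab = ["cash", "bruit", "fric", "sport"]
logits = [2.0, 0.5, 1.5, -1.0]

for token, prob in top_k_with_probs(logits, vocab, k=2):
    print(f"{token}: {prob:.3f}")
```

With the real model, the same idea would apply to the logits at the masked position, for example via `torch.softmax` over the vocabulary dimension.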

## Usages

This model can then be fine-tuned for several tasks, such as text classification, named entity recognition, question answering, text summarization, text generation, text completion, paraphrasing, language translation, and sentiment analysis.

## Purpose and Disclaimer

This model is designed for academic and research purposes only. It is not intended for commercial use. The creators of this model do not endorse or promote any specific views or opinions that may be represented in the dataset.

__Please mention @RapMinerz if you use our models__

## Contact

For any questions or issues, please contact the repository owner, __RapMinerz__, at rapminerz.contact@gmail.com.