RapMinerz committed on
Commit
dd18fba
1 Parent(s): a82db47

update readme

Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ language:
+ - fr
+ tags:
+ - music
+ - rap
+ - lyrics
+ - bert
+ library_name: transformers
+ ---
+ # WatiBERT: Fine-Tuned BERT Model for French Rap Lyrics
+
+ ## Overview
+
+ __WatiBERT__ is a __BERT__ model fine-tuned on __French rap lyrics__ sourced from __Genius__. The training dataset was __323 MB__, corresponding to __77M tokens__ after tokenization.
+
+ This model is designed to capture the __semantic relationships__ found in __French rap__, making it a useful tool for research on __French slang__ and __music writing__.
+
+ ## Model Details
+
+ The model is based on the __FlauBERT Large Cased__ architecture and was fine-tuned with the following hyperparameters:
+
+ | Parameter            | Value         |
+ |----------------------|---------------|
+ | Epochs               | 5             |
+ | Train Batch Size     | 16            |
+ | Learning Rate        | 2e-5          |
+ | Weight Decay         | 0.01          |
+ | Warmup Ratio         | 0.1           |
+ | Dropout              | 0.1           |
+ | Mask Token           | `<special1>`  |
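
To make the warmup ratio concrete, the sketch below computes how many optimizer steps a run would take and what the learning rate would be at a given step. Only the constants come from the table above. The schedule shape (linear warmup followed by linear decay) is an assumption, since the README does not name the scheduler, and `num_examples` is a hypothetical count of training sequences.

```python
import math

# Values taken from the hyperparameter table
LEARNING_RATE = 2e-5
WARMUP_RATIO = 0.1
EPOCHS = 5
TRAIN_BATCH_SIZE = 16


def total_steps(num_examples, batch_size=TRAIN_BATCH_SIZE, epochs=EPOCHS):
    """Number of optimizer steps for `num_examples` training sequences (hypothetical count)."""
    return math.ceil(num_examples / batch_size) * epochs


def warmup_steps(total, ratio=WARMUP_RATIO):
    """Number of steps spent ramping the learning rate up from zero."""
    return math.ceil(total * ratio)


def lr_at(step, total, peak=LEARNING_RATE, ratio=WARMUP_RATIO):
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    warm = warmup_steps(total, ratio)
    if step < warm:
        return peak * step / max(1, warm)
    return peak * max(0.0, (total - step) / max(1, total - warm))


if __name__ == "__main__":
    steps = total_steps(num_examples=1600)  # hypothetical dataset size
    print(steps, warmup_steps(steps), lr_at(warmup_steps(steps), steps))
```

Under these assumptions the learning rate reaches its peak of 2e-5 at the end of the warmup phase, that is, after the first 10% of the steps.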
+
+ ## Versions
+
+ The model was trained using __AWS SageMaker__ on a single __ml.p3.2xlarge__ instance with the following software versions:
+
+ | Requirement          | Version      |
+ |----------------------|--------------|
+ | Transformers Library | 4.6          |
+ | PyTorch              | 1.7          |
+ | Python               | 3.6          |
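
When reproducing results, it can help to see how the local environment differs from the training one. The following is a minimal sketch, not part of the original project: the helper names are hypothetical, the PyPI package names `transformers` and `torch` are assumed to correspond to the table rows, and `importlib.metadata` requires Python 3.8 or later on the machine running the check.

```python
import re
from importlib import metadata

# (major, minor) versions from the table above, keyed by assumed PyPI package name
TRAINING_VERSIONS = {"transformers": (4, 6), "torch": (1, 7)}


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.7.0+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def compare_with_training(packages=TRAINING_VERSIONS):
    """Map each package to (installed, trained); installed is None if the package is missing."""
    report = {}
    for name, trained in packages.items():
        try:
            installed = major_minor(metadata.version(name))
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (installed, trained)
    return report


if __name__ == "__main__":
    for name, (installed, trained) in compare_with_training().items():
        print(f"{name}: installed={installed}, trained with={trained}")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when debugging differences in behavior.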
+
+ ## Installation
+
+ 1. **Install the required Python libraries**. The examples below also need PyTorch, and the FlauBERT tokenizer relies on `sacremoses`:
+
+ ```bash
+ pip install transformers torch sacremoses
+ ```
+
+ ## Loading the Model
+
+ To load the WatiBERT model, use the following Python code:
+
+ ```python
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+ # Load the tokenizer and the model with its masked-language-modeling head.
+ # The fill-mask example below reads `outputs.logits`, which the bare
+ # `AutoModel` encoder does not return.
+ tokenizer = AutoTokenizer.from_pretrained("rapminerz/WatiBERT-large-cased")
+ model = AutoModelForMaskedLM.from_pretrained("rapminerz/WatiBERT-large-cased")
+ model.eval()  # disable dropout for inference
+ ```
+
+ ## Using the Model
+
+ Since BERT models are masked language models, you can try the model out by letting it fill in masked words:
+
+ ```python
+ import torch
+
+ # Note: `model` must expose a masked-LM head (e.g. loaded with
+ # AutoModelForMaskedLM) so that `outputs.logits` is available.
+ def fill_mask(sentence, topk):
+     inputs = tokenizer(sentence, return_tensors="pt")
+     # Position of the mask token in the tokenized sentence
+     mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
+     with torch.no_grad():
+         outputs = model(**inputs)
+     logits = outputs.logits
+     # Keep the `topk` highest-scoring vocabulary ids at the mask position
+     top_tokens_ids = logits[0, mask_token_index, :].topk(topk, dim=1).indices[0]
+     top_tokens = [tokenizer.decode(token_id) for token_id in top_tokens_ids]
+     return top_tokens
+
+ sentence = "La <special1> est morte hier, ils faisaient pas le poids (gang)"
+ fill_mask(sentence, 1)
+ # ['concurrence']
+
+ sentence = "On s'en souviendra comme le coup de tête de <special1>..."
+ fill_mask(sentence, 1)
+ # ['Zidane']
+
+ sentence = "Et quand je serai en haut j'achêterai une <special1> à ma daronne !"
+ fill_mask(sentence, 1)
+ # ['villa']
+
+ sentence = "Tout ce qui m'importe c'est faire du <special1> !"
+ fill_mask(sentence, 5)
+ # ['chiffre', 'cash', 'fric', 'sale', 'blé']
+ ```
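
The masked sentences above were written by hand. When probing many lyrics, it may be more convenient to start from a complete line and mask one word programmatically. The helper below is a hypothetical sketch (it is not part of the repository) and only assumes the `<special1>` mask token listed under Model Details.

```python
import re

MASK_TOKEN = "<special1>"  # mask token listed in the Model Details table


def mask_word(line, word, mask_token=MASK_TOKEN):
    """Replace the first whole-word occurrence of `word` in `line` with the mask token."""
    # Lookarounds prevent matching inside a longer word (e.g. "cash" in "cashier").
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    masked, count = re.subn(pattern, lambda _match: mask_token, line, count=1)
    if count == 0:
        raise ValueError(f"{word!r} not found in {line!r}")
    return masked


masked_line = mask_word("Tout ce qui m'importe c'est faire du cash !", "cash")
# `masked_line` can then be passed to a fill-mask function such as
# `fill_mask(masked_line, 5)` from the example above.
```

Comparing the model's top predictions with the word that was masked gives a quick, informal check of how well it has picked up the vocabulary of the genre.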
+
+ ## Usages
+
+ This model can then be fine-tuned for several downstream tasks, such as text classification, named entity recognition, question answering, text summarization, text generation, text completion, paraphrasing, language translation, and sentiment analysis.
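
As an illustration of the first task in that list, the sketch below prepares the checkpoint for text classification by attaching a new classification head through `AutoModelForSequenceClassification`. The label set is hypothetical, the helper names are not part of the repository, and the new head is randomly initialized, so it would still need to be trained on labeled data before use.

```python
MODEL_NAME = "rapminerz/WatiBERT-large-cased"  # identifier used in the loading example


def build_label_maps(labels):
    """Return (id2label, label2id) dictionaries for a list of class names."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_classifier(labels, model_name=MODEL_NAME):
    """Load the checkpoint with a freshly initialized classification head."""
    # Imported lazily so that the helper above works without transformers installed.
    from transformers import AutoModelForSequenceClassification

    id2label, label2id = build_label_maps(labels)
    return AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )


# Hypothetical label set, for illustration only
LABELS = ["egotrip", "conscious", "love"]
ID2LABEL, LABEL2ID = build_label_maps(LABELS)
# model = load_classifier(LABELS)  # downloads the weights; fine-tune before use
```

Storing the label maps in the model configuration keeps predictions readable after fine-tuning, since the class names travel with the saved checkpoint.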
+
+ ## Purpose and Disclaimer
+
+ This model is designed for academic and research purposes only. It is not intended for commercial use. The creators of this model do not endorse or promote any specific views or opinions that may be represented in the dataset.
+
+ ## Contact
+
+ For any questions or issues, please contact the repository owner, __RapMinerz__, at rapminerz.contact@gmail.com.