MyPoliBERT-ver02
Model Overview
MyPoliBERT-ver02 is a fine-tuned version of bert-base-uncased for multi-task classification of Malaysian political texts. For each input, it predicts a sentiment class (Unknown: 0, Negative: 1, Neutral: 2, Positive: 3) for each of 12 political topics (Democracy, Economy, Race, Leadership, Development, Corruption, Instability, Safety, Administration, Education, Religion, Environment). The model is tuned for Malaysian-context texts, including social media posts, news articles, and political discussions.
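The sketch below shows one way to run inference, assuming the checkpoint stores its 12 × 4 sentiment heads as a single 48-way classification output; the repo id `your-org/MyPoliBERT-ver02` is a placeholder for the published checkpoint path.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TOPICS = ["Democracy", "Economy", "Race", "Leadership", "Development",
          "Corruption", "Instability", "Safety", "Administration",
          "Education", "Religion", "Environment"]
SENTIMENTS = ["Unknown", "Negative", "Neutral", "Positive"]

repo = "your-org/MyPoliBERT-ver02"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
# Assumes the fine-tuned head is stored as one 48-way (12 topics x 4
# sentiments) classification layer; adjust if the checkpoint differs.
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

text = "The new education budget was well received in Sabah."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits.view(-1, len(TOPICS), len(SENTIMENTS))
preds = logits.argmax(dim=-1).squeeze(0)  # one sentiment class per topic
for topic, cls in zip(TOPICS, preds.tolist()):
    print(f"{topic}: {SENTIMENTS[cls]}")
```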
Evaluation and Performance
It achieves the following results on the evaluation set:
Overall Metrics:
- Loss: 0.2953
- F1 Score: 0.9255
- Accuracy: 0.9256
Topic-Specific Metrics:
Topic | F1 Score | Accuracy |
---|---|---|
Democracy | 0.9401 | 0.9392 |
Economy | 0.9191 | 0.9182 |
Race | 0.9521 | 0.9516 |
Leadership | 0.8198 | 0.8196 |
Development | 0.8877 | 0.8869 |
Corruption | 0.9487 | 0.9498 |
Instability | 0.9254 | 0.9283 |
Safety | 0.9209 | 0.9207 |
Administration | 0.8993 | 0.9019 |
Education | 0.9632 | 0.9632 |
Religion | 0.9557 | 0.9554 |
Environment | 0.9734 | 0.9729 |
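Per-topic numbers of this kind can be computed with a helper along these lines; the use of weighted F1 and the `(n_samples, 12)` array layout are assumptions, since the card does not state the averaging mode.

```python
from sklearn.metrics import accuracy_score, f1_score

def per_topic_metrics(y_true, y_pred, topics):
    """y_true, y_pred: integer sentiment labels, shape (n_samples, 12)."""
    results = {}
    for i, topic in enumerate(topics):
        results[topic] = {
            # "weighted" averaging is an assumption, not stated in the card
            "f1": f1_score(y_true[:, i], y_pred[:, i], average="weighted"),
            "accuracy": accuracy_score(y_true[:, i], y_pred[:, i]),
        }
    return results
```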
Dataset
Data Sources:
- tnwei/ms-newspapers dataset
- Malaysian political posts from Reddit
- Malaysian political posts from Instagram
- Malaysian political posts from Facebook
These sources were combined into a single dataset of 30,268 records; 80% of the dataset was used for training and 20% was reserved for validation.
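A minimal split sketch using the datasets library, assuming the merged corpus is available as a single file (the file name `combined.csv` and the seed are illustrative):

```python
from datasets import load_dataset

# Illustrative: the exact merging and preprocessing are not documented here.
dataset = load_dataset("csv", data_files="combined.csv")["train"]
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))  # ~24,214 / ~6,054 of 30,268 records
```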
Task:
The model performs multi-task learning, simultaneously predicting 12 topics and their respective sentiment classes.
Model Architecture
- Base Model: bert-base-uncased
- Output Layer: The model generates logits for 12 topics, each with four sentiment classes (Unknown, Negative, Neutral, Positive).
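A sketch of what such a head can look like in PyTorch; the layer layout, dropout rate, and pooling choice are assumptions, as the card does not describe them.

```python
import torch.nn as nn
from transformers import BertModel

class MyPoliBertSketch(nn.Module):
    """Multi-task head sketch: one 4-way sentiment classifier per topic."""
    def __init__(self, num_topics: int = 12, num_classes: int = 4):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.1)  # assumed rate
        self.classifier = nn.Linear(self.bert.config.hidden_size,
                                    num_topics * num_classes)
        self.num_topics, self.num_classes = num_topics, num_classes

    def forward(self, input_ids, attention_mask=None):
        pooled = self.bert(input_ids,
                           attention_mask=attention_mask).pooler_output
        logits = self.classifier(self.dropout(pooled))
        # (batch, 12, 4): one row of sentiment logits per topic
        return logits.view(-1, self.num_topics, self.num_classes)
```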
Training procedure
Training hyperparameters
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 16
- mixed_precision_training: Native AMP
Training Configuration (TrainingArguments):
- evaluation_strategy: "epoch"
- save_strategy: "epoch"
- load_best_model_at_end: True
- metric_for_best_model: "overall_f1"
- greater_is_better: True
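Put together, the settings above correspond to a configuration along these lines; the `output_dir` is illustrative, and `evaluation_strategy` is the argument name used by Transformers 4.18.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mypolibert-ver02",     # illustrative
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,     # effective train batch size: 32
    num_train_epochs=16,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                         # Native AMP
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="overall_f1",
    greater_is_better=True,
)
```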
Custom Trainer:
The `compute_loss` method calculates the cross-entropy loss for each label and averages the losses across all labels.
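A sketch of such a trainer, assuming labels of shape (batch, 12) and logits of shape (batch, 12, 4); the actual implementation may differ.

```python
import torch
import torch.nn as nn
from transformers import Trainer

class MultiTaskTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")          # assumed shape: (batch, 12)
        outputs = model(**inputs)
        logits = outputs.logits if hasattr(outputs, "logits") else outputs
        loss_fct = nn.CrossEntropyLoss()
        # One cross-entropy term per topic, averaged across all 12 labels
        losses = [loss_fct(logits[:, i, :], labels[:, i])
                  for i in range(logits.size(1))]
        loss = torch.stack(losses).mean()
        return (loss, outputs) if return_outputs else loss
```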
Training results
Training Loss | Epoch | Step | Validation Loss | Democracy F1 | Democracy Accuracy | Economy F1 | Economy Accuracy | Race F1 | Race Accuracy | Leadership F1 | Leadership Accuracy | Development F1 | Development Accuracy | Corruption F1 | Corruption Accuracy | Instability F1 | Instability Accuracy | Safety F1 | Safety Accuracy | Administration F1 | Administration Accuracy | Education F1 | Education Accuracy | Religion F1 | Religion Accuracy | Environment F1 | Environment Accuracy | Overall F1 | Overall Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.4126 | 1.0 | 757 | 0.2492 | 0.9175 | 0.9283 | 0.9010 | 0.9096 | 0.9374 | 0.9433 | 0.7897 | 0.7896 | 0.8463 | 0.8655 | 0.9349 | 0.9422 | 0.9088 | 0.9201 | 0.9165 | 0.9199 | 0.8760 | 0.8918 | 0.9546 | 0.9592 | 0.9480 | 0.9493 | 0.9704 | 0.9713 | 0.9084 | 0.9158 |
0.2014 | 2.0 | 1514 | 0.2217 | 0.9346 | 0.9389 | 0.9180 | 0.9191 | 0.9508 | 0.9528 | 0.8148 | 0.8195 | 0.8814 | 0.8905 | 0.9433 | 0.9427 | 0.9133 | 0.9106 | 0.9155 | 0.9153 | 0.8913 | 0.9052 | 0.9641 | 0.9658 | 0.9513 | 0.9519 | 0.9723 | 0.9741 | 0.9209 | 0.9239 |
0.1422 | 3.0 | 2271 | 0.2244 | 0.9369 | 0.9397 | 0.9202 | 0.9224 | 0.9503 | 0.9500 | 0.8140 | 0.8188 | 0.8824 | 0.8888 | 0.9453 | 0.9447 | 0.9227 | 0.9250 | 0.9182 | 0.9172 | 0.8926 | 0.9014 | 0.9646 | 0.9658 | 0.9529 | 0.9549 | 0.9745 | 0.9746 | 0.9229 | 0.9253 |
0.098 | 4.0 | 3028 | 0.2310 | 0.9373 | 0.9405 | 0.9239 | 0.9257 | 0.9532 | 0.9541 | 0.8112 | 0.8142 | 0.8894 | 0.8941 | 0.9470 | 0.9466 | 0.9233 | 0.9243 | 0.9180 | 0.9196 | 0.8967 | 0.9068 | 0.9619 | 0.9630 | 0.9527 | 0.9534 | 0.9742 | 0.9751 | 0.9241 | 0.9265 |
0.0722 | 5.0 | 3785 | 0.2507 | 0.9368 | 0.9379 | 0.9254 | 0.9277 | 0.9531 | 0.9536 | 0.8115 | 0.8117 | 0.8862 | 0.8880 | 0.9424 | 0.9405 | 0.9262 | 0.9285 | 0.9139 | 0.9126 | 0.8961 | 0.8987 | 0.9631 | 0.9642 | 0.9548 | 0.9552 | 0.9750 | 0.9746 | 0.9237 | 0.9244 |
0.053 | 6.0 | 4542 | 0.2619 | 0.9405 | 0.9424 | 0.9216 | 0.9220 | 0.9536 | 0.9546 | 0.8155 | 0.8132 | 0.8877 | 0.8907 | 0.9522 | 0.9542 | 0.9215 | 0.9179 | 0.9210 | 0.9207 | 0.8976 | 0.8999 | 0.9611 | 0.9605 | 0.9547 | 0.9544 | 0.9750 | 0.9751 | 0.9252 | 0.9255 |
0.0413 | 7.0 | 5299 | 0.2727 | 0.9414 | 0.9440 | 0.9236 | 0.9235 | 0.9534 | 0.9541 | 0.8247 | 0.8244 | 0.8869 | 0.8890 | 0.9491 | 0.9495 | 0.9264 | 0.9275 | 0.9224 | 0.9227 | 0.8977 | 0.9057 | 0.9635 | 0.9642 | 0.9527 | 0.9523 | 0.9738 | 0.9739 | 0.9263 | 0.9276 |
0.0322 | 8.0 | 6056 | 0.2880 | 0.9389 | 0.9410 | 0.9198 | 0.9234 | 0.9544 | 0.9547 | 0.8142 | 0.8099 | 0.8872 | 0.8878 | 0.9522 | 0.9549 | 0.9274 | 0.9288 | 0.9208 | 0.9214 | 0.9027 | 0.9068 | 0.9632 | 0.9640 | 0.9534 | 0.9536 | 0.9745 | 0.9744 | 0.9257 | 0.9267 |
0.0256 | 9.0 | 6813 | 0.2953 | 0.9401 | 0.9392 | 0.9191 | 0.9182 | 0.9521 | 0.9516 | 0.8198 | 0.8196 | 0.8877 | 0.8869 | 0.9487 | 0.9498 | 0.9254 | 0.9283 | 0.9209 | 0.9207 | 0.8993 | 0.9019 | 0.9632 | 0.9632 | 0.9557 | 0.9554 | 0.9734 | 0.9729 | 0.9255 | 0.9256 |
Framework versions
- Transformers 4.18.0
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.12.1