---
license: apache-2.0
---

model base: https://huggingface.co/google-bert/bert-base-uncased

dataset: https://github.com/ramybaly/Article-Bias-Prediction

training parameters:

- batch_size: 100
- epochs: 5
- dropout: 0.05
- max_length: 512
- learning_rate: 3e-5
- warmup_steps: 100
- random_state: 239
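
The parameters above can be gathered into one object, which also makes it easy to see how `warmup_steps` interacts with `learning_rate`. This is a minimal sketch, not the author's training code: `TrainingConfig`, `total_steps`, and `lr_at` are hypothetical names, and the schedule shape (linear warmup to the peak rate, then linear decay to zero, the same shape as `get_linear_schedule_with_warmup` in `transformers`) is an assumption, because the card only states the number of warmup steps.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # values copied from the training parameters listed in this card
    batch_size: int = 100
    epochs: int = 5
    dropout: float = 0.05
    max_length: int = 512
    learning_rate: float = 3e-5
    warmup_steps: int = 100
    random_state: int = 239

    def total_steps(self, num_train_examples: int) -> int:
        # one optimizer step per batch; a final partial batch still counts as a step
        return math.ceil(num_train_examples / self.batch_size) * self.epochs

    def lr_at(self, step: int, total_steps: int) -> float:
        # ASSUMPTION: linear warmup from 0 to learning_rate over warmup_steps ...
        if step < self.warmup_steps:
            return self.learning_rate * step / max(1, self.warmup_steps)
        # ... followed by linear decay to 0 at total_steps
        remaining = max(0, total_steps - step)
        return self.learning_rate * remaining / max(1, total_steps - self.warmup_steps)
```

For a hypothetical train split of 10,000 articles, `TrainingConfig().total_steps(10_000)` gives 500 steps, so the peak rate of 3e-5 would be reached at step 100 and decay to zero by step 500.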

training methodology:

- sanitize the dataset according to a specific rule set and use the random split provided with the dataset
- train on the train split and evaluate on the validation split after each epoch
- evaluate on the test split only with the model that achieved the lowest validation loss
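
The selection rule above (keep the epoch with the lowest validation loss, and use the test split only once, on that model) can be sketched in a framework-agnostic way. This is an illustration under stated assumptions, not the author's script: `train_and_select`, `train_one_epoch`, and `val_loss` are hypothetical names, and the best model is deep-copied in memory for simplicity, whereas a real BERT run would more likely save a checkpoint per epoch.

```python
import copy
from typing import Any, Callable, Optional, Tuple


def train_and_select(
    model: Any,
    epochs: int,
    train_one_epoch: Callable[[Any], None],
    val_loss: Callable[[Any], float],
) -> Tuple[int, float, Optional[Any]]:
    """Train for `epochs` epochs and return (best_epoch, best_loss, best_model).

    HYPOTHETICAL helper: `train_one_epoch` updates `model` in place on the
    train split, and `val_loss` returns its loss on the validation split.
    """
    best_epoch, best_loss, best_model = 0, float("inf"), None
    for epoch in range(1, epochs + 1):
        train_one_epoch(model)
        loss = val_loss(model)
        # strict "<" keeps the earliest epoch when two losses tie
        if loss < best_loss:
            best_epoch, best_loss, best_model = epoch, loss, copy.deepcopy(model)
    # only best_model is meant to be evaluated on the test split afterwards
    return best_epoch, best_loss, best_model
```

With five epochs, `best_epoch` is the index that would be reported as the best validation epoch, and only `best_model` would then be scored on the test split.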

result summary:

- across the five training epochs, the second-epoch model achieved the lowest validation loss, 0.3314
- on the test split, the second-epoch model achieved an F1 score of 0.9041
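
The card reports an F1 score but does not say how it was averaged across the bias classes. Assuming macro-averaging (every class's F1 weighted equally), the metric can be computed as below, using per-class F1 = 2·TP / (2·TP + FP + FN). `macro_f1` is a hypothetical helper; scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("need at least one prediction")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true label t
    # every label in the union appears at least once, so the denominator is > 0
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(per_class) / len(per_class)
```

Macro-averaging treats a rare class as importantly as a frequent one; if the reported 0.9041 was instead micro- or weighted-averaged, the numbers would not be directly comparable.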

usage:

```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer


def main(repository: str):
    # load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
    model = AutoModelForSequenceClassification.from_pretrained(repository)
    tokenizer = AutoTokenizer.from_pretrained(repository)

    # wrap both in a text-classification pipeline
    nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

    # print the predicted label and its score for a sample sentence
    print(nlp("the masses are controlled by media."))


if __name__ == "__main__":
    main(repository="premsa/political-bias-prediction-allsides-BERT")
```