Model Card for Indonesian News Classification Model

Model Description

This model is fine-tuned to classify Indonesian news articles (drawn from iqballx/indonesian_news_datasets) into predefined categories. The training dataset was built by translating the Indonesian articles into English with a Neural Machine Translation (NMT) system and then labeling the English text with niksmer/ManiBERT, a model trained to classify political texts. The resulting dataset pairs each Indonesian article with its English translation and the assigned category.

Training Data

The training data consists of articles from iqballx/indonesian_news_datasets that were translated into English and then labeled with the niksmer/ManiBERT model. The labels span 55 categories (IDs 0 to 54, listed under Label Mapping) and cover a wide range of topics.
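
The translate-then-label procedure described above can be sketched roughly as follows. This is only an illustration: the card does not name the NMT system, so `translate` is a hypothetical callable supplied by the caller, `classify` is a placeholder for whatever wraps niksmer/ManiBERT, and the record field names are assumptions.

```python
from typing import Callable, Dict, Iterable, List


def build_labeled_dataset(
    articles: Iterable[str],
    translate: Callable[[str], str],
    classify: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Translate each Indonesian article to English, then label the English text."""
    records = []
    for indonesian in articles:
        english = translate(indonesian)  # NMT step (system not specified in this card)
        label = classify(english)        # label assigned by the English-language classifier
        records.append({"indonesian": indonesian, "english": english, "label": label})
    return records
```

Passing the two steps in as callables keeps the sketch independent of any particular translation or classification backend.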

Evaluation

The model was evaluated on a held-out test set, with performance measured as accuracy. Accuracy was 61.71% after the first epoch, 64.62% after the second, 65.64% after the third, and 65.27% after the fourth. It therefore rose over the first three epochs and dipped slightly in the fourth, levelling off at roughly 65%. Since the dataset labels were produced by niksmer/ManiBERT, this figure presumably reflects agreement with those automatically assigned labels rather than with human annotation.
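
As a small worked example, the reported per-epoch figures can be compared directly to see where accuracy peaked. The card does not say which epoch's weights were released, so this sketch only summarizes the numbers above.

```python
# Accuracy (%) after each epoch, as reported in this card.
epoch_accuracy = {1: 61.71, 2: 64.62, 3: 65.64, 4: 65.27}

# Epoch with the highest reported accuracy.
best_epoch = max(epoch_accuracy, key=epoch_accuracy.get)

# Improvement of the best epoch over the first, in percentage points.
gain_over_first = epoch_accuracy[best_epoch] - epoch_accuracy[1]

print(f"Best epoch: {best_epoch} ({epoch_accuracy[best_epoch]:.2f}%)")
print(f"Gain over epoch 1: {gain_over_first:.2f} points")
```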

Limitations and Bias

As with any machine learning model, it is important to recognize potential limitations and biases. The translation step could introduce errors or lose nuances, which in turn affects labeling accuracy. Additionally, the ManiBERT model used for the initial labeling was trained on political texts, which may limit its effectiveness on non-political news or introduce political bias.
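
One simple way to probe these concerns on your own data is to look at how predictions are distributed across labels, for example how often non-political articles end up under "None". The helper below is a minimal sketch; `label_shares` and its input list of predicted label strings are hypothetical and not part of the released model.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(predicted_labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of predictions assigned to each label, most frequent first."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}
```

A heavily skewed distribution on a sample of general news would be one hint that the political label set fits that sample poorly.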

How to Use the Model

To classify an Indonesian news article, you can use the script below:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "YagiASAFAS/indonesia-news-classification-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Replace the placeholder with the Indonesian text to classify;
# truncation keeps long articles within the model's maximum input length
inputs = tokenizer("[Indonesian Text]", return_tensors="pt", truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=1)

# Map the highest-probability class index to its label
id2label = model.config.id2label
predicted_class_index = torch.argmax(predictions, dim=1).item()
predicted_category = id2label.get(predicted_class_index)

print("Predicted Category:", predicted_category)
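
With 55 labels, it can be useful to look at the runners-up and not only the top prediction. The following is a minimal sketch in plain Python; `top_k_labels` is a hypothetical helper, and it assumes the logits have already been converted to a list of floats (for example with `outputs.logits[0].tolist()`).

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs from raw logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices sorted by descending probability
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

Under those assumptions, a call might look like `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)`.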

Label Mapping

Label ID Label Text
0 Agriculture and Farmers
1 Anti-Growth Economy and Sustainability
2 Anti-Imperialism
3 Centralisation: Positive
4 Civic Mindedness: Positive
5 Constitutionalism: Negative
6 Constitutionalism: Positive
7 Controlled Economy
8 Corporatism/ Mixed Economy
9 Culture: Positive
10 Decentralisation: Positive
11 Democracy
12 Economic Goals
13 Economic Growth: Positive
14 Economic Orthodoxy
15 Economic Planning
16 Education Expansion
17 Education Limitation
18 Environmental Protection
19 Equality: Positive
20 European Community/Union or Latin America Integration: Negative
21 European Community/Union or Latin America Integration: Positive
22 Foreign Special Relationships: Negative
23 Foreign Special Relationships: Positive
24 Free Market Economy
25 Freedom and Human Rights
26 Governmental and Administrative Efficiency
27 Incentives: Positive
28 Internationalism: Negative
29 Internationalism: Positive
30 Labour Groups: Negative
31 Labour Groups: Positive
32 Law and Order
33 Market Regulation
34 Marxist Analysis: Positive
35 Military: Negative
36 Military: Positive
37 Multiculturalism: Negative
38 Multiculturalism: Positive
39 National Way of Life: Negative
40 National Way of Life: Positive
41 Nationalisation
42 Non-economic Demographic Groups
43 None
44 Peace
45 Political Authority
46 Political Corruption
47 Protectionism: Negative
48 Protectionism: Positive
49 Technology and Infrastructure: Positive
50 Traditional Morality: Negative
51 Traditional Morality: Positive
52 Underprivileged Minority Groups
53 Welfare State Expansion
54 Welfare State Limitation
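
Many of the labels above pair a topic with a ": Positive" or ": Negative" suffix. If you want to aggregate predictions by topic regardless of stance, a small helper like the one below may help; `split_label` is a hypothetical name and the splitting rule is an assumption based only on the label strings listed here.

```python
from typing import Optional, Tuple


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split 'Topic: Stance' into (topic, stance); stance is None when absent."""
    topic, sep, stance = label.rpartition(": ")
    if sep and stance in ("Positive", "Negative"):
        return topic, stance
    return label, None
```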