Dockerfile Commit Classification Model

This model combines a Logistic Regression classifier with a rule-based system for multi-label classification of Dockerfile-related commit messages. The classifier learns from the text of the messages, and domain-specific rules adjust its output.

Files

logistic_model.joblib: Trained Logistic Regression model.
tfidf_vectorizer.joblib: TF-IDF vectorizer for text preprocessing.
label_binarizer.joblib: MultiLabelBinarizer for encoding/decoding labels.
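
To illustrate what encoding and decoding labels means here, the sketch below mimics in plain Python what a MultiLabelBinarizer provides: each set of labels becomes a 0/1 row with one column per class, and a row can be mapped back to label names. The helpers `encode` and `decode` are hypothetical stand-ins, and the sorted class order is an assumption; the order actually stored in label_binarizer.joblib can be checked with `mlb.classes_`.

```python
# Illustrative only: a plain-Python stand-in for multi-label binarization.
# CLASSES is assumed to be sorted; the real order is whatever mlb.classes_ holds.
CLASSES = sorted([
    "bug fix",
    "code refactoring",
    "feature addition",
    "maintenance/other",
    "Not enough information",
])

def encode(label_sets):
    """Turn each collection of labels into a 0/1 row, one column per class."""
    return [[1 if c in labels else 0 for c in CLASSES] for labels in label_sets]

def decode(rows):
    """Turn each 0/1 row back into a tuple of label names."""
    return [tuple(c for c, flag in zip(CLASSES, row) if flag) for row in rows]

rows = encode([{"bug fix", "maintenance/other"}, set()])
print(rows)
print(decode(rows))
```

An all-zero row decodes to an empty tuple, which is why the usage example prints "No labels" for messages that receive no category.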

Features

Hybrid Approach: Combines a machine learning classifier with rule-based adjustments to its predictions.
Dockerfile-Specific Labels: Categorizes commit messages into predefined classes:
- bug fix
- code refactoring
- feature addition
- maintenance/other
- Not enough information
Multi-Label Support: Each commit message can belong to multiple categories.
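
The rules themselves are not listed here, so the following is only a sketch of how a rule-based adjustment layer of this kind could sit on top of the classifier's output. The keyword patterns in `RULES`, the function `apply_rules`, and the fallback to "Not enough information" are all hypothetical and are not the rules used by this model.

```python
import re

# Hypothetical keyword rules: (pattern, label to add when the pattern matches).
RULES = [
    (re.compile(r"\b(fix|fixed|fixes|bug)\b", re.IGNORECASE), "bug fix"),
    (re.compile(r"\b(refactor|refactored|cleanup)\b", re.IGNORECASE), "code refactoring"),
    (re.compile(r"\b(add|added|adds)\b", re.IGNORECASE), "feature addition"),
]

def apply_rules(message, predicted_labels):
    """Return the model's labels plus any labels triggered by a rule."""
    labels = set(predicted_labels)
    for pattern, label in RULES:
        if pattern.search(message):
            labels.add(label)
    # Assumed fallback: use the catch-all class when nothing was assigned.
    if not labels:
        labels.add("Not enough information")
    return sorted(labels)

print(apply_rules("Fixed an issue with the base image in Dockerfile", ("maintenance/other",)))
print(apply_rules("wip", ()))
```

In this sketch the rules only add labels and never remove what the classifier predicted, which keeps the adjustment easy to reason about; a real rule set might also override or suppress labels.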

How to Use

To use this model, load the three files with joblib, vectorize your commit messages with the TF-IDF vectorizer, and decode the predictions with the label binarizer:

from joblib import load

# Load the model and preprocessing artifacts
model = load("logistic_model.joblib")
tfidf_vectorizer = load("tfidf_vectorizer.joblib")
mlb = load("label_binarizer.joblib")

# Example usage
new_messages = [
    "Fixed an issue with the base image in Dockerfile",
    "Added multistage builds to reduce image size",
    "Updated Python version in Dockerfile to 3.10"
]
X_new_tfidf = tfidf_vectorizer.transform(new_messages)

# Predict the labels
predictions = model.predict(X_new_tfidf)
predicted_labels = mlb.inverse_transform(predictions)

# Print results
for msg, labels in zip(new_messages, predicted_labels):
    print(f"Message: {msg}")
    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
