---
license: mit
---

# Dockerfile Commit Classification Model

This is a Logistic Regression model combined with a rule-based system for multi-label classification of Dockerfile-related commit messages. It pairs a machine-learning classifier with domain-specific rules to categorize each commit message.

## Files

- `logistic_model.joblib`: Trained Logistic Regression model.
- `tfidf_vectorizer.joblib`: TF-IDF vectorizer for text preprocessing.
- `label_binarizer.joblib`: `MultiLabelBinarizer` for encoding and decoding labels.

## Features

- **Hybrid Approach**: Combines the machine-learning classifier with rule-based adjustments to its output.
- **Dockerfile-Specific Labels**: Categorizes commit messages into five predefined classes:
  - `bug fix`
  - `code refactoring`
  - `feature addition`
  - `maintenance/other`
  - `Not enough information`
- **Multi-Label Support**: Each commit message can belong to multiple categories.

## How to Use

Load the three files and apply the same TF-IDF preprocessing to your messages before predicting. This snippet applies only the trained classifier from the files listed above:

```python
from joblib import load

# Load the model and preprocessing artifacts
model = load("logistic_model.joblib")
tfidf_vectorizer = load("tfidf_vectorizer.joblib")
mlb = load("label_binarizer.joblib")

# Example commit messages
new_messages = [
    "Fixed an issue with the base image in Dockerfile",
    "Added multistage builds to reduce image size",
    "Updated Python version in Dockerfile to 3.10",
]

# Vectorize with the fitted TF-IDF vectorizer (transform, not fit_transform)
X_new_tfidf = tfidf_vectorizer.transform(new_messages)

# Predict a binary indicator matrix, then decode it into label tuples
predictions = model.predict(X_new_tfidf)
predicted_labels = mlb.inverse_transform(predictions)

# Print results
for msg, labels in zip(new_messages, predicted_labels):
    print(f"Message: {msg}")
    print(f"Predicted Labels: {', '.join(labels) if labels else 'No labels'}\n")
```
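To make the multi-label output concrete: `model.predict` is expected to return one 0/1 column per class, and `MultiLabelBinarizer.inverse_transform` maps each row back to a tuple of label names. That tuple is empty when no column is set, which is the case the usage snippet prints as "No labels". The sketch below builds a fresh `MultiLabelBinarizer` over the five classes listed in this card instead of loading `label_binarizer.joblib`. The column order assumed here is an illustration only; the shipped binarizer uses whatever order its fitted `classes_` attribute holds (by default, `MultiLabelBinarizer` sorts the labels it sees).

```python
from sklearn.preprocessing import MultiLabelBinarizer

# The five classes listed in this card. Passing `classes` fixes the column
# order; the real label_binarizer.joblib may use a different order.
LABELS = [
    "bug fix",
    "code refactoring",
    "feature addition",
    "maintenance/other",
    "Not enough information",
]

mlb = MultiLabelBinarizer(classes=LABELS)

# Encode three example label sets into a 0/1 indicator matrix
indicator = mlb.fit_transform([
    {"bug fix"},
    {"feature addition", "code refactoring"},
    set(),  # a message with no labels
])

print(mlb.classes_)    # column order of the indicator matrix
print(indicator)       # one row per message, one column per class
# Decode rows back to label tuples; an all-zero row becomes ()
print(mlb.inverse_transform(indicator))
```

Decoded tuples list labels in column order, not in the order they were supplied, so a message tagged with both `feature addition` and `code refactoring` comes back as `('code refactoring', 'feature addition')` under this column order.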
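The rules behind the hybrid approach are not among the files listed in this card, and the usage snippet applies only the classifier. As a purely hypothetical sketch of how keyword rules could be layered over predicted labels, the function below adds a label whenever a message matches an illustrative pattern, and falls back to `Not enough information` only when neither the model nor the rules produced anything. `RULES`, `FALLBACK`, `apply_rules`, and every pattern are invented for illustration; they are assumptions, not the author's rules.

```python
import re

# HYPOTHETICAL keyword rules for illustration only; not the model's actual rules.
RULES = [
    (re.compile(r"\b(fix|fixed|fixes|bug)\b", re.IGNORECASE), "bug fix"),
    (re.compile(r"\b(add|added|adds|introduce)\b", re.IGNORECASE), "feature addition"),
    (re.compile(r"\b(refactor|refactored|cleanup|clean up)\b", re.IGNORECASE), "code refactoring"),
    (re.compile(r"\b(update|updated|bump|bumped|upgrade)\b", re.IGNORECASE), "maintenance/other"),
]
FALLBACK = "Not enough information"


def apply_rules(message, predicted):
    """Merge hypothetical keyword-rule labels into the classifier's labels."""
    labels = set(predicted)
    # Add the label of every rule whose pattern appears in the message
    for pattern, label in RULES:
        if pattern.search(message):
            labels.add(label)
    # Drop the fallback label if a more specific label is now present
    if len(labels) > 1:
        labels.discard(FALLBACK)
    # Use the fallback only when nothing matched at all
    if not labels:
        labels.add(FALLBACK)
    return tuple(sorted(labels))


print(apply_rules("Fixed an issue with the base image in Dockerfile", ()))
print(apply_rules("Updated Python version in Dockerfile to 3.10", ("feature addition",)))
print(apply_rules("Merge branch main", ()))
```

In the usage loop, such a function could be called as `apply_rules(msg, labels)` on each pair before printing. Keyword rules of this kind are cheap and transparent but can misfire on ambiguous wording, so any real rule set would need checking against labeled commits.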