Petr Tsvetkov committed
Commit 3907263 · 1 Parent(s): a7bba68
Added some description to the README.md
README.md
CHANGED
@@ -5,4 +5,48 @@ sdk_version: 4.25.0
 app_file: change_visualizer.py
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+# Description
+
+This project is a main artifact of the "Research on evaluation for AI Commit Message Generation" research.
+
+# Structure (important components)
+
+- ### Configuration: [config.py](config.py)
+  - Grazie API JWT token and Hugging Face token must be stored as environment variables.
+- ### Visualization app -- a Gradio application that is currently deployed
+  at https://huggingface.co/spaces/JetBrains-Research/commit-rewriting-visualization.
+  - Shows
+    - The "golden" dataset of manually collected samples; the dataset is downloaded on startup
+      from https://huggingface.co/datasets/JetBrains-Research/commit-msg-rewriting
+    - The entire dataset that includes the synthetic samples; the dataset is downloaded on startup
+      from https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting
+    - Some statistics collected for the dataset (and its parts); computed on startup
+
+    _Note: datasets updated => need to restart the app to see the changes._
+  - Files
+    - [change_visualizer.py](change_visualizer.py)
+- ### Data processing pipeline (_note: datasets and files names can be changed in the configuration file_)
+  - Run the whole pipeline by running [run_pipeline.py](run_pipeline.py)
+    - All intermediate results are stored as files defined in config
+  - Intermediate steps (can run them separately by running the corresponding files
+    from [generation_steps](generation_steps)). The input is then taken from the previous step's artifact.
+    - Generate the synthetic samples
+      - Files [generation_steps/synthetic_end_to_start.py](generation_steps/synthetic_end_to_start.py)
+        and [generation_steps/synthetic_start_to_end.py](generation_steps/synthetic_start_to_end.py)
+      - The first generation step (end to start) downloads the `JetBrains-Research/commit-msg-rewriting`
+        and `JetBrains-Research/lca-commit-message-generation` datasets from
+        Hugging Face datasets.
+    - Compute metrics
+      - File [generation_steps/metrics_analysis.py](generation_steps/metrics_analysis.py)
+      - Includes the functions for all metrics
+      - Downloads `JetBrains-Research/lca-commit-message-generation` Hugging Face dataset.
+  - The resulting artifact (dataset with golden and synthetic samples, attached reference messages and computed
+    metrics) is saved to the file [output/synthetic.csv](output/synthetic.csv). It should be uploaded
+    to https://huggingface.co/datasets/JetBrains-Research/synthetic-commit-msg-rewriting **manually**.
+- ### Data analysis
+  - [analysis_util.py](analysis_util.py) -- some functions used for data analysis, e.g., correlations computation.
+  - [analysis.ipynb](analysis.ipynb) -- compute the correlations, the resulting tables.
+  - [chart_processing.ipynb](chart_processing.ipynb) -- Jupyter Notebook that draws the charts that were used in the
+    presentation/thesis.
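
The added README text says the Grazie API JWT token and the Hugging Face token must be stored as environment variables. A minimal sketch of how such tokens could be read and validated follows. The variable names `GRAZIE_JWT_TOKEN` and `HF_TOKEN` and the helpers `read_token` and `read_all_tokens` are assumptions for illustration; the diff does not show what `config.py` actually uses.

```python
import os

# Hypothetical variable names; check config.py for the real ones.
TOKEN_VARIABLES = ("GRAZIE_JWT_TOKEN", "HF_TOKEN")


def read_token(name, env=None):
    """Return the token stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def read_all_tokens(env=None):
    """Read every required token, failing on the first one that is missing."""
    return {name: read_token(name, env) for name in TOKEN_VARIABLES}
```

Failing at startup with the name of the missing variable is usually easier to diagnose than a later authentication error from a remote API.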
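
The pipeline section of the README says that all intermediate results are stored as files and that each step takes its input from the previous step's artifact. One way to picture that chaining is sketched below. The step functions, artifact file names, and the `run_pipeline` helper are hypothetical placeholders and do not mirror the real `run_pipeline.py` or the scripts in `generation_steps`.

```python
import csv
from pathlib import Path


def write_rows(path, rows):
    """Save a non-empty list of dicts as a CSV artifact."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Load a CSV artifact back into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def add_synthetic(rows):
    # Placeholder step: copy each row and mark the copy as synthetic.
    return rows + [{**row, "kind": "synthetic"} for row in rows]


def add_metric(rows):
    # Placeholder metric: message length in characters.
    return [{**row, "length": str(len(row["message"]))} for row in rows]


# Each entry is (step function, artifact it writes); names are illustrative only.
STEPS = [(add_synthetic, "01_synthetic.csv"), (add_metric, "02_metrics.csv")]


def run_pipeline(input_path, out_dir, steps=STEPS):
    """Run steps in order; each one reads the artifact written by the previous one."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    current = Path(input_path)
    for step, artifact_name in steps:
        rows = step(read_rows(current))
        current = out_dir / artifact_name
        write_rows(current, rows)
    return current
```

Because every intermediate result is a file, a single step can be re-run on its own by pointing it at the artifact of the step before it.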