se2p
/

b-fein committed · Commit ad82e08 · verified · 1 Parent(s): cccea6b

link arXiv preprint in readme

Files changed (1)
  1. README.md +22 -22
README.md CHANGED
@@ -1,22 +1,22 @@
- ---
- license: cc-by-4.0
- base_model:
- - Salesforce/codet5-large
- ---
-
- # AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model
-
- Part of the replication package for our paper at AST 2025 (ToDo: add doi-link when known).
- To be used in combination with our main replication package at https://doi.org/10.5281/zenodo.14703162.
-
- AsserT5 is a fine-tuned [CodeT5](https://huggingface.co/Salesfoce/codet5-large) trained to generate assertion statements for Java JUnit test cases.
- It was trained on an extended variant of the [methods2test](https://github.com/microsoft/methods2test) dataset.
-
-
- ## Structure
-
- - The top-level number indicates the maximum number of assertions allowed per test case in the training dataet.
- - The next level below indicates the model variant:
-   - `abstract`: Identifiers in the data are replaced with abstract tokens.
-   - `raw`: The source code is tokenised as-is.
-   - `test-method`: The model is trained only on the test case code rather than test case + focal method pairs.
 
+ ---
+ license: cc-by-4.0
+ base_model:
+ - Salesforce/codet5-large
+ ---
+
+ # AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model
+
+ Part of the replication package for our paper at AST 2025 (preprint: https://arxiv.org/abs/2502.02708).
+ To be used in combination with our main replication package at https://doi.org/10.5281/zenodo.14703162.
+
+ AsserT5 is a fine-tuned [CodeT5](https://huggingface.co/Salesfoce/codet5-large) trained to generate assertion statements for Java JUnit test cases.
+ It was trained on an extended variant of the [methods2test](https://github.com/microsoft/methods2test) dataset.
+
+
+ ## Structure
+
+ - The top-level number indicates the maximum number of assertions allowed per test case in the training dataet.
+ - The next level below indicates the model variant:
+   - `abstract`: Identifiers in the data are replaced with abstract tokens.
+   - `raw`: The source code is tokenised as-is.
+   - `test-method`: The model is trained only on the test case code rather than test case + focal method pairs.
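
Note on the Structure section: the two-level layout it describes (assertion limit, then variant) could be addressed roughly as in the sketch below. This is a minimal illustration, not part of the commit. The helper name `variant_subfolder`, the `<number>/<variant>` path form, and the example numbers are assumptions; the README does not state which top-level numbers exist or how the folders are spelled on disk.

```python
# Hypothetical helper: maps the two levels described in the README's
# "Structure" section onto a relative folder path. The "<number>/<variant>"
# form is an assumption, not something the README confirms.

# The three variant names listed in the README.
VARIANTS = ("abstract", "raw", "test-method")


def variant_subfolder(max_assertions: int, variant: str) -> str:
    """Return the assumed relative folder for one model variant."""
    if max_assertions < 1:
        raise ValueError("max_assertions must be a positive integer")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # Top level: maximum assertions per test case; next level: model variant.
    return f"{max_assertions}/{variant}"


if __name__ == "__main__":
    # The number 1 is an illustrative value only.
    print(variant_subfolder(1, "raw"))
```

If the checkpoints are in fact stored this way, the returned string could plausibly be passed as the `subfolder` argument of `from_pretrained` in the `transformers` library; the repository id would need to be taken from the model page.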