se2p/AsserT5

b-fein committed on Feb 6

Commit

ad82e08

1 Parent(s): cccea6b

link arXiv preprint in readme

Files changed (1)

README.md CHANGED

@@ -1,22 +1,22 @@
----
-license: cc-by-4.0
-base_model:
-- Salesforce/codet5-large
----
-# AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model
-Part of the replication package for our paper at AST 2025 (ToDo: add doi-link when known).
-To be used in combination with our main replication package at https://doi.org/10.5281/zenodo.14703162.
-AsserT5 is a fine-tuned [CodeT5](https://huggingface.co/Salesforce/codet5-large) trained to generate assertion statements for Java JUnit test cases.
-It was trained on an extended variant of the [methods2test](https://github.com/microsoft/methods2test) dataset.
-## Structure
-- The top-level number indicates the maximum number of assertions allowed per test case in the training dataset.
-- The next level below indicates the model variant:
-    - `abstract`: Identifiers in the data are replaced with abstract tokens.
-    - `raw`: The source code is tokenised as-is.
-    - `test-method`: The model is trained only on the test case code rather than test case + focal method pairs.

+---
+license: cc-by-4.0
+base_model:
+- Salesforce/codet5-large
+---
+# AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model
+Part of the replication package for our paper at AST 2025 (preprint: https://arxiv.org/abs/2502.02708).
+To be used in combination with our main replication package at https://doi.org/10.5281/zenodo.14703162.
+AsserT5 is a fine-tuned [CodeT5](https://huggingface.co/Salesforce/codet5-large) trained to generate assertion statements for Java JUnit test cases.
+It was trained on an extended variant of the [methods2test](https://github.com/microsoft/methods2test) dataset.
+## Structure
+- The top-level number indicates the maximum number of assertions allowed per test case in the training dataset.
+- The next level below indicates the model variant:
+    - `abstract`: Identifiers in the data are replaced with abstract tokens.
+    - `raw`: The source code is tokenised as-is.
+    - `test-method`: The model is trained only on the test case code rather than test case + focal method pairs.
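
The layout described under "Structure" (a top-level number, then a variant folder) could be addressed from Python roughly as follows. This is a minimal sketch, not part of the commit or the authors' tooling: the helper names `variant_subfolder` and `load_variant` are hypothetical, and it assumes that each `<number>/<variant>` folder directly contains the model and tokenizer files, which the README does not confirm.

```python
from pathlib import PurePosixPath

REPO_ID = "se2p/AsserT5"
# Variant names as listed in the README's "Structure" section.
VARIANTS = ("abstract", "raw", "test-method")


def variant_subfolder(max_assertions: int, variant: str) -> str:
    """Build the '<number>/<variant>' path (assumed layout, hypothetical helper)."""
    if max_assertions < 1:
        raise ValueError("max_assertions must be a positive integer")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return str(PurePosixPath(str(max_assertions)) / variant)


def load_variant(max_assertions: int, variant: str):
    """Load tokenizer and model from the assumed subfolder (needs `transformers`)."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    subfolder = variant_subfolder(max_assertions, variant)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subfolder)
    model = T5ForConditionalGeneration.from_pretrained(REPO_ID, subfolder=subfolder)
    return tokenizer, model
```

Which top-level numbers actually exist is not stated in the README, so check the repository's file listing before calling `load_variant`.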