# exLong
exLong is a large language model instruction-tuned from CodeLlama to generate exceptional-behavior tests (EBTs). It embeds reasoning about traces that lead to throw statements, conditional expressions that guard those throw statements, and non-exceptional-behavior tests that execute similar traces.
The model is fine-tuned from CodeLlama-7b-Instruct using LoRA.
The source code will be public soon!
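For context, an exceptional-behavior test (EBT) deliberately drives the method under test down a path that throws and asserts that the expected exception is raised. The sketch below is a minimal, hand-written JUnit 5 illustration of an EBT for the `factorial` method used in the usage example further down; it is not exLong output, and the class name `Factorial` is assumed purely for the sake of the example.

```java
// Illustrative, hand-written EBT (not exLong output).
// Assumes the method under test is declared as Factorial.factorial(int).
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

public class FactorialTest {

    @Test
    public void testFactorialThrowsOnNegativeInput() {
        // The guard condition (n < 0) is satisfied, so the throw statement is reached
        assertThrows(IllegalArgumentException.class, () -> Factorial.factorial(-1));
    }
}
```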
| Size | Base Model | Providing EBT name in the prompt | Not providing EBT name in the prompt |
|------|------------|----------------------------------|--------------------------------------|
| 7B   | codellama/CodeLlama-7b-Instruct-hf | `revision="with-etest-name"` | `revision="no-etest-name"` |
## Model Use
```bash
pip install transformers accelerate bitsandbytes peft
```
````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the base model
base_model_name = "codellama/CodeLlama-7b-Instruct-hf"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load the LoRA configuration
peft_model_id = "EngineeringSoftware/exLong"
revision = "with-etest-name"  # set to "no-etest-name" for prompts without the EBT name
config = PeftConfig.from_pretrained(peft_model_id, revision=revision)

# Load the LoRA adapter on top of the base model (use the same revision as the config)
model = PeftModel.from_pretrained(base_model, peft_model_id, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
prompt = """<s>[INST] <<SYS>>
You are a helpful programming assistant and an expert Java programmer. You are helping a user writing exceptional-behavior tests for their Java code.
<</SYS>>
Please complete an exceptional behavior test method in Java to test the method 'factorial' for the exception 'IllegalArgumentException'.
The method to be tested is defined as:
```java
public static long factorial(int n) {
    if (n < 0) {
        throw new IllegalArgumentException("Number must be non-negative.");
    }
    long result = 1;
    for (int i = 1; i <= n; i++) {
        result *= i;
    }
    return result;
}
```
Please only give the new exceptional-behavior test method to complete the following test class. Do NOT use extra libraries or define new helper methods. Return **only** the code in the completion:
```java
public class FactorialTest {
}
```
"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# Generate code
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=0.2,  # sampling temperature (lower is more deterministic)
    top_p=0.95,       # top-p (nucleus) sampling
    do_sample=True,   # enable sampling
)
# Decode and print only the newly generated tokens (output[0] also contains the prompt)
generated_code = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Generated Code:")
print(generated_code)
````