# exLong
exLong is a large language model instruction-tuned from CodeLlama to generate exceptional-behavior tests (EBTs). It embeds reasoning about traces that lead to throw statements, conditional expressions that guard those throw statements, and non-exceptional-behavior tests that execute similar traces.
The model is fine-tuned from CodeLlama-7b-Instruct using LoRA.
The source code will be public soon!
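For context, an exceptional-behavior test (EBT) deliberately drives the method under test down a path that throws and asserts that the expected exception is raised. The sketch below is a minimal, hand-written JUnit 5 illustration of an EBT for the `factorial` method used in the usage example further down; it is not exLong output, and the class name `Factorial` is assumed purely for the sake of the example.

```java
// Illustrative, hand-written EBT (not exLong output).
// Assumes the method under test is declared as Factorial.factorial(int).
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

public class FactorialTest {

    @Test
    public void testFactorialThrowsOnNegativeInput() {
        // The guard condition (n < 0) is satisfied, so the throw statement is reached
        assertThrows(IllegalArgumentException.class, () -> Factorial.factorial(-1));
    }
}
```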
| Size | Base Model | Providing EBT name in the prompt | Not providing EBT name in the prompt |
|------|------------|----------------------------------|--------------------------------------|
| 7B   | codellama/CodeLlama-7b-Instruct-hf | `revision="with-etest-name"` | `revision="no-etest-name"` |
## Model Use
```bash
pip install transformers accelerate bitsandbytes peft
```
````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the base model
base_model_name = "codellama/CodeLlama-7b-Instruct-hf"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load the LoRA configuration
peft_model_id = "EngineeringSoftware/exLong"
revision = "with-etest-name"  # set to "no-etest-name" for prompts without the EBT name
config = PeftConfig.from_pretrained(peft_model_id, revision=revision)

# Load the LoRA adapter on top of the base model (use the same revision as the config)
model = PeftModel.from_pretrained(base_model, peft_model_id, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
prompt = """<s>[INST] <<SYS>>
You are a helpful programming assistant and an expert Java programmer. You are helping a user writing exceptional-behavior tests for their Java code.
<</SYS>>
Please complete an exceptional behavior test method in Java to test the method 'factorial' for the exception 'IllegalArgumentException'.
The method to be tested is defined as:
```java
public static long factorial(int n) {
    if (n < 0) {
        throw new IllegalArgumentException("Number must be non-negative.");
    }
    long result = 1;
    for (int i = 1; i <= n; i++) {
        result *= i;
    }
    return result;
}
```
Please only give the new exceptional-behavior test method to complete the following test class. Do NOT use extra libraries or define new helper methods. Return **only** the code in the completion:
```java
public class FactorialTest {
}
```
"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# Generate code
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=100,
    temperature=0.2,  # sampling temperature (lower is more deterministic)
    top_p=0.95,       # top-p (nucleus) sampling
    do_sample=True,   # enable sampling
)
# Decode and print only the newly generated tokens (output[0] also contains the prompt)
generated_code = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Generated Code:")
print(generated_code)
````