|
--- |
|
license: apache-2.0 |
|
base_model: codeparrot/codeparrot-small |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: solidity-generator |
|
results: [] |
|
datasets: |
|
- mwritescode/slither-audited-smart-contracts |
|
pipeline_tag: text-generation |
|
language: |
|
- en |
|
library_name: transformers |
|
widget: |
|
- text: "contract MyToken is ERC20{" |
|
--- |
|
|
|
|
|
# solidity-generator |
|
|
|
This model is specialized in generating Solidity smart-contract code. Derived from the [codeparrot/codeparrot-small](https://huggingface.co/codeparrot/codeparrot-small) model, it has been fine-tuned on an extensive set of Solidity contracts and patterns, making it well suited for drafting or suggesting contract structures.
|
|
|
|
|
## Model description |
|
|
|
This model is designed specifically for generating Solidity contracts. As a derivative of the `codeparrot-small` model, it retains the broader code-generation capabilities of the parent while being particularly proficient at understanding and generating Solidity code.
|
|
|
### Performance |
|
|
|
The model reported a loss of `0.2180` on the evaluation set. |
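For context, assuming the reported value is the standard per-token cross-entropy loss of a causal language model (the exact metric is not stated here), the corresponding perplexity is simply its exponential:

```python
import math

# Assuming the reported 0.2180 is the mean per-token cross-entropy (in nats),
# perplexity is exp(loss).
eval_loss = 0.2180
print(f"Perplexity: {math.exp(eval_loss):.2f}")  # ≈ 1.24
```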
|
|
|
## Intended Uses & Limitations |
|
|
|
|
|
### Intended Uses: |
|
1. Assist developers by auto-generating contract code snippets based on prompts. |
|
2. Help in understanding and drafting complex contract structures. |
|
|
|
### Limitations: |
|
1. The generated code must be reviewed for security and functional correctness. |
|
2. The clarity of the generated code largely depends on the specificity of the prompt. |
|
|
|
## Training Details |
|
|
|
### Dataset |
|
The model was fine-tuned on the [mwritescode/slither-audited-smart-contracts](https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts) dataset, which comprises a wide range of Solidity contracts.
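For reference, the dataset can be loaded with the 🤗 `datasets` library. The configuration name and column used below are assumptions; check the dataset card for the available configurations and fields:

```python
from datasets import load_dataset

# Load the fine-tuning corpus; the configuration name is an assumption,
# see the dataset card for the available configs (e.g. plain-text vs. multilabel).
dataset = load_dataset("mwritescode/slither-audited-smart-contracts", "all-plain-text")

print(dataset)
# Inspect one contract; the column name may differ depending on the configuration.
print(dataset["train"][0]["source_code"][:500])
```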
|
|
|
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):
|
- learning_rate: 7e-05 |
|
- train_batch_size: 5 |
|
- eval_batch_size: 5 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 144 |
|
- num_epochs: 8 |
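
For reference, these settings roughly correspond to the following 🤗 `TrainingArguments`. This is an illustrative sketch rather than the original training script; `output_dir` and any option not listed above are placeholders left at their defaults:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above expressed as TrainingArguments.
# Not the original training script; unlisted options keep their defaults.
training_args = TrainingArguments(
    output_dir="solidity-generator",  # placeholder
    learning_rate=7e-5,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=144,
    num_train_epochs=8,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```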
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.302         | 0.35  | 2000  | 0.3237          |
| 0.298         | 0.69  | 4000  | 0.2871          |
| 0.232         | 1.04  | 6000  | 0.2645          |
| 0.2415        | 1.38  | 8000  | 0.2522          |
| 0.2261        | 1.73  | 10000 | 0.2431          |
| 0.1924        | 2.07  | 12000 | 0.2332          |
| 0.1913        | 2.42  | 14000 | 0.2282          |
| 0.2152        | 2.76  | 16000 | 0.2215          |
| 0.1508        | 3.11  | 18000 | 0.2180          |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.31.0 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.14.3 |
|
- Tokenizers 0.13.3 |
|
|
|
|
|
## How to Use |
|
If you wish to use this model to generate Solidity contract code, follow the steps below: |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
# Load the model and tokenizer |
|
tokenizer = AutoTokenizer.from_pretrained("ckandemir/solidity_generator") |
|
model = AutoModelForCausalLM.from_pretrained("ckandemir/solidity_generator") |
|
|
|
# Input your code prompt |
|
input_text = "contract MyToken is ERC20{" |
|
input_ids = tokenizer.encode(input_text, return_tensors='pt') |
|
# Generate a continuation (pad_token_id avoids a warning on GPT-2 style models)
sample_output = model.generate(input_ids, do_sample=True, max_length=400, num_return_sequences=1, temperature=0.7, pad_token_id=tokenizer.eos_token_id)
|
|
|
# Decode and print the generated text |
|
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True) |
|
print(generated_text) |
|
``` |
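
Alternatively, the same generation can be run through the high-level `pipeline` API; the parameters mirror the example above:

```python
from transformers import pipeline

# Text-generation pipeline wrapping the same model and tokenizer
generator = pipeline("text-generation", model="ckandemir/solidity_generator")

output = generator(
    "contract MyToken is ERC20{",
    max_length=400,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=1,
)
print(output[0]["generated_text"])
```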
|
|