---
license: apache-2.0
base_model: codeparrot/codeparrot-small
tags:
- generated_from_trainer
model-index:
- name: solidity-generator
  results: []
datasets:
- mwritescode/slither-audited-smart-contracts
pipeline_tag: text-generation
language:
- en
library_name: transformers
widget:
- text: "contract MyToken is ERC20{"
---
# solidity-generator
This model generates Solidity smart contract code. It is fine-tuned from [codeparrot/codeparrot-small](https://huggingface.co/codeparrot/codeparrot-small) on a corpus of Solidity contracts, which makes it suited to drafting contracts or suggesting contract structures.
## Model description
The model is designed specifically for generating Solidity contracts. As a fine-tuned derivative of `codeparrot-small`, it inherits the parent model's general code-generation ability and is specialized for Solidity syntax and common contract patterns.
### Performance
The model reached a validation loss of `0.2180` on the evaluation set, the final row of the training results table.
## Intended Uses & Limitations
### Intended Uses:
1. Assist developers by auto-generating contract code snippets based on prompts.
2. Help in understanding and drafting complex contract structures.
### Limitations:
1. Generated code is not audited; review it for security and functional correctness before use.
2. The quality of the generated code depends largely on how specific the prompt is.
## Training Details
### Dataset
The model was fine-tuned on the [mwritescode/slither-audited-smart-contracts](https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts) dataset, which comprises a range of Solidity contracts.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 144
- num_epochs: 8
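To make the `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 144` settings concrete, here is a minimal sketch of a linear warmup followed by a linear decay to zero, which is how such a schedule is commonly defined. The function name `linear_schedule_lr` is hypothetical, and the total step count is an assumption: the card does not state it, so it is extrapolated from the results table (18,000 steps at about 3.11 epochs) and `num_epochs: 8`.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 7e-05,
    warmup_steps: int = 144,
) -> float:
    """Learning rate at `step`: linear warmup to `base_lr`, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# ASSUMPTION: not stated in the card; roughly 18,000 / 3.11 steps per epoch * 8 epochs
ASSUMED_TOTAL_STEPS = 46_000

for step in (0, 72, 144, 18_000, ASSUMED_TOTAL_STEPS):
    print(step, linear_schedule_lr(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at `7e-05` once warmup ends at step 144 and decreases steadily afterwards.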
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.302 | 0.35 | 2000 | 0.3237 |
| 0.298 | 0.69 | 4000 | 0.2871 |
| 0.232 | 1.04 | 6000 | 0.2645 |
| 0.2415 | 1.38 | 8000 | 0.2522 |
| 0.2261 | 1.73 | 10000 | 0.2431 |
| 0.1924 | 2.07 | 12000 | 0.2332 |
| 0.1913 | 2.42 | 14000 | 0.2282 |
| 0.2152 | 2.76 | 16000 | 0.2215 |
| 0.1508 | 3.11 | 18000 | 0.2180 |
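
The validation losses above can also be read as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The helper name `perplexity` is illustrative.

```python
import math

# (step, validation loss) pairs copied from the table above
validation_losses = [
    (2000, 0.3237),
    (4000, 0.2871),
    (6000, 0.2645),
    (8000, 0.2522),
    (10000, 0.2431),
    (12000, 0.2332),
    (14000, 0.2282),
    (16000, 0.2215),
    (18000, 0.2180),
]


def perplexity(mean_nll: float) -> float:
    """Convert a mean negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


for step, loss in validation_losses:
    print(f"step {step:>5}: loss {loss:.4f} -> perplexity {perplexity(loss):.4f}")
```

Under that assumption, the final validation loss of `0.2180` corresponds to a perplexity of about 1.24.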
### Framework versions
- Transformers 4.31.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3
## How to Use
To generate Solidity contract code with this model, load it with `transformers` and sample a completion from a prompt:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_id = "ckandemir/solidity-generator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode the code prompt as token IDs
input_text = "contract MyToken is ERC20{"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Sample one completion; max_length counts the prompt tokens as well
sample_output = model.generate(
    input_ids, do_sample=True, max_length=400, num_return_sequences=1, temperature=0.7
)

# Decode and print the generated text
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
print(generated_text)
```
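
Because generation stops at a fixed token budget, the sampled text may end partway through a function or continue into a second contract. One possible post-processing step, sketched below, trims the output at the brace that closes the first top-level block. The helper `truncate_to_first_block` is hypothetical and not part of `transformers`. It counts braces naively, so braces inside string literals or comments would confuse it.

```python
def truncate_to_first_block(code: str) -> str:
    """Return `code` up to and including the brace that closes the first top-level block.

    If no top-level block is ever closed, the input is returned unchanged.
    """
    depth = 0
    for i, ch in enumerate(code):
        if ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # The first top-level block is complete; drop everything after it
                return code[: i + 1]
    return code


# Illustrative input standing in for model output
sample = "contract MyToken is ERC20{ function f() public { x = 1; } } contract Other {"
print(truncate_to_first_block(sample))
```

The trimmed text is only structurally complete. It still needs to be compiled and reviewed for security and correctness.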