---
license: apache-2.0
base_model: codeparrot/codeparrot-small
tags:
- generated_from_trainer
model-index:
- name: solidity-generator
  results: []
datasets:
- mwritescode/slither-audited-smart-contracts
pipeline_tag: text-generation
language:
- en
library_name: transformers
widget:
- text: "contract MyToken is ERC20{"
---

# solidity-generator

This model specializes in generating Solidity smart contract code. It is fine-tuned from the [codeparrot/codeparrot-small](https://huggingface.co/codeparrot/codeparrot-small) model on a corpus of Solidity contracts. This makes it suitable for helping draft contracts or suggest contract structures.

## Model description

This model is designed specifically for generating Solidity contracts. Because it is fine-tuned from `codeparrot-small`, it builds on the parent model's code-generation capabilities and is specialized for Solidity source code.

### Performance

The model reached a loss of `0.2180` on the evaluation set. This is the validation loss logged at step 18000 (epoch 3.11) in the Training results table.

## Intended Uses & Limitations

### Intended Uses:

1. Assist developers by auto-generating contract code snippets from prompts.
2. Help in understanding and drafting complex contract structures.

### Limitations:

1. The generated code must be reviewed for security and functional correctness.
2. The clarity of the generated code depends largely on how specific the prompt is.

## Training Details

### Dataset

The model was fine-tuned on the [mwritescode/slither-audited-smart-contracts](https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts) dataset, which consists of a range of Solidity contracts.
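The Performance section reports an evaluation loss of `0.2180`. The card does not say what kind of loss this is. If we assume it is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with the Transformers `Trainer`, it converts to perplexity as follows:

$$
\mathrm{PPL} = \exp\left(\mathcal{L}_{\text{eval}}\right) = \exp(0.2180) \approx 1.24
$$

Under that assumption, the model is on average about as uncertain at each position as a uniform choice among roughly 1.24 candidate tokens on the evaluation set.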
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 144
- num_epochs: 8

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.302         | 0.35  | 2000  | 0.3237          |
| 0.298         | 0.69  | 4000  | 0.2871          |
| 0.232         | 1.04  | 6000  | 0.2645          |
| 0.2415        | 1.38  | 8000  | 0.2522          |
| 0.2261        | 1.73  | 10000 | 0.2431          |
| 0.1924        | 2.07  | 12000 | 0.2332          |
| 0.1913        | 2.42  | 14000 | 0.2282          |
| 0.2152        | 2.76  | 16000 | 0.2215          |
| 0.1508        | 3.11  | 18000 | 0.2180          |

### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3

## How to Use

To generate Solidity contract code with this model, follow the steps below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ckandemir/solidity_generator")
model = AutoModelForCausalLM.from_pretrained("ckandemir/solidity_generator")

# Input your code prompt; the tokenizer call also returns the attention mask
input_text = "contract MyToken is ERC20{"
inputs = tokenizer(input_text, return_tensors="pt")

# Sample one continuation of up to 400 tokens (prompt included).
# Reuse EOS as the pad token in case the tokenizer defines none.
sample_output = model.generate(
    **inputs,
    do_sample=True,
    max_length=400,
    num_return_sequences=1,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print the generated text
generated_text = tokenizer.decode(sample_output[0], skip_special_tokens=True)
print(generated_text)
```
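The `temperature=0.7` argument in the `generate` call above rescales the model's next-token scores (logits) before sampling. Values below 1.0 sharpen the distribution toward the most likely tokens, and values above 1.0 flatten it. The sketch below shows the idea in pure Python. It is not the library's implementation, and the three logits are made-up numbers for illustration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw next-token scores into sampling probabilities.

    Dividing logits by a temperature < 1.0 concentrates probability on
    the highest-scoring token; a temperature > 1.0 spreads it out.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's probability shrinking as the temperature rises. This is why a moderate value such as 0.7 tends to keep samples close to common Solidity patterns while still allowing some variety.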
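Sampling stops at `max_length=400` tokens, so `generated_text` may end partway through a contract or run on past it. Before reviewing the output for security and correctness, as the Limitations section requires, it can help to isolate one complete contract. The helper below is a hypothetical sketch and not part of this model or of Transformers. It finds the first `contract ` keyword and returns text up to the brace that closes its body. Its brace counting is naive: braces inside string literals or comments will confuse it.

```python
from typing import Optional


def extract_first_contract(generated: str) -> Optional[str]:
    """Return the first brace-balanced `contract ... { ... }` block, or None.

    Hypothetical helper: braces are counted naively, so `{` or `}`
    inside strings or comments can produce a wrong cut.
    """
    start = generated.find("contract ")
    if start == -1:
        return None
    # Begin counting at the contract body's opening brace.
    open_idx = generated.find("{", start)
    if open_idx == -1:
        return None
    depth = 0
    for i in range(open_idx, len(generated)):
        ch = generated[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return generated[start : i + 1]
    # The body never closed, e.g. generation hit max_length first.
    return None


# Example with a hand-written stand-in for model output.
sample = (
    "contract MyToken is ERC20{\n"
    "    constructor() ERC20(\"MyToken\", \"MTK\") {\n"
    "        _mint(msg.sender, 1000);\n"
    "    }\n"
    "}\n"
    "contract Extra {"
)
print(extract_first_contract(sample))
```

If the helper returns `None`, the contract was truncated. Regenerating with a larger `max_length` is one option.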