---
language: 
  - en
license: apache-2.0
tags:
  - solidity
  - web3
  - code generation
widget:
- text: "pragma solidity ^0.5.7;\n// Context: ParentA | Functions: helloA helloB | Constants: constantA \ncontract HelloWorld is ParentA {"
---

# A code autocomplete T5 model for Solidity
- Hello-world example of using this model. Note that the input `text` includes:
  - A Solidity version header, e.g. `pragma solidity ^0.5.7`
  - Ancestor contract/library info, e.g. public functions and constants from `ParentA`
  - The contract/library/interface declaration header, e.g. `HelloWorld`, ending with `{`

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("hululuzhu/solidity-t5")
model = T5ForConditionalGeneration.from_pretrained("hululuzhu/solidity-t5").to(device)

text = """pragma solidity ^0.5.7;
// Context: ParentA | Functions: helloA helloB | Constants: constantA 
contract HelloWorld is ParentA {"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(device)

# Tune the beam/top_k/top_p params to get good results
# (top_p and top_k only take effect when do_sample=True is also passed)
generated_ids = model.generate(input_ids, max_length=256, num_beams=5, top_p=0.95, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
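The input format described above can be assembled programmatically. The following is a minimal sketch, not part of the model's tooling: `build_prompt` and its parameter names are hypothetical, and the exact spacing (including the trailing space after the constants list) is an assumption inferred from the hello-world example.

```python
def build_prompt(pragma_version, declaration, parent="", functions=(), constants=()):
    """Assemble a model input from its three parts (hypothetical helper).

    pragma_version: version spec, e.g. "^0.5.7"
    declaration:    contract/library/interface header ending with "{"
    parent:         name of the ancestor contract or library
    functions:      public function names inherited from the parent
    constants:      constant names inherited from the parent
    """
    # Assumption: names are space-separated and the context line ends
    # with a single trailing space, as in the hello-world example.
    context = (
        f"// Context: {parent} | "
        f"Functions: {' '.join(functions)} | "
        f"Constants: {' '.join(constants)} "
    )
    return f"pragma solidity {pragma_version};\n{context}\n{declaration}"


prompt = build_prompt(
    "^0.5.7",
    "contract HelloWorld is ParentA {",
    parent="ParentA",
    functions=["helloA", "helloB"],
    constants=["constantA"],
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hand-written `text`.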


- Base T5 code model: https://huggingface.co/Salesforce/codet5-large
- Source data: https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts
  - Processing steps: clean, segment at the contract level, and split each contract into input and output
  - Input sample after processing

    ```
    pragma solidity 0.5.7;
    // Context: PauserRole | Functions: isPauser addPauser renouncePauser | Constants: 
    contract Pausable is PauserRole {
    ```

  - Output sample after processing (**note that the indentation is stripped; this is intentional, to reduce the token count**)

    ```
    event Paused(address account);
    event Unpaused(address account);
    bool private _pausableActive;
    bool private _paused;
    constructor () internal {
    _paused = false;
    }
    function paused() public view returns (bool) {
    return _paused;
    }
    modifier whenNotPaused() {
    require(!_paused);
    _;
    }
    modifier whenPaused() {
    require(_paused);
    _;
    }
    function pause() public onlyPauser whenNotPaused whenPausableActive {
    _paused = true;
    emit Paused(msg.sender);
    }
    function unpause() public onlyPauser whenPaused whenPausableActive {
    _paused = false;
    emit Unpaused(msg.sender);
    }
    function _setPausableActive(bool _active) internal {
    _pausableActive = _active;
    }
    modifier whenPausableActive() {
    require(_pausableActive);
    _;
    }
    }
    ```
- Source training code: To be added
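
Because the output samples have their indentation stripped, generated code is easier to read after re-indenting. The following is a rough sketch of one way to do that by counting braces. The `reindent` helper is hypothetical and not part of this model; it assumes one statement per line and ignores braces inside strings or comments.

```python
def reindent(flat_source, indent="    ", start_depth=1):
    """Re-indent flat Solidity code by tracking brace depth (hypothetical helper).

    start_depth=1 assumes the text is a contract body, i.e. the opening
    "{" of the contract was part of the model input.
    """
    depth = start_depth
    out = []
    for raw in flat_source.splitlines():
        line = raw.strip()
        if not line:
            continue
        opens = line.count("{")
        closes = line.count("}")
        # A line that starts with "}" closes a block, so dedent before emitting it
        if line.startswith("}"):
            depth = max(depth - 1, 0)
            closes -= 1
        out.append(indent * depth + line)
        # Apply the remaining braces on this line to the following lines
        depth = max(depth + opens - closes, 0)
    return "\n".join(out)


flat = """bool private _paused;
constructor () internal {
_paused = false;
}
}"""
print(reindent(flat))
```

A real formatter for Solidity would be more robust; this only illustrates that the stripped indentation is recoverable from the brace structure.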