qhduan committed
Commit 0b33c64
1 parent: 619286f

Update README.md

Files changed (1): README.md (+41 -0)
README.md CHANGED
@@ -3,3 +3,44 @@ license: apache-2.0
  widget:
  - text: "<|endoftext|>\ndef load_excel(path):\n return pd.read_excel(path)\n# docstring\n\"\"\""
  ---
+
+ ## Basic info
+
+ This model is based on [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono).
+
+ It was fine-tuned on the [codeparrot/github-code-clean](https://huggingface.co/datasets/codeparrot/github-code-clean) dataset.
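+
+ The prompt format (also used in the widget example above) is the source code wrapped as: `<|endoftext|>`, the code, a `# docstring` marker, and an opening `"""`. A minimal sketch of building such a prompt for your own function follows; the `build_prompt` helper is illustrative and not part of this repository.
+
+ ```python
+ def build_prompt(source_code: str) -> str:
+     # Prompt layout expected by the model: end-of-text token, the code,
+     # then a "# docstring" marker and an opening triple quote.
+     return f'<|endoftext|>\n{source_code}\n\n# docstring\n"""'
+
+ print(build_prompt('def add(a, b):\n    return a + b'))
+ ```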
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_type = 'kdf/python-docstring-generation'
+ tokenizer = AutoTokenizer.from_pretrained(model_type)
+ model = AutoModelForCausalLM.from_pretrained(model_type)
+
+ # Prompt: the source code, followed by a "# docstring" marker and an opening triple quote
+ inputs = tokenizer('''<|endoftext|>
+ def load_excel(path):
+     return pd.read_excel(path)
+
+ # docstring
+ """''', return_tensors='pt')
+
+ doc_max_length = 128
+
+ generated_ids = model.generate(
+     **inputs,
+     max_length=inputs.input_ids.shape[1] + doc_max_length,
+     do_sample=False,  # greedy decoding; top_p is ignored unless do_sample=True
+     return_dict_in_generate=True,
+     top_p=0.9,
+     num_return_sequences=1,
+     output_scores=True,
+     pad_token_id=50256,
+     eos_token_id=50256  # <|endoftext|>
+ )
+
+ ret = tokenizer.decode(generated_ids.sequences[0], skip_special_tokens=False)
+ print(ret)
+ ```
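+
+ The decoded output contains the prompt followed by the generated docstring. Below is a minimal post-processing sketch for pulling just the docstring text out of `ret`; the `extract_docstring` helper and its stopping rules are illustrative assumptions, not part of the model's API.
+
+ ```python
+ def extract_docstring(generated_text: str) -> str:
+     # The prompt ends with a "# docstring" marker and an opening triple quote,
+     # so keep only the text generated after that point ...
+     body = generated_text.split('# docstring\n"""', 1)[-1]
+     # ... and cut it off at the closing triple quote or the <|endoftext|>
+     # token, whichever the model emitted.
+     for stop in ('"""', '<|endoftext|>'):
+         idx = body.find(stop)
+         if idx != -1:
+             body = body[:idx]
+     return body.strip()
+
+ print(extract_docstring(ret))
+ ```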