---
datasets:
- iamtarun/code_contest_python3_alpaca
language:
- en
metrics:
- code_eval
library_name: transformers
pipeline_tag: text-generation
tags:
- code
---

# Competitive Programming LLM for Python Language

This model is a fine-tuned version of [codegen350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on a cleaned coding-competition [dataset](https://huggingface.co/datasets/iamtarun/code_contest_python3_alpaca) that uses Alpaca-style prompts during training.

## Prompt function

```python
'''
Generates a prompt from a problem description plus sample input and output
examples. At most the first two input/output pairs are included.
@param description: str - problem description text
@param inputs: list     - sample input examples
@param outputs: list    - outputs corresponding to inputs
len(inputs) must equal len(outputs)
'''
def generate_prompt(description, inputs, outputs):
    text = ("Below is a problem description that describes the problem. Write code in Python that appropriately solves the problem.\n\n"
            "### Description:\n"
            f"{description}\n\n")
    assert len(inputs) == len(outputs)
    c = 1
    for inp, out in zip(inputs, outputs):
        text += ("### Input:\n"
                 f"{inp}\n"
                 "### Output:\n"
                 f"{out}\n\n")
        c += 1
        if c > 2:
            break
    text += "### Code:\n"
    return text
```
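As a quick sanity check on the truncation behavior (the counter stops the loop after two input/output pairs), the snippet below repeats the function verbatim so it runs standalone. Even with four sample pairs, only the first two appear in the prompt, which always ends with the `### Code:` marker:

```python
# generate_prompt repeated from above so this snippet runs on its own
def generate_prompt(description, inputs, outputs):
    text = ("Below is a problem description that describes the problem. Write code in Python that appropriately solves the problem.\n\n"
            "### Description:\n"
            f"{description}\n\n")
    assert len(inputs) == len(outputs)
    c = 1
    for inp, out in zip(inputs, outputs):
        text += ("### Input:\n"
                 f"{inp}\n"
                 "### Output:\n"
                 f"{out}\n\n")
        c += 1
        if c > 2:
            break
    text += "### Code:\n"
    return text

# four sample pairs go in, but only the first two are kept
prompt = generate_prompt("Add two numbers.",
                         ["1 2\n", "3 4\n", "5 6\n", "7 8\n"],
                         ["3\n", "7\n", "11\n", "15\n"])
print(prompt.count("### Input:"))      # 2
print(prompt.endswith("### Code:\n"))  # True
```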

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# load model and tokenizer
# set model_path to this model's Hub ID or a local checkpoint directory
model_path = "<model-id-or-path>"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# put the model in inference mode
model.eval()

'''
Runs generation on a text prompt produced by the generate_prompt function
and returns the decoded response.
@param prompt: str - text prompt generated using the generate_prompt function
'''
def pipe(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs,
                                max_length=512,
                                do_sample=True,
                                temperature=0.5,
                                top_p=0.95,
                                repetition_penalty=1.15)
    return tokenizer.decode(output[0].tolist(),
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=False)

# generating code for a problem description
description = "Mr. Chanek has an integer represented by a string s. Zero or more digits have been erased and are denoted by the character _. There are also zero or more digits marked by the character X, meaning they're the same digit. Mr. Chanek wants to count the number of possible integer s, where s is divisible by 25. Of course, s must not contain any leading zero. He can replace the character _ with any digit. He can also replace the character X with any digit, but it must be the same for every character X. As a note, a leading zero is any 0 digit that comes before the first nonzero digit in a number string in positional notation. For example, 0025 has two leading zeroes. An exception is the integer zero, (0 has no leading zero, but 0000 has three leading zeroes). Input One line containing the string s (1 ≤ |s| ≤ 8). The string s consists of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, _, and X. Output Output an integer denoting the number of possible integer s. Examples Input 25 Output 1 Input _00 Output 9 Input _XX Output 9 Input 0 Output 1 Input 0_25 Output 0 Note In the first example, the only possible s is 25. In the second and third example, s ∈ \{100, 200,300,400,500,600,700,800,900\}. In the fifth example, all possible s will have at least one leading zero."
inputs = ["0\n", "_XX\n", "_00\n", "0_25\n"]
outputs = ["1\n", "9\n", "9\n", "0\n"]
prompt = generate_prompt(description, inputs, outputs)
print(pipe(prompt))
print("\n", "="*100, "\n")
```
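Because a causal LM echoes the prompt back, the generated program can be recovered from `pipe`'s return value by splitting on the final `### Code:` marker. A minimal post-processing sketch (the sample response below is illustrative, not real model output):

```python
# illustrative response; in practice this would come from pipe(prompt)
response = (
    "### Description:\n"
    "Read a string and print it.\n\n"
    "### Code:\n"
    "s = input()\n"
    "print(s)\n"
)

# keep only the text after the last "### Code:" marker
code = response.split("### Code:\n")[-1].strip()
print(code)
```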