Abnormal indentation when completing code
Hi, I downloaded the weights from this repo and tried to evaluate the model on the HumanEval benchmark. But when completing the code in the HumanEval dataset, something abnormal happened.
Here is one example:
The code before "return (numbers.count(threshold) > 0)" is the given prompt, and the return statement was generated by the model. I could understand the model producing wrong code, but the weird part is that the first generated line is indented by 5 spaces, not 4. I generated 20 completions for each prompt in the HumanEval dataset, and in all of them the first line is indented by 5 spaces, which is wrong. The other lines are indented correctly; only the first line is off.
I am confused by the result. This is the code I used:
Here, prompt means a prompt from the HumanEval dataset, as described above.
Do you have any ideas about this? Did I do something wrong that caused this result? Thanks in advance!
Are you stripping the HumanEval prompt on the right side, i.e. prompt.strip(), to make sure it ends exactly with """?
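The suggestion above could be sketched roughly as follows. This is only an illustration: strip_prompt and first_line_indent are hypothetical helper names, not part of this repo or of HumanEval, and rstrip() is used here as a narrower variant of strip() that only touches the right side of the prompt.

```python
def strip_prompt(prompt: str) -> str:
    """Remove trailing whitespace so the prompt ends right after the closing quotes."""
    # Hypothetical helper: rstrip() leaves the left side of the prompt untouched.
    return prompt.rstrip()


def first_line_indent(completion: str) -> int:
    """Count the leading spaces on the first non-empty line of a completion."""
    for line in completion.splitlines():
        if line.strip():
            return len(line) - len(line.lstrip(" "))
    return 0  # the completion has no non-empty line
```

After stripping, the completion is expected to begin with its own newline, so first_line_indent skips that empty first line and measures the indentation of the first real statement.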
The prompt I used is:
'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n """ Check if in given list of numbers, are any two numbers closer to each other than\n given threshold.\n >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n False\n >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n True\n """\n'.
There is a '\n' at the right end. After using prompt.strip() to remove the trailing '\n', the indentation is correct.
Thanks for your help! BTW, do you know why this happens? I would have thought that a '\n' at the end of the code prompt wouldn't affect the completion.
It's a tokenization problem. When you provide an instruction, you want to make sure that, if your expected answer were to directly follow the instruction, there is a clear token separation between the two.
E.g. if you tokenize ...4.0, 5.0, 2.0], 0.3)\n True\n """\n return (numbers... you will see that it splits the tokens into something like ...\n""", \n, return, ... So you want to make sure your instruction ends right after \n""" so that the model can produce the next natural token, \n. Otherwise, you push the model off its expected distribution, which leads to generation problems.
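One way to check for this kind of boundary problem is to compare the tokens of the prompt alone with the tokens of the prompt followed by the expected answer. The sketch below is an assumption-laden illustration: toy_tokenize is a hypothetical stand-in loosely modeled on GPT-2-style pre-tokenization and may not match the tokenizer of this model, and boundary_is_clean is a made-up helper name. In practice you would pass in the model's real tokenizer instead.

```python
import re
from typing import Callable, List

# Toy splitter (assumption): words and punctuation may absorb one leading space,
# and a whitespace run leaves its last character to the token that follows.
_TOY_PATTERN = re.compile(r" ?\w+| ?[^\w\s]+|\s+(?!\S)|\s+")


def toy_tokenize(text: str) -> List[str]:
    """Split text into toy tokens; a stand-in for a real tokenizer."""
    return _TOY_PATTERN.findall(text)


def boundary_is_clean(
    tokenize: Callable[[str], List[str]], prompt: str, continuation: str
) -> bool:
    """Return True if the prompt's tokens are a prefix of the full text's tokens."""
    prompt_tokens = tokenize(prompt)
    full_tokens = tokenize(prompt + continuation)
    return full_tokens[: len(prompt_tokens)] == prompt_tokens


if __name__ == "__main__":
    raw = 'def f():\n    """doc"""\n'
    # Prompt ending in "\n": its last token differs from the one in the full text.
    print(boundary_is_clean(toy_tokenize, raw, "    return 1\n"))
    # Stripped prompt: the continuation starts with its own newline token.
    print(boundary_is_clean(toy_tokenize, raw.rstrip(), "\n    return 1\n"))
```

If the check fails, the prompt ends in the middle of what would otherwise be a single token, so the model is asked to continue from a token sequence that differs from how the same text is normally split.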
Got it! Thanks for your reply.