How to prompt the model properly?

#16
by viktor-ferenczi

Loaded this model in 8-bit (quantized) mode into ooba (text-generation-webui). It runs at ~4.5 tokens/second on a 3080 Ti, which is not bad.
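For anyone reproducing this outside ooba, a minimal sketch of 8-bit loading with transformers and bitsandbytes (the model id below is a placeholder for whichever checkpoint this thread is about):

```python
# Minimal 8-bit loading sketch using transformers + bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org/model-name"  # placeholder, substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",   # place layers on the available GPU(s)
    load_in_8bit=True,   # requires the bitsandbytes package
)

prompt = "def find_duplicate_files(folder):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```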

Left the context window length at 2048 (default), so not using the 8k capability.

Attempted to test the model with the following context and prompt:

```
As an expert Python developer I am writing a Python 3 function to find all duplicate files in a folder.

Requirements:
- At any depth in the subdirectory structure.
- The path to the folder must be the parameter of the function.
- Two files are duplicates if they have the same size and contents.
- Files of unique size cannot be duplicates (optimization).
- File contents can be checked based on their SHA256 hashes (checksums).
- Do not read whole files into memory, calculate the hash in 32kB chunks.
- The risk of a hash collision is acceptable in this use case.
- Must find all duplicate files.
- Must NOT delete any files.
- The return value of the function must be a dictionary where keys are the file size and checksum in a tuple, values are the list of paths. Return only duplicate files.
- Must work on both Windows and UNIX (Linux, MAC).
- No need to repeat this documentation in source code comments / docstrings.

The Python 3 code is the following:
```

It started to write the code, but it went astray on every single detail and got it just wrong.
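For reference, here is a minimal sketch of what a correct answer to that prompt could look like (hand-written, not model output; names like `find_duplicate_files` and `sha256_of_file` are just illustrative):

```python
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 32 * 1024  # 32 kB chunks, per the requirements


def sha256_of_file(path):
    # Hash the file in chunks so large files are never read into memory whole.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(folder):
    # Pass 1: group files by size; files of unique size cannot be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip unreadable or vanished files
            by_size[size].append(path)

    # Pass 2: hash only files whose size collides, then group by (size, checksum).
    duplicates = {}
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of_file(path)].append(path)
        for checksum, same in by_hash.items():
            if len(same) > 1:
                duplicates[(size, checksum)] = same
    return duplicates
```

The essential structure is the two-stage grouping: bucket by size first, then hash only the files whose size collides.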

Tested the above prompt (with slight modifications to match each model's expected prompt format; see the example after the list) on these models:

  • WizardLM/WizardCoder-15B-V1.0
  • Salesforce/codegen2-16B
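
WizardCoder, for instance, is trained on the Alpaca-style instruction wrapper, so the prompt was wrapped roughly like this:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
<the prompt above>

### Response:
```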

Both of them worked reasonably well, at both 16-bit (bfloat16) and 8-bit quantization. Interestingly, they seemed to produce better code at 8-bit, being less prone to over-complicating it. I don't know why.

So the task is not impossible, even for small open-source models.

Am I doing something wrong with the prompt above? Is the 8-bit quantization hurting the model? Any insights?
