It seems I cannot use a GGUF file with the response_format setting.

#5
by svjack - opened
import llama_cpp

llm = llama_cpp.Llama.from_pretrained(
    repo_id="bartowski/aya-23-8B-GGUF",
    filename="aya-23-8B-Q4_K_M.gguf",
    # tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("CohereForAI/aya-23-8B"),
    verbose=False,
    n_gpu_layers=-1,
    n_ctx=3060 * 3,  # roughly 9k-token context window
)

# The prompt (in Chinese) asks the model to translate the JSON below into Chinese while preserving the JSON format.
prompt = '''
将下面的json内容翻译成中文,并保留相应的json格式:
    {'problem_description': "Two space agencies, Galactic Explorations and Interstellar Missions, are discussing the potential of Planet X-31 for human colonization. Galactic Explorations claims that Planet X-31 is an ideal candidate due to its Earth-like atmosphere and abundant water resources. Interstellar Missions, however, argues that Planet X-31 is not suitable for colonization because of its high levels of radiation, which they claim would make it impossible for humans to survive there. Galactic Explorations counters this argument by stating that humans could develop technology to shield themselves from radiation in the future. Which statement best describes the fallacy in Galactic Explorations' argument?", 'additional_problem_info': "A) The fallacy is that Galactic Explorations assumes humans can develop technology to shield themselves from radiation without any evidence. \nB) The fallacy is that Interstellar Missions is incorrect about the high levels of radiation on Planet X-31. \nC) The fallacy is that Galactic Explorations believes Planet X-31 is the only planet suitable for human colonization. \nD) The fallacy is that Interstellar Missions doesn't believe in the potential of human technological advancements.", 'chain_of_thought': "Galactic Explorations' argument assumes that humans will be able to develop technology to shield themselves from radiation in the future. However, there is no evidence presented in the problem description to support this claim. Therefore, their argument contains a fallacy.", 'correct_solution': 'A) The fallacy is that Galactic Explorations assumes humans can develop technology to shield themselves from radiation without any evidence.'}
'''

from IPython.display import clear_output
messages = [
    {
        "role": "user",
        "content": prompt
    }
]
response = llm.create_chat_completion(
    messages=messages,
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "problem_description": {"type": "string"},
                "additional_problem_info": {"type": "string"},
                "chain_of_thought": {"type": "string"},
                "correct_solution": {"type": "string"},
            },
            "required": ["problem_description", "additional_problem_info", "chain_of_thought", "correct_solution"],
        },
    },
    stream=True,
)

req = ""
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" not in delta:
        continue
    #print(delta["content"], end="", flush=True)
    req += delta["content"]
    clear_output(wait = True)
    print(req)

When I run this, the Python kernel dies.
Can someone help me? 😊
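
A minimal, non-streaming variant of the same call (just a sketch that reuses the llm object loaded above and trims the schema to a single field) might help narrow down whether the schema-constrained generation itself is what kills the kernel, independent of the streaming loop:

test_response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Return a JSON object with a short problem_description field."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"problem_description": {"type": "string"}},
            "required": ["problem_description"],
        },
    },
    max_tokens=64,
)
print(test_response["choices"][0]["message"]["content"])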

svjack changed discussion status to closed
svjack changed discussion status to open

You cannot do that with llama.cpp; you should do it with ONNX.
It is based on a T5 transformer.
