fix stop tokens to match new prompt formatting, stream instruct response, add comments about concurrency to config
- config.yml +3 -2
- tabbed.py +6 -3
config.yml
CHANGED
@@ -10,7 +10,8 @@ chat:
   stop:
     - "</s>"
     - "<unk>"
-    - "###
+    - "### USER:"
+    - "USER:"
 queue:
   max_size: 16
-  concurrency_count: 1
+  concurrency_count: 1 # leave this at 1, llama-cpp-python doesn't handle concurrent requests and will crash the entire app
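For reference, a minimal sketch (not the Space's actual loading code) of how a config.yml like this is typically consumed: the chat section, including the stop strings above, is loaded from YAML and unpacked into a llama-cpp-python call, which is why the stop entries have to match the new prompt's role markers. The model path and example prompt below are placeholders, not taken from the diff.

# Sketch only: assumes config.yml sits next to the script and that the chat
# section also enables streaming (e.g. stream: true), which isn't visible in
# this hunk.
import yaml
from llama_cpp import Llama

with open("config.yml") as f:
    config = yaml.safe_load(f)

llm = Llama(model_path="path/to/model.gguf")  # placeholder path

prompt = "### Instruction:\nSay hello.\n\n### Response:\n"
for output in llm(prompt, echo=False, **config["chat"]):
    # each chunk carries the next piece of text; the stop strings end
    # generation as soon as the model starts writing a new user turn
    print(output["choices"][0]["text"], end="", flush=True)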
tabbed.py
CHANGED
@@ -49,7 +49,7 @@ def chat(history, system_message, max_tokens, temperature, top_p, top_k, repeat_
     ):
         answer = output['choices'][0]['text']
         history[-1][1] += answer
-
+        # stream the response
         yield history, history
 
 
@@ -66,8 +66,11 @@ start_message = """
 
 
 def generate_text_instruct(input_text):
-
-
+    response = ""
+    for output in llm(f"### Instruction:\n{input_text}\n\n### Response:\n", echo=False, **config['chat']):
+        answer = output['choices'][0]['text']
+        response += answer
+        yield response
 
 
 instruct_interface = gr.Interface(
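A rough sketch of how the streaming generator and the queue settings from config.yml typically fit together: only generate_text_instruct, the config keys, and the concurrency_count=1 requirement come from this commit; the interface widgets and the Gradio 3.x queue call are assumptions.

# Sketch only (Gradio 3.x API): gr.Interface streams each value yielded by
# generate_text_instruct, and concurrency_count=1 keeps requests serialized so
# the single llama-cpp-python instance never runs two generations at once.
import gradio as gr

instruct_interface = gr.Interface(
    fn=generate_text_instruct,               # generator from tabbed.py
    inputs=gr.Textbox(label="Instruction"),  # assumed widgets, not shown in the diff
    outputs=gr.Textbox(label="Response"),
)

instruct_interface.queue(
    max_size=config["queue"]["max_size"],                    # 16 in config.yml
    concurrency_count=config["queue"]["concurrency_count"],  # 1, per the new comment
).launch()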