fix stop tokens to match new prompt formatting, stream instruct response, add comments about concurrency to config e0bf185 winglian committed on May 15, 2023
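A minimal sketch of what "fix stop tokens to match new prompt formatting, stream instruct response" can look like. The stop strings and the generator shape here are assumptions for illustration, not the repo's actual code: the real stop tokens depend on the prompt template in use.

```python
# Hypothetical stop strings matching a typical instruct prompt template
# (assumption -- the actual tokens depend on the prompt formatting).
STOP_TOKENS = ["### Instruction:", "### Response:", "</s>"]

def stream_with_stops(token_iter, stop_tokens=STOP_TOKENS):
    """Yield the growing response text, truncating at the first stop token.

    Searching the whole accumulated buffer each step also catches stop
    tokens that arrive split across several streamed chunks.
    """
    buffer = ""
    for token in token_iter:
        buffer += token
        for stop in stop_tokens:
            idx = buffer.find(stop)
            if idx != -1:
                yield buffer[:idx]
                return
        yield buffer

# Simulated token stream standing in for the model's output generator.
tokens = ["Hello", ", ", "world", "### Instruction:", " ignore this"]
final = ""
for partial in stream_with_stops(iter(tokens)):
    final = partial  # in a UI, each partial would update the chat box
```

Streaming partials this way lets the UI update incrementally while still guaranteeing the emitted text never includes the stop marker or anything after it.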
fix layout, max size back to 1, llama.cpp doesn't like parallel calls 80c7d2e winglian committed on May 15, 2023
try to fix combining gr.interface with blocks, try to increase concurrency on larger gpus dce6894 winglian committed on May 15, 2023
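The two commits above both deal with the same constraint: llama.cpp's generation call is not safe to invoke concurrently, so the app has to serialize requests (queue size 1) even if the web layer allows more concurrency on larger GPUs. A sketch, under the assumption that `fake_generate` stands in for the real llama.cpp call, of serializing inference with a lock:

```python
import threading

# Assumption: the underlying llama.cpp call must not run concurrently,
# so every request takes this lock before generating. In the Gradio app
# this pairs with keeping the request queue's max size at 1.
_inference_lock = threading.Lock()

def fake_generate(prompt):
    # Stand-in for the real llama.cpp generation call (hypothetical).
    return prompt.upper()

def safe_generate(prompt):
    # Serialize all callers: only one generation runs at a time.
    with _inference_lock:
        return fake_generate(prompt)

# Even with several request threads, generations execute one at a time.
results = []
threads = [
    threading.Thread(target=lambda p=p: results.append(safe_generate(p)))
    for p in ["first prompt", "second prompt"]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Raising concurrency on larger GPUs then means widening the web-layer queue while keeping (or sharding) the lock around the model itself, since the bottleneck is the single llama.cpp context rather than the HTTP handling.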