huggingchat/chat-ui

nsarrazin (HF staff), chenhunghan, Henry Chen, Mishig and coyotte508 (HF staff) committed on Nov 15, 2023

Commit 9db8ced (1 parent: 04a868e)

Modular backends & support for openAI & AWS endpoints (#541)

* Fix the response

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Should use /completions

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Use async generator

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Use openai npm

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix generateFromDefaultEndpoint

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix last char becoming undefined

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Better support for system prompt

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Updates

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Revert

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Update README

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Default system prompt

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* remove sk-

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fixing types

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix lockfile

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Move .optional

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Add try...catch and controller.error(error)

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* baseURL

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Format

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix types

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix again

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Better error message

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Update README

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Refactor backend to add support for modular backends

* readme fix

* readme update

* add support for lambda on aws endpoint

* update doc for lambda support

* fix typecheck

* make imports really optional

* readme fixes

* make endpoint creator async

* Update README.md

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* Update README.md

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* Update src/lib/server/endpoints/openai/endpointOai.ts

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* trailing comma

* Update README.md

Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>

* change readme example name

* Update src/lib/server/models.ts

Co-authored-by: Eliott C. <coyotte508@gmail.com>

* fixed preprompt to use conversation.preprompt

* Make openAI endpoint compatible with Azure OpenAI

* surface errors in generation

* Added support for llamacpp endpoint

* fix llamacpp endpoint so it properly stops

* Add llamacpp example to readme

* Add support for legacy configs

---------

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Co-authored-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Eliott C. <coyotte508@gmail.com>

Files changed (20)

.env +1 -0
README.md +111 -4
package-lock.json +227 -2
package.json +4 -1
src/lib/server/endpoints/aws/endpointAws.ts +64 -0
src/lib/server/endpoints/endpoints.ts +42 -0
src/lib/server/endpoints/llamacpp/endpointLlamacpp.ts +100 -0
src/lib/server/endpoints/openai/endpointOai.ts +82 -0
src/lib/server/endpoints/openai/openAIChatToTextGenerationStream.ts +32 -0
src/lib/server/endpoints/openai/openAICompletionToTextGenerationStream.ts +32 -0
src/lib/server/endpoints/tgi/endpointTgi.ts +37 -0
src/lib/server/generateFromDefaultEndpoint.ts +24 -106
src/lib/server/modelEndpoint.ts +0 -50
src/lib/server/models.ts +57 -41
src/lib/server/summarize.ts +2 -7
src/lib/server/websearch/generateQuery.ts +2 -5
src/lib/utils/trimPrefix.ts +0 -6
src/lib/utils/trimSuffix.ts +0 -6
src/routes/conversation/[id]/+page.svelte +2 -0
src/routes/conversation/[id]/+server.ts +75 -130

.env CHANGED

@@ -8,6 +8,7 @@ MONGODB_DIRECT_CONNECTION=false
 COOKIE_NAME=hf-chat
 HF_ACCESS_TOKEN=#hf_<token> from from https://huggingface.co/settings/token
 HF_API_ROOT=https://api-inference.huggingface.co/models
+OPENAI_API_KEY=#your openai api key here
 # used to activate search with web functionality. disabled if none are defined. choose one of the following:
 YDC_API_KEY=#your docs.you.com api key here

README.md CHANGED

@@ -168,6 +168,91 @@ MODELS=`[
 You can change things like the parameters, or customize the preprompt to better suit your needs. You can also add more models by adding more objects to the array, with different preprompts for example.
 #### Custom prompt templates
 By default, the prompt is constructed using `userMessageToken`, `assistantMessageToken`, `userMessageEndToken`, `assistantMessageEndToken`, `preprompt` parameters and a series of default templates.
@@ -258,23 +343,45 @@ You can then add the generated information and the `authorization` parameter to
 ]
 ```
-### Amazon SageMaker
 You can also specify your Amazon SageMaker instance as an endpoint for chat-ui. The config goes like this:
 ```env
 "endpoints": [
     {
-      "host" : "sagemaker",
-      "url": "", // your aws sagemaker url here
       "accessKey": "",
       "secretKey" : "",
-      "sessionToken": "", // optional
       "weight": 1
     }
 ]
 ```
 You can get the `accessKey` and `secretKey` from your AWS user, under programmatic access.
 #### Client Certificate Authentication (mTLS)

 You can change things like the parameters, or customize the preprompt to better suit your needs. You can also add more models by adding more objects to the array, with different preprompts for example.
+#### OpenAI API compatible models
+Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), and [ialacol](https://github.com/chenhunghan/ialacol).
+The following example config makes Chat UI work with [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai). The `endpoint.baseURL` is the URL of the OpenAI API compatible server; it overrides the base URL used by the OpenAI client instance. The `endpoint.completion` setting determines which endpoint is used: the default, `chat_completions`, uses `v1/chat/completions`; set `endpoint.completion` to `completions` to use the `v1/completions` endpoint instead.
+```
+MODELS=`[
+  {
+    "name": "text-generation-webui",
+    "id": "text-generation-webui",
+    "parameters": {
+      "temperature": 0.9,
+      "top_p": 0.95,
+      "repetition_penalty": 1.2,
+      "top_k": 50,
+      "truncate": 1000,
+      "max_new_tokens": 1024,
+      "stop": []
+    },
+    "endpoints": [{
+      "type" : "openai",
+      "baseURL": "http://localhost:8000/v1"
+    }]
+  }
+]`
+```
+The `openai` type also covers the official OpenAI models. For example, you can add GPT-4 and GPT-3.5 as `openai` models:
+```
+OPENAI_API_KEY=#your openai api key here
+MODELS=`[{
+      "name": "gpt-4",
+      "displayName": "GPT 4",
+      "endpoints" : [{
+        "type": "openai"
+      }]
+},
+      {
+      "name": "gpt-3.5-turbo",
+      "displayName": "GPT 3.5 Turbo",
+      "endpoints" : [{
+        "type": "openai"
+      }]
+}]`
+```
+#### Llama.cpp API server
+chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.
+If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
+1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
+2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
+3. Add the following to your `.env.local`:
+```env
+MODELS=`[
+  {
+      "name": "Local Zephyr",
+      "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
+      "parameters": {
+        "temperature": 0.1,
+        "top_p": 0.95,
+        "repetition_penalty": 1.2,
+        "top_k": 50,
+        "truncate": 1000,
+        "max_new_tokens": 2048,
+        "stop": ["</s>"]
+      },
+      "endpoints": [
+        {
+         "url": "http://127.0.0.1:8080",
+         "type": "llamacpp"
+        }
+      ]
+  }
+]`
+```
+Start chat-ui with `npm run dev` and you should be able to chat with Zephyr locally.
 #### Custom prompt templates
 By default, the prompt is constructed using `userMessageToken`, `assistantMessageToken`, `userMessageEndToken`, `assistantMessageEndToken`, `preprompt` parameters and a series of default templates.
 ]
 ```
+### Amazon
+#### SageMaker
 You can also specify your Amazon SageMaker instance as an endpoint for chat-ui. The config goes like this:
 ```env
 "endpoints": [
     {
+      "type" : "aws",
+      "service" : "sagemaker",
+      "url": "",
       "accessKey": "",
       "secretKey" : "",
+      "sessionToken": "",
       "weight": 1
     }
 ]
 ```
+#### Lambda
+You can also specify your AWS Lambda function as an endpoint for chat-ui. The config goes like this:
+```env
+"endpoints" : [
+  {
+        "type": "aws",
+        "service": "lambda",
+        "url": "",
+        "accessKey": "",
+        "secretKey": "",
+        "sessionToken": "",
+        "region": "",
+        "weight": 1
+ }
+]
+```
 You can get the `accessKey` and `secretKey` from your AWS user, under programmatic access.
 #### Client Certificate Authentication (mTLS)
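The README text above says `endpoint.completion` selects between `v1/chat/completions` (the default, `chat_completions`) and `v1/completions` (`completions`), relative to `baseURL`. The TypeScript sketch below only illustrates that mapping; `resolveOpenAIUrl` is a hypothetical helper, not part of chat-ui, and in this commit the actual request is issued by the `openai` npm client.

```typescript
// Hypothetical helper: maps the README's `completion` setting to the URL an
// OpenAI-compatible server would receive, given the configured `baseURL`.
type CompletionKind = "chat_completions" | "completions";

function resolveOpenAIUrl(
	baseURL: string,
	completion: CompletionKind = "chat_completions"
): string {
	// Drop trailing slashes so "http://host/v1/" and "http://host/v1" behave the same.
	const base = baseURL.replace(/\/+$/, "");
	const path = completion === "chat_completions" ? "/chat/completions" : "/completions";
	return base + path;
}

console.log(resolveOpenAIUrl("http://localhost:8000/v1"));
// http://localhost:8000/v1/chat/completions
console.log(resolveOpenAIUrl("http://localhost:8000/v1/", "completions"));
// http://localhost:8000/v1/completions
```

With the text-generation-webui example config (`"baseURL": "http://localhost:8000/v1"` and no `completion` key), requests would go to the chat completions route.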

package-lock.json CHANGED

@@ -12,7 +12,6 @@
 				"@huggingface/inference": "^2.6.3",
 				"@xenova/transformers": "^2.6.0",
 				"autoprefixer": "^10.4.14",
-				"aws4fetch": "^1.0.17",
 				"date-fns": "^2.29.3",
 				"dotenv": "^16.0.3",
 				"handlebars": "^4.7.8",
@@ -55,6 +54,10 @@
 				"unplugin-icons": "^0.16.1",
 				"vite": "^4.3.9",
 				"vitest": "^0.31.0"
 			}
 		},
 		"node_modules/@ampproject/remapping": {
@@ -1120,6 +1123,16 @@
 			"resolved": "https://registry.npmjs.org/@types/node/-/node-18.13.0.tgz",
 			"integrity": "sha512-gC3TazRzGoOnoKAhUx+Q0t8S9Tzs74z7m0ipwGpSqQrleP14hKxP4/JUeEQcD3W1/aIpnWl8pHowI7WokuZpXg=="
 		},
 		"node_modules/@types/node-int64": {
 			"version": "0.4.29",
 			"resolved": "https://registry.npmjs.org/@types/node-int64/-/node-int64-0.4.29.tgz",
@@ -1478,6 +1491,18 @@
 			"resolved": "https://registry.npmjs.org/abab/-/abab-2.0.6.tgz",
 			"integrity": "sha512-j2afSsaIENvHZN2B8GOpF566vZ5WVk5opAiMTvWgaQT8DkbOqsTfvNAvHoRGU2zzP8cPoqys+xHTRDWW8L+/BA=="
 		},
 		"node_modules/acorn": {
 			"version": "8.10.0",
 			"resolved": "https://registry.npmjs.org/acorn/-/acorn-8.10.0.tgz",
@@ -1519,6 +1544,18 @@
 				"node": ">= 6.0.0"
 			}
 		},
 		"node_modules/ajv": {
 			"version": "6.12.6",
 			"resolved": "https://registry.npmjs.org/ajv/-/ajv-6.12.6.tgz",
@@ -1654,7 +1691,8 @@
 		"node_modules/aws4fetch": {
 			"version": "1.0.17",
 			"resolved": "https://registry.npmjs.org/aws4fetch/-/aws4fetch-1.0.17.tgz",
-			"integrity": "sha512-4IbOvsxqxeOSxI4oA+8xEO8SzBMVlzbSTgGy/EF83rHnQ/aKtP6Sc6YV/k0oiW0mqrcxuThlbDosnvetGOuO+g=="
 		},
 		"node_modules/axobject-query": {
 			"version": "3.2.1",
@@ -1675,6 +1713,12 @@
 			"resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
 			"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw=="
 		},
 		"node_modules/base64-js": {
 			"version": "1.5.1",
 			"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
@@ -1924,6 +1968,15 @@
 				"url": "https://github.com/chalk/chalk?sponsor=1"
 			}
 		},
 		"node_modules/check-error": {
 			"version": "1.0.2",
 			"resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz",
@@ -2112,6 +2165,15 @@
 				"node": ">= 8"
 			}
 		},
 		"node_modules/css-tree": {
 			"version": "2.3.1",
 			"resolved": "https://registry.npmjs.org/css-tree/-/css-tree-2.3.1.tgz",
@@ -2331,6 +2393,16 @@
 				"node": ">=0.3.1"
 			}
 		},
 		"node_modules/dir-glob": {
 			"version": "3.0.1",
 			"resolved": "https://registry.npmjs.org/dir-glob/-/dir-glob-3.0.1.tgz",
@@ -2683,6 +2755,15 @@
 				"node": ">=0.10.0"
 			}
 		},
 		"node_modules/execa": {
 			"version": "5.1.1",
 			"resolved": "https://registry.npmjs.org/execa/-/execa-5.1.1.tgz",
@@ -2853,6 +2934,25 @@
 				"node": ">= 6"
 			}
 		},
 		"node_modules/fraction.js": {
 			"version": "4.2.0",
 			"resolved": "https://registry.npmjs.org/fraction.js/-/fraction.js-4.2.0.tgz",
@@ -3118,6 +3218,15 @@
 				"node": ">=10.17.0"
 			}
 		},
 		"node_modules/iconv-lite": {
 			"version": "0.6.3",
 			"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
@@ -3227,6 +3336,12 @@
 				"node": ">=8"
 			}
 		},
 		"node_modules/is-builtin-module": {
 			"version": "3.2.1",
 			"resolved": "https://registry.npmjs.org/is-builtin-module/-/is-builtin-module-3.2.1.tgz",
@@ -3662,6 +3777,17 @@
 				"marked": ">=4 <10"
 			}
 		},
 		"node_modules/md5-hex": {
 			"version": "3.0.1",
 			"resolved": "https://registry.npmjs.org/md5-hex/-/md5-hex-3.0.1.tgz",
@@ -3939,6 +4065,67 @@
 			"resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz",
 			"integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA=="
 		},
 		"node_modules/node-gyp-build": {
 			"version": "4.6.1",
 			"resolved": "https://registry.npmjs.org/node-gyp-build/-/node-gyp-build-4.6.1.tgz",
@@ -4089,6 +4276,35 @@
 				"platform": "^1.3.6"
 			}
 		},
 		"node_modules/openid-client": {
 			"version": "5.4.2",
 			"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-5.4.2.tgz",
@@ -6260,6 +6476,15 @@
 				"node": ">=14"
 			}
 		},
 		"node_modules/webidl-conversions": {
 			"version": "7.0.0",
 			"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-7.0.0.tgz",

 				"@huggingface/inference": "^2.6.3",
 				"@xenova/transformers": "^2.6.0",
 				"autoprefixer": "^10.4.14",
 				"date-fns": "^2.29.3",
 				"dotenv": "^16.0.3",
 				"handlebars": "^4.7.8",
 				"unplugin-icons": "^0.16.1",
 				"vite": "^4.3.9",
 				"vitest": "^0.31.0"
+			},
+			"optionalDependencies": {
+				"aws4fetch": "^1.0.17",
+				"openai": "^4.14.2"
 			}
 		},
 		"node_modules/@ampproject/remapping": {
 			"resolved": "https://registry.npmjs.org/@types/node/-/node-18.13.0.tgz",
 			"integrity": "sha512-gC3TazRzGoOnoKAhUx+Q0t8S9Tzs74z7m0ipwGpSqQrleP14hKxP4/JUeEQcD3W1/aIpnWl8pHowI7WokuZpXg=="
 		},
+		"node_modules/@types/node-fetch": {
+			"version": "2.6.5",
+			"resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.5.tgz",
+			"integrity": "sha512-OZsUlr2nxvkqUFLSaY2ZbA+P1q22q+KrlxWOn/38RX+u5kTkYL2mTujEpzUhGkS+K/QCYp9oagfXG39XOzyySg==",
+			"optional": true,
+			"dependencies": {
+				"@types/node": "*",
+				"form-data": "^4.0.0"
+			}
+		},
 		"node_modules/@types/node-int64": {
 			"version": "0.4.29",
 			"resolved": "https://registry.npmjs.org/@types/node-int64/-/node-int64-0.4.29.tgz",
 			"resolved": "https://registry.npmjs.org/abab/-/abab-2.0.6.tgz",
 			"integrity": "sha512-j2afSsaIENvHZN2B8GOpF566vZ5WVk5opAiMTvWgaQT8DkbOqsTfvNAvHoRGU2zzP8cPoqys+xHTRDWW8L+/BA=="
 		},
+		"node_modules/abort-controller": {
+			"version": "3.0.0",
+			"resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
+			"integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
+			"optional": true,
+			"dependencies": {
+				"event-target-shim": "^5.0.0"
+			},
+			"engines": {
+				"node": ">=6.5"
+			}
+		},
 		"node_modules/acorn": {
 			"version": "8.10.0",
 			"resolved": "https://registry.npmjs.org/acorn/-/acorn-8.10.0.tgz",
 				"node": ">= 6.0.0"
 			}
 		},
+		"node_modules/agentkeepalive": {
+			"version": "4.5.0",
+			"resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.5.0.tgz",
+			"integrity": "sha512-5GG/5IbQQpC9FpkRGsSvZI5QYeSCzlJHdpBQntCsuTOxhKD8lqKhrleg2Yi7yvMIf82Ycmmqln9U8V9qwEiJew==",
+			"optional": true,
+			"dependencies": {
+				"humanize-ms": "^1.2.1"
+			},
+			"engines": {
+				"node": ">= 8.0.0"
+			}
+		},
 		"node_modules/ajv": {
 			"version": "6.12.6",
 			"resolved": "https://registry.npmjs.org/ajv/-/ajv-6.12.6.tgz",
 		"node_modules/aws4fetch": {
 			"version": "1.0.17",
 			"resolved": "https://registry.npmjs.org/aws4fetch/-/aws4fetch-1.0.17.tgz",
+			"integrity": "sha512-4IbOvsxqxeOSxI4oA+8xEO8SzBMVlzbSTgGy/EF83rHnQ/aKtP6Sc6YV/k0oiW0mqrcxuThlbDosnvetGOuO+g==",
+			"optional": true
 		},
 		"node_modules/axobject-query": {
 			"version": "3.2.1",
 			"resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
 			"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw=="
 		},
+		"node_modules/base-64": {
+			"version": "0.1.0",
+			"resolved": "https://registry.npmjs.org/base-64/-/base-64-0.1.0.tgz",
+			"integrity": "sha512-Y5gU45svrR5tI2Vt/X9GPd3L0HNIKzGu202EjxrXMpuc2V2CiKgemAbUUsqYmZJvPtCXoUKjNZwBJzsNScUbXA==",
+			"optional": true
+		},
 		"node_modules/base64-js": {
 			"version": "1.5.1",
 			"resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
 				"url": "https://github.com/chalk/chalk?sponsor=1"
 			}
 		},
+		"node_modules/charenc": {
+			"version": "0.0.2",
+			"resolved": "https://registry.npmjs.org/charenc/-/charenc-0.0.2.tgz",
+			"integrity": "sha512-yrLQ/yVUFXkzg7EDQsPieE/53+0RlaWTs+wBrvW36cyilJ2SaDWfl4Yj7MtLTXleV9uEKefbAGUPv2/iWSooRA==",
+			"optional": true,
+			"engines": {
+				"node": "*"
+			}
+		},
 		"node_modules/check-error": {
 			"version": "1.0.2",
 			"resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz",
 				"node": ">= 8"
 			}
 		},
+		"node_modules/crypt": {
+			"version": "0.0.2",
+			"resolved": "https://registry.npmjs.org/crypt/-/crypt-0.0.2.tgz",
+			"integrity": "sha512-mCxBlsHFYh9C+HVpiEacem8FEBnMXgU9gy4zmNC+SXAZNB/1idgp/aulFJ4FgCi7GPEVbfyng092GqL2k2rmow==",
+			"optional": true,
+			"engines": {
+				"node": "*"
+			}
+		},
 		"node_modules/css-tree": {
 			"version": "2.3.1",
 			"resolved": "https://registry.npmjs.org/css-tree/-/css-tree-2.3.1.tgz",
 				"node": ">=0.3.1"
 			}
 		},
+		"node_modules/digest-fetch": {
+			"version": "1.3.0",
+			"resolved": "https://registry.npmjs.org/digest-fetch/-/digest-fetch-1.3.0.tgz",
+			"integrity": "sha512-CGJuv6iKNM7QyZlM2T3sPAdZWd/p9zQiRNS9G+9COUCwzWFTs0Xp8NF5iePx7wtvhDykReiRRrSeNb4oMmB8lA==",
+			"optional": true,
+			"dependencies": {
+				"base-64": "^0.1.0",
+				"md5": "^2.3.0"
+			}
+		},
 		"node_modules/dir-glob": {
 			"version": "3.0.1",
 			"resolved": "https://registry.npmjs.org/dir-glob/-/dir-glob-3.0.1.tgz",
 				"node": ">=0.10.0"
 			}
 		},
+		"node_modules/event-target-shim": {
+			"version": "5.0.1",
+			"resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
+			"integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
+			"optional": true,
+			"engines": {
+				"node": ">=6"
+			}
+		},
 		"node_modules/execa": {
 			"version": "5.1.1",
 			"resolved": "https://registry.npmjs.org/execa/-/execa-5.1.1.tgz",
 				"node": ">= 6"
 			}
 		},
+		"node_modules/form-data-encoder": {
+			"version": "1.7.2",
+			"resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
+			"integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A==",
+			"optional": true
+		},
+		"node_modules/formdata-node": {
+			"version": "4.4.1",
+			"resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
+			"integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
+			"optional": true,
+			"dependencies": {
+				"node-domexception": "1.0.0",
+				"web-streams-polyfill": "4.0.0-beta.3"
+			},
+			"engines": {
+				"node": ">= 12.20"
+			}
+		},
 		"node_modules/fraction.js": {
 			"version": "4.2.0",
 			"resolved": "https://registry.npmjs.org/fraction.js/-/fraction.js-4.2.0.tgz",
 				"node": ">=10.17.0"
 			}
 		},
+		"node_modules/humanize-ms": {
+			"version": "1.2.1",
+			"resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
+			"integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
+			"optional": true,
+			"dependencies": {
+				"ms": "^2.0.0"
+			}
+		},
 		"node_modules/iconv-lite": {
 			"version": "0.6.3",
 			"resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
 				"node": ">=8"
 			}
 		},
+		"node_modules/is-buffer": {
+			"version": "1.1.6",
+			"resolved": "https://registry.npmjs.org/is-buffer/-/is-buffer-1.1.6.tgz",
+			"integrity": "sha512-NcdALwpXkTm5Zvvbk7owOUSvVvBKDgKP5/ewfXEznmQFfs4ZRmanOeKBTjRVjka3QFoN6XJ+9F3USqfHqTaU5w==",
+			"optional": true
+		},
 		"node_modules/is-builtin-module": {
 			"version": "3.2.1",
 			"resolved": "https://registry.npmjs.org/is-builtin-module/-/is-builtin-module-3.2.1.tgz",
 				"marked": ">=4 <10"
 			}
 		},
+		"node_modules/md5": {
+			"version": "2.3.0",
+			"resolved": "https://registry.npmjs.org/md5/-/md5-2.3.0.tgz",
+			"integrity": "sha512-T1GITYmFaKuO91vxyoQMFETst+O71VUPEU3ze5GNzDm0OWdP8v1ziTaAEPUr/3kLsY3Sftgz242A1SetQiDL7g==",
+			"optional": true,
+			"dependencies": {
+				"charenc": "0.0.2",
+				"crypt": "0.0.2",
+				"is-buffer": "~1.1.6"
+			}
+		},
 		"node_modules/md5-hex": {
 			"version": "3.0.1",
 			"resolved": "https://registry.npmjs.org/md5-hex/-/md5-hex-3.0.1.tgz",
 			"resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz",
 			"integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA=="
 		},
+		"node_modules/node-domexception": {
+			"version": "1.0.0",
+			"resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
+			"integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
+			"funding": [
+				{
+					"type": "github",
+					"url": "https://github.com/sponsors/jimmywarting"
+				},
+				{
+					"type": "github",
+					"url": "https://paypal.me/jimmywarting"
+				}
+			],
+			"optional": true,
+			"engines": {
+				"node": ">=10.5.0"
+			}
+		},
+		"node_modules/node-fetch": {
+			"version": "2.7.0",
+			"resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
+			"integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
+			"optional": true,
+			"dependencies": {
+				"whatwg-url": "^5.0.0"
+			},
+			"engines": {
+				"node": "4.x || >=6.0.0"
+			},
+			"peerDependencies": {
+				"encoding": "^0.1.0"
+			},
+			"peerDependenciesMeta": {
+				"encoding": {
+					"optional": true
+				}
+			}
+		},
+		"node_modules/node-fetch/node_modules/tr46": {
+			"version": "0.0.3",
+			"resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
+			"integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==",
+			"optional": true
+		},
+		"node_modules/node-fetch/node_modules/webidl-conversions": {
+			"version": "3.0.1",
+			"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
+			"integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==",
+			"optional": true
+		},
+		"node_modules/node-fetch/node_modules/whatwg-url": {
+			"version": "5.0.0",
+			"resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
+			"integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
+			"optional": true,
+			"dependencies": {
+				"tr46": "~0.0.3",
+				"webidl-conversions": "^3.0.0"
+			}
+		},
 		"node_modules/node-gyp-build": {
 			"version": "4.6.1",
 			"resolved": "https://registry.npmjs.org/node-gyp-build/-/node-gyp-build-4.6.1.tgz",
 				"platform": "^1.3.6"
 			}
 		},
+		"node_modules/openai": {
+			"version": "4.14.2",
+			"resolved": "https://registry.npmjs.org/openai/-/openai-4.14.2.tgz",
+			"integrity": "sha512-JGlm7mMC7J+cyQZnQMOH7daD9cBqqWqLtlBsejElEkgoehPrYfdyxSxIGICz5xk4YimbwI5FlLATSVojLtCKXQ==",
+			"optional": true,
+			"dependencies": {
+				"@types/node": "^18.11.18",
+				"@types/node-fetch": "^2.6.4",
+				"abort-controller": "^3.0.0",
+				"agentkeepalive": "^4.2.1",
+				"digest-fetch": "^1.3.0",
+				"form-data-encoder": "1.7.2",
+				"formdata-node": "^4.3.2",
+				"node-fetch": "^2.6.7",
+				"web-streams-polyfill": "^3.2.1"
+			},
+			"bin": {
+				"openai": "bin/cli"
+			}
+		},
+		"node_modules/openai/node_modules/web-streams-polyfill": {
+			"version": "3.2.1",
+			"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-3.2.1.tgz",
+			"integrity": "sha512-e0MO3wdXWKrLbL0DgGnUV7WHVuw9OUvL4hjgnPkIeEvESk74gAITi5G606JtZPp39cd8HA9VQzCIvA49LpPN5Q==",
+			"optional": true,
+			"engines": {
+				"node": ">= 8"
+			}
+		},
 		"node_modules/openid-client": {
 			"version": "5.4.2",
 			"resolved": "https://registry.npmjs.org/openid-client/-/openid-client-5.4.2.tgz",
 				"node": ">=14"
 			}
 		},
+		"node_modules/web-streams-polyfill": {
+			"version": "4.0.0-beta.3",
+			"resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
+			"integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
+			"optional": true,
+			"engines": {
+				"node": ">= 14"
+			}
+		},
 		"node_modules/webidl-conversions": {
 			"version": "7.0.0",
 			"resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-7.0.0.tgz",

package.json CHANGED

@@ -48,7 +48,6 @@
 		"@huggingface/inference": "^2.6.3",
 		"@xenova/transformers": "^2.6.0",
 		"autoprefixer": "^10.4.14",
-		"aws4fetch": "^1.0.17",
 		"date-fns": "^2.29.3",
 		"dotenv": "^16.0.3",
 		"handlebars": "^4.7.8",
@@ -64,5 +63,9 @@
 		"tailwind-scrollbar": "^3.0.0",
 		"tailwindcss": "^3.3.1",
 		"zod": "^3.22.3"
+	},
+	"optionalDependencies": {
+		"aws4fetch": "^1.0.17",
+		"openai": "^4.14.2"
 	}
 }

src/lib/server/endpoints/aws/endpointAws.ts ADDED

	@@ -0,0 +1,64 @@

+import { buildPrompt } from "$lib/buildPrompt";
+import { textGenerationStream } from "@huggingface/inference";
+import { z } from "zod";
+import type { Endpoint } from "../endpoints";
+export const endpointAwsParametersSchema = z.object({
+	weight: z.number().int().positive().default(1),
+	model: z.any(),
+	type: z.literal("aws"),
+	url: z.string().url(),
+	accessKey: z.string().min(1),
+	secretKey: z.string().min(1),
+	sessionToken: z.string().optional(),
+	service: z.union([z.literal("sagemaker"), z.literal("lambda")]).default("sagemaker"),
+	region: z.string().optional(),
+});
+export async function endpointAws({
+	url,
+	accessKey,
+	secretKey,
+	sessionToken,
+	model,
+	region,
+	service,
+}: z.infer<typeof endpointAwsParametersSchema>): Promise<Endpoint> {
+	let AwsClient;
+	try {
+		AwsClient = (await import("aws4fetch")).AwsClient;
+	} catch (e) {
+		throw new Error("Failed to import aws4fetch");
+	}
+	const aws = new AwsClient({
+		accessKeyId: accessKey,
+		secretAccessKey: secretKey,
+		sessionToken,
+		service,
+		region,
+	});
+	return async ({ conversation }) => {
+		const prompt = await buildPrompt({
+			messages: conversation.messages,
+			webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
+			preprompt: conversation.preprompt,
+			model,
+		});
+		return textGenerationStream(
+			{
+				parameters: { ...model.parameters, return_full_text: false },
+				model: url,
+				inputs: prompt,
+			},
+			{
+				use_cache: false,
+				fetch: aws.fetch.bind(aws) as typeof fetch,
+			}
+		);
+	};
+}
+export default endpointAws;
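The old README configured SageMaker with `"host": "sagemaker"`, while the schema above expects `"type": "aws"` plus a `"service"` field, and the commit message mentions support for legacy configs. The sketch below shows one way such a migration could look. `LegacyEndpoint`, `ModernEndpoint` and `migrateLegacyEndpoint` are hypothetical names, and treating every non-SageMaker legacy entry as a TGI endpoint is an assumption, not necessarily what `models.ts` does in this commit.

```typescript
// Hypothetical legacy shape, modelled on the old README SageMaker example.
interface LegacyEndpoint {
	host?: "sagemaker" | "tgi";
	url: string;
	weight?: number;
	accessKey?: string;
	secretKey?: string;
	sessionToken?: string;
	authorization?: string;
}

// Hypothetical target shapes, mirroring fields of the new endpoint schemas.
type ModernEndpoint =
	| {
			type: "aws";
			service: "sagemaker";
			url: string;
			accessKey: string;
			secretKey: string;
			sessionToken?: string;
			weight: number;
	  }
	| { type: "tgi"; url: string; authorization?: string; weight: number };

function migrateLegacyEndpoint(legacy: LegacyEndpoint): ModernEndpoint {
	const weight = legacy.weight ?? 1; // the new schemas default weight to 1
	if (legacy.host === "sagemaker") {
		return {
			type: "aws",
			service: "sagemaker",
			url: legacy.url,
			accessKey: legacy.accessKey ?? "",
			secretKey: legacy.secretKey ?? "",
			sessionToken: legacy.sessionToken,
			weight,
		};
	}
	// Assumption: any entry without `host: "sagemaker"` was a TGI endpoint.
	return { type: "tgi", url: legacy.url, authorization: legacy.authorization, weight };
}
```

Missing AWS keys become empty strings here, so the real schema's `.min(1)` checks would still reject an incomplete legacy entry at validation time.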

src/lib/server/endpoints/endpoints.ts ADDED

	@@ -0,0 +1,42 @@

+import type { Conversation } from "$lib/types/Conversation";
+import type { TextGenerationStreamOutput } from "@huggingface/inference";
+import { endpointTgi, endpointTgiParametersSchema } from "./tgi/endpointTgi";
+import { z } from "zod";
+import endpointAws, { endpointAwsParametersSchema } from "./aws/endpointAws";
+import { endpointOAIParametersSchema, endpointOai } from "./openai/endpointOai";
+import endpointLlamacpp, { endpointLlamacppParametersSchema } from "./llamacpp/endpointLlamacpp";
+// parameters passed when generating text
+interface EndpointParameters {
+	conversation: {
+		messages: Omit<Conversation["messages"][0], "id">[];
+		preprompt?: Conversation["preprompt"];
+	};
+}
+interface CommonEndpoint {
+	weight: number;
+}
+// type signature for the endpoint
+export type Endpoint = (
+	params: EndpointParameters
+) => Promise<AsyncGenerator<TextGenerationStreamOutput, void, void>>;
+// generator function that takes in the parameters defining the endpoint and returns the endpoint
+export type EndpointGenerator<T extends CommonEndpoint> = (parameters: T) => Endpoint;
+// list of all endpoint generators
+export const endpoints = {
+	tgi: endpointTgi,
+	sagemaker: endpointAws,
+	openai: endpointOai,
+	llamacpp: endpointLlamacpp,
+};
+export const endpointSchema = z.discriminatedUnion("type", [
+	endpointAwsParametersSchema,
+	endpointOAIParametersSchema,
+	endpointTgiParametersSchema,
+	endpointLlamacppParametersSchema,
+]);
+export default endpoints;
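Every endpoint schema carries a `weight` (default 1), and `CommonEndpoint` requires it. The `models.ts` changes are not shown on this page, so the following is only a sketch of how weighted random selection over that field could work; `pickWeighted` is a hypothetical helper, and its injectable `random` argument exists only to make the choice deterministic in tests.

```typescript
// Anything with a numeric weight, like the CommonEndpoint interface above.
interface Weighted {
	weight: number;
}

// Picks one item with probability proportional to its weight.
function pickWeighted<T extends Weighted>(items: T[], random: () => number = Math.random): T {
	if (items.length === 0) throw new Error("No endpoints configured");
	const total = items.reduce((sum, item) => sum + item.weight, 0);
	let r = random() * total; // r lies in [0, total)
	for (const item of items) {
		if (r < item.weight) return item;
		r -= item.weight;
	}
	// Only reachable through floating-point rounding; fall back to the last item.
	return items[items.length - 1];
}

const configured = [
	{ url: "http://a.local", weight: 1 },
	{ url: "http://b.local", weight: 3 },
];
console.log(pickWeighted(configured).url); // "http://b.local" about 75% of the time
```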

src/lib/server/endpoints/llamacpp/endpointLlamacpp.ts ADDED

	@@ -0,0 +1,100 @@

+import { HF_ACCESS_TOKEN } from "$env/static/private";
+import { buildPrompt } from "$lib/buildPrompt";
+import type { TextGenerationStreamOutput } from "@huggingface/inference";
+import type { Endpoint } from "../endpoints";
+import { z } from "zod";
+export const endpointLlamacppParametersSchema = z.object({
+	weight: z.number().int().positive().default(1),
+	model: z.any(),
+	type: z.literal("llamacpp"),
+	url: z.string().url(),
+	accessToken: z.string().min(1).default(HF_ACCESS_TOKEN),
+});
+export function endpointLlamacpp({
+	url,
+	model,
+}: z.infer<typeof endpointLlamacppParametersSchema>): Endpoint {
+	return async ({ conversation }) => {
+		const prompt = await buildPrompt({
+			messages: conversation.messages,
+			webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
+			preprompt: conversation.preprompt,
+			model,
+		});
+		const r = await fetch(`${url}/completion`, {
+			method: "POST",
+			headers: {
+				"Content-Type": "application/json",
+			},
+			body: JSON.stringify({
+				prompt,
+				stream: true,
+				temperature: model.parameters.temperature,
+				top_p: model.parameters.top_p,
+				top_k: model.parameters.top_k,
+				stop: model.parameters.stop,
+				repeat_penalty: model.parameters.repetition_penalty,
+				n_predict: model.parameters.max_new_tokens,
+			}),
+		});
+		if (!r.ok) {
+			throw new Error(`Failed to generate text: ${await r.text()}`);
+		}
+		const decoder = new TextDecoderStream();
+		const reader = r.body?.pipeThrough(decoder).getReader();
+		return (async function* () {
+			let stop = false;
+			let generatedText = "";
+			let tokenId = 0;
+			while (!stop) {
+				// read the next decoded chunk from the stream
+				const out = (await reader?.read()) ?? { done: false, value: undefined };
+				// if the stream is finished, cancel the reader and stop
+				if (out.done) {
+					reader?.cancel();
+					return;
+				}
+				if (!out.value) {
+					return;
+				}
+				if (out.value.startsWith("data: ")) {
+					let data = null;
+					try {
+						data = JSON.parse(out.value.slice(6));
+					} catch (e) {
+						return;
+					}
+					if (data.content || data.stop) {
+						generatedText += data.content ?? "";
+						const output: TextGenerationStreamOutput = {
+							token: {
+								id: tokenId++,
+								text: data.content ?? "",
+								logprob: 0,
+								special: false,
+							},
+							generated_text: data.stop ? generatedText : null,
+							details: null,
+						};
+						if (data.stop) {
+							stop = true;
+							reader?.cancel();
+						}
+						// yield the token built from data.content
+						yield output;
+					}
+				}
+			}
+		})();
+	};
+}
+export default endpointLlamacpp;
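The llama.cpp generator above assumes that each `reader.read()` chunk holds exactly one complete `data: {...}` event. A streamed chunk can split an event or carry several, so a line-buffered parser is a safer building block. The sketch below assumes that events are newline-delimited and prefixed with `data: `, which is what the code above expects. `SseDataParser` is a hypothetical helper and is not part of chat-ui.

```typescript
// Hypothetical helper: buffers partial lines across chunks and parses complete
// `data: {...}` lines as JSON.
class SseDataParser {
	private buffer = "";

	// Feed one decoded chunk; returns every complete `data:` payload parsed as JSON.
	push(chunk: string): unknown[] {
		this.buffer += chunk;
		const lines = this.buffer.split("\n");
		// The last element is "" (chunk ended on a newline) or an incomplete line to keep.
		this.buffer = lines.pop() ?? "";
		const events: unknown[] = [];
		for (const rawLine of lines) {
			const line = rawLine.trimEnd(); // tolerate "\r\n" line endings
			if (!line.startsWith("data: ")) continue; // skip blank separators and other fields
			events.push(JSON.parse(line.slice("data: ".length)));
		}
		return events;
	}
}
```

Inside the generator, each decoded chunk would go to `push`, and the generator would yield one `TextGenerationStreamOutput` per returned event. `JSON.parse` still throws on a malformed payload, so the caller must decide whether to skip that payload or fail.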

src/lib/server/endpoints/openai/endpointOai.ts ADDED

	@@ -0,0 +1,82 @@

+import { z } from "zod";
+import { openAICompletionToTextGenerationStream } from "./openAICompletionToTextGenerationStream";
+import { openAIChatToTextGenerationStream } from "./openAIChatToTextGenerationStream";
+import { buildPrompt } from "$lib/buildPrompt";
+import { OPENAI_API_KEY } from "$env/static/private";
+import type { Endpoint } from "../endpoints";
+export const endpointOAIParametersSchema = z.object({
+	weight: z.number().int().positive().default(1),
+	model: z.any(),
+	type: z.literal("openai"),
+	baseURL: z.string().url().default("https://api.openai.com/v1"),
+	apiKey: z.string().default(OPENAI_API_KEY ?? "sk-"),
+	completion: z
+		.union([z.literal("completions"), z.literal("chat_completions")])
+		.default("chat_completions"),
+});
+export async function endpointOai({
+	baseURL,
+	apiKey,
+	completion,
+	model,
+}: z.infer<typeof endpointOAIParametersSchema>): Promise<Endpoint> {
+	let OpenAI;
+	try {
+		OpenAI = (await import("openai")).OpenAI;
+	} catch (e) {
+		throw new Error("Failed to import OpenAI", { cause: e });
+	}
+	const openai = new OpenAI({
+		apiKey: apiKey ?? "sk-",
+		baseURL: baseURL,
+	});
+	if (completion === "completions") {
+		return async ({ conversation }) => {
+			return openAICompletionToTextGenerationStream(
+				await openai.completions.create({
+					model: model.id ?? model.name,
+					prompt: await buildPrompt({
+						messages: conversation.messages,
+						webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
+						preprompt: conversation.preprompt,
+						model,
+					}),
+					stream: true,
+					max_tokens: model.parameters?.max_new_tokens,
+					stop: model.parameters?.stop,
+					temperature: model.parameters?.temperature,
+					top_p: model.parameters?.top_p,
+					frequency_penalty: model.parameters?.repetition_penalty,
+				})
+			);
+		};
+	} else if (completion === "chat_completions") {
+		return async ({ conversation }) => {
+			const messages = conversation.messages.map((message) => ({
+				role: message.from,
+				content: message.content,
+			}));
+			return openAIChatToTextGenerationStream(
+				await openai.chat.completions.create({
+					model: model.id ?? model.name,
+					messages: conversation.preprompt
+						? [{ role: "system", content: conversation.preprompt }, ...messages]
+						: messages,
+					stream: true,
+					max_tokens: model.parameters?.max_new_tokens,
+					stop: model.parameters?.stop,
+					temperature: model.parameters?.temperature,
+					top_p: model.parameters?.top_p,
+					frequency_penalty: model.parameters?.repetition_penalty,
+				})
+			);
+		};
+	} else {
+		throw new Error("Invalid completion type");
+	}
+}
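The `chat_completions` branch maps stored messages to Chat Completions messages and prepends the preprompt as a `system` message. Below is a dependency-free sketch of that mapping. The `StoredMessage` and `ChatMessage` types are hypothetical stand-ins for chat-ui's `Message` and the `openai` package's message types.

```typescript
// Hypothetical mirror of the fields read from a stored chat-ui message.
interface StoredMessage {
	from: "user" | "assistant";
	content: string;
}

// Chat Completions roles produced by this mapping.
interface ChatMessage {
	role: "system" | "user" | "assistant";
	content: string;
}

// Map stored messages to chat messages, prepending the preprompt as a
// system message when it is a non-empty string.
function toChatMessages(messages: StoredMessage[], preprompt?: string): ChatMessage[] {
	const mapped: ChatMessage[] = messages.map((m) => ({ role: m.from, content: m.content }));
	return preprompt ? [{ role: "system", content: preprompt }, ...mapped] : mapped;
}
```

An empty preprompt is dropped, which matches the truthiness check in the original. Note also that `repetition_penalty` is forwarded as `frequency_penalty`, and the two are not equivalent. TGI's repetition penalty is multiplicative, with 1.0 meaning no penalty. OpenAI's frequency penalty is additive and ranges from -2.0 to 2.0. A TGI-style value may therefore need adjusting before it is reused here.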

src/lib/server/endpoints/openai/openAIChatToTextGenerationStream.ts ADDED

	@@ -0,0 +1,32 @@

+import type { TextGenerationStreamOutput } from "@huggingface/inference";
+import type OpenAI from "openai";
+import type { Stream } from "openai/streaming";
+/**
+ * Transform a stream of OpenAI.Chat.Completions.ChatCompletionChunk into a stream of TextGenerationStreamOutput
+ */
+export async function* openAIChatToTextGenerationStream(
+	completionStream: Stream<OpenAI.Chat.Completions.ChatCompletionChunk>
+) {
+	let generatedText = "";
+	let tokenId = 0;
+	for await (const completion of completionStream) {
+		const { choices } = completion;
+		const content = choices[0]?.delta?.content ?? "";
+		const last = choices[0]?.finish_reason === "stop";
+		if (content) {
+			generatedText = generatedText + content;
+		}
+		const output: TextGenerationStreamOutput = {
+			token: {
+				id: tokenId++,
+				text: content ?? "",
+				logprob: 0,
+				special: false,
+			},
+			generated_text: last ? generatedText : null,
+			details: null,
+		};
+		yield output;
+	}
+}

src/lib/server/endpoints/openai/openAICompletionToTextGenerationStream.ts ADDED

	@@ -0,0 +1,32 @@

+import type { TextGenerationStreamOutput } from "@huggingface/inference";
+import type OpenAI from "openai";
+import type { Stream } from "openai/streaming";
+/**
+ * Transform a stream of OpenAI.Completions.Completion into a stream of TextGenerationStreamOutput
+ */
+export async function* openAICompletionToTextGenerationStream(
+	completionStream: Stream<OpenAI.Completions.Completion>
+) {
+	let generatedText = "";
+	let tokenId = 0;
+	for await (const completion of completionStream) {
+		const { choices } = completion;
+		const text = choices[0]?.text ?? "";
+		const last = choices[0]?.finish_reason === "stop";
+		if (text) {
+			generatedText = generatedText + text;
+		}
+		const output: TextGenerationStreamOutput = {
+			token: {
+				id: tokenId++,
+				text,
+				logprob: 0,
+				special: false,
+			},
+			generated_text: last ? generatedText : null,
+			details: null,
+		};
+		yield output;
+	}
+}
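Both converters share the same bookkeeping: a running token id, the accumulated text, and a `generated_text` that is set only on the final chunk. The sketch below factors that bookkeeping into a hypothetical `StreamAccumulator` so it can be exercised without the `openai` package. It departs from the converters on purpose in one respect. The sketch treats any non-null `finish_reason` as the end of generation, including `"length"` when `max_tokens` is hit. The converters above only check for `"stop"`, so a length-truncated stream never sets `generated_text` there, and `generateFromDefaultEndpoint` would throw `"Generation failed"`.

```typescript
// Hypothetical local mirror of TextGenerationStreamOutput.
interface StreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
	details: null;
}

// Shared bookkeeping for OpenAI-style chunk converters (hypothetical helper).
class StreamAccumulator {
	private generatedText = "";
	private tokenId = 0;

	// Turn one chunk's text delta and finish_reason into a stream output.
	// Any non-null finish_reason ends the generation (the converters above only accept "stop").
	push(delta: string | null | undefined, finishReason: string | null | undefined): StreamOutput {
		const text = delta ?? "";
		this.generatedText += text;
		const last = finishReason !== null && finishReason !== undefined;
		return {
			token: { id: this.tokenId++, text, logprob: 0, special: false },
			generated_text: last ? this.generatedText : null,
			details: null,
		};
	}
}
```

With this helper, the chat converter's loop body would reduce to `yield acc.push(choices[0]?.delta?.content, choices[0]?.finish_reason)`. The completion converter would pass `choices[0]?.text` as the delta.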

src/lib/server/endpoints/tgi/endpointTgi.ts ADDED

	@@ -0,0 +1,37 @@

+import { HF_ACCESS_TOKEN } from "$env/static/private";
+import { buildPrompt } from "$lib/buildPrompt";
+import { textGenerationStream } from "@huggingface/inference";
+import type { Endpoint } from "../endpoints";
+import { z } from "zod";
+export const endpointTgiParametersSchema = z.object({
+	weight: z.number().int().positive().default(1),
+	model: z.any(),
+	type: z.literal("tgi"),
+	url: z.string().url(),
+	accessToken: z.string().min(1).default(HF_ACCESS_TOKEN),
+});
+export function endpointTgi({
+	url,
+	accessToken,
+	model,
+}: z.infer<typeof endpointTgiParametersSchema>): Endpoint {
+	return async ({ conversation }) => {
+		const prompt = await buildPrompt({
+			messages: conversation.messages,
+			webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
+			preprompt: conversation.preprompt,
+			model,
+		});
+		return textGenerationStream({
+			parameters: { ...model.parameters, return_full_text: false },
+			model: url,
+			inputs: prompt,
+			accessToken,
+		});
+	};
+}
+export default endpointTgi;

src/lib/server/generateFromDefaultEndpoint.ts CHANGED

@@ -1,110 +1,28 @@
 import { smallModel } from "$lib/server/models";
-import { modelEndpoint } from "./modelEndpoint";
-import { trimSuffix } from "$lib/utils/trimSuffix";
-import { trimPrefix } from "$lib/utils/trimPrefix";
-import { PUBLIC_SEP_TOKEN } from "$lib/constants/publicSepToken";
-import { AwsClient } from "aws4fetch";
-interface Parameters {
-	temperature: number;
-	truncate: number;
-	max_new_tokens: number;
-	stop: string[];
-}
-export async function generateFromDefaultEndpoint(
-	prompt: string,
-	parameters?: Partial<Parameters>
-): Promise<string> {
-	const newParameters = {
-		...smallModel.parameters,
-		...parameters,
-		return_full_text: false,
-		wait_for_model: true,
-	};
-	const randomEndpoint = modelEndpoint(smallModel);
-	const abortController = new AbortController();
-	let resp: Response;
-	if (randomEndpoint.host === "sagemaker") {
-		const requestParams = JSON.stringify({
-			parameters: newParameters,
-			inputs: prompt,
-		});
-		const aws = new AwsClient({
-			accessKeyId: randomEndpoint.accessKey,
-			secretAccessKey: randomEndpoint.secretKey,
-			sessionToken: randomEndpoint.sessionToken,
-			service: "sagemaker",
-		});
-		resp = await aws.fetch(randomEndpoint.url, {
-			method: "POST",
-			body: requestParams,
-			signal: abortController.signal,
-			headers: {
-				"Content-Type": "application/json",
-			},
-		});
-	} else {
-		resp = await fetch(randomEndpoint.url, {
-			headers: {
-				"Content-Type": "application/json",
-				Authorization: randomEndpoint.authorization,
-			},
-			method: "POST",
-			body: JSON.stringify({
-				parameters: newParameters,
-				inputs: prompt,
-			}),
-			signal: abortController.signal,
-		});
-	}
-	if (!resp.ok) {
-		throw new Error(await resp.text());
-	}
-	if (!resp.body) {
-		throw new Error("Body is empty");
-	}
-	const decoder = new TextDecoder();
-	const reader = resp.body.getReader();
-	let isDone = false;
-	let result = "";
-	while (!isDone) {
-		const { done, value } = await reader.read();
-		isDone = done;
-		result += decoder.decode(value, { stream: true }); // Convert current chunk to text
-	}
-	// Close the reader when done
-	reader.releaseLock();
-	let results;
-	if (result.startsWith("data:")) {
-		results = [JSON.parse(result.split("data:")?.pop() ?? "")];
-	} else {
-		results = JSON.parse(result);
-	}
-	let generated_text = trimSuffix(
-		trimPrefix(trimPrefix(results[0].generated_text, "<|startoftext|>"), prompt),
-		PUBLIC_SEP_TOKEN
-	).trimEnd();
-	for (const stop of [...(newParameters?.stop ?? []), "<|endoftext|>"]) {
-		if (generated_text.endsWith(stop)) {
-			generated_text = generated_text.slice(0, -stop.length).trimEnd();
 		}
 	}
-	return generated_text;
 }

 import { smallModel } from "$lib/server/models";
+import type { Conversation } from "$lib/types/Conversation";
+export async function generateFromDefaultEndpoint({
+	messages,
+	preprompt,
+}: {
+	messages: Omit<Conversation["messages"][0], "id">[];
+	preprompt?: string;
+}): Promise<string> {
+	const endpoint = await smallModel.getEndpoint();
+	const tokenStream = await endpoint({ conversation: { messages, preprompt } });
+	for await (const output of tokenStream) {
+		// generated_text is only set on the final output; until then the generation is not done
+		if (output.generated_text) {
+			let generated_text = output.generated_text;
+			for (const stop of [...(smallModel.parameters?.stop ?? []), "<|endoftext|>"]) {
+				if (generated_text.endsWith(stop)) {
+					generated_text = generated_text.slice(0, -stop.length).trimEnd();
+				}
+			}
+			return generated_text;
 		}
 	}
+	throw new Error("Generation failed");
 }
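The stop-sequence cleanup in the new `generateFromDefaultEndpoint` can be read as a small pure function. In this sketch, `trimStopSequences` is a hypothetical name. As in the loop above, each stop string is checked once, in order, followed by `<|endoftext|>`. The sketch also skips empty stop strings. With `stop === ""`, `endsWith` is always true and `slice(0, -0)` returns an empty string, so the original loop would discard the whole generation.

```typescript
// Strip trailing stop sequences from a finished generation. Each stop string is
// checked once, in order, and the text is right-trimmed after every removal.
function trimStopSequences(text: string, stops: readonly string[]): string {
	let result = text;
	for (const stop of [...stops, "<|endoftext|>"]) {
		// Guard: an empty stop would make slice(0, -0) return "".
		if (stop.length > 0 && result.endsWith(stop)) {
			result = result.slice(0, -stop.length).trimEnd();
		}
	}
	return result;
}
```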

src/lib/server/modelEndpoint.ts DELETED

@@ -1,50 +0,0 @@
-import {
-	HF_ACCESS_TOKEN,
-	HF_API_ROOT,
-	USE_CLIENT_CERTIFICATE,
-	CERT_PATH,
-	KEY_PATH,
-	CA_PATH,
-	CLIENT_KEY_PASSWORD,
-	REJECT_UNAUTHORIZED,
-} from "$env/static/private";
-import { sum } from "$lib/utils/sum";
-import type { BackendModel, Endpoint } from "./models";
-import { loadClientCertificates } from "$lib/utils/loadClientCerts";
-if (USE_CLIENT_CERTIFICATE === "true") {
-	loadClientCertificates(
-		CERT_PATH,
-		KEY_PATH,
-		CA_PATH,
-		CLIENT_KEY_PASSWORD,
-		REJECT_UNAUTHORIZED === "true"
-	);
-}
-/**
- * Find a random load-balanced endpoint
- */
-export function modelEndpoint(model: BackendModel): Endpoint {
-	if (!model.endpoints) {
-		return {
-			host: "tgi",
-			url: `${HF_API_ROOT}/${model.name}`,
-			authorization: `Bearer ${HF_ACCESS_TOKEN}`,
-			weight: 1,
-		};
-	}
-	const endpoints = model.endpoints;
-	const totalWeight = sum(endpoints.map((e) => e.weight));
-	let random = Math.random() * totalWeight;
-	for (const endpoint of endpoints) {
-		if (random < endpoint.weight) {
-			return endpoint;
-		}
-		random -= endpoint.weight;
-	}
-	throw new Error("Invalid config, no endpoint found");
-}

src/lib/server/models.ts CHANGED

@@ -1,42 +1,13 @@
-import { HF_ACCESS_TOKEN, MODELS, OLD_MODELS, TASK_MODEL } from "$env/static/private";
 import type { ChatTemplateInput } from "$lib/types/Template";
 import { compileTemplate } from "$lib/utils/template";
 import { z } from "zod";
 type Optional<T, K extends keyof T> = Pick<Partial<T>, K> & Omit<T, K>;
-const sagemakerEndpoint = z.object({
-	host: z.literal("sagemaker"),
-	url: z.string().url(),
-	accessKey: z.string().min(1),
-	secretKey: z.string().min(1),
-	sessionToken: z.string().optional(),
-});
-const tgiEndpoint = z.object({
-	host: z.union([z.literal("tgi"), z.undefined()]),
-	url: z.string().url(),
-	authorization: z.string().min(1).default(`Bearer ${HF_ACCESS_TOKEN}`),
-});
-const commonEndpoint = z.object({
-	weight: z.number().int().positive().default(1),
-});
-const endpoint = z.lazy(() =>
-	z.union([sagemakerEndpoint.merge(commonEndpoint), tgiEndpoint.merge(commonEndpoint)])
-);
-const combinedEndpoint = endpoint.transform((data) => {
-	if (data.host === "tgi" || data.host === undefined) {
-		return tgiEndpoint.merge(commonEndpoint).parse(data);
-	} else if (data.host === "sagemaker") {
-		return sagemakerEndpoint.merge(commonEndpoint).parse(data);
-	} else {
-		throw new Error(`Invalid host: ${data.host}`);
-	}
-});
 const modelConfig = z.object({
 	/** Used as an identifier in DB */
 	id: z.string().optional(),
@@ -73,13 +44,16 @@ const modelConfig = z.object({
 			})
 		)
 		.optional(),
-	endpoints: z.array(combinedEndpoint).optional(),
 	parameters: z
 		.object({
 			temperature: z.number().min(0).max(1),
 			truncate: z.number().int().positive(),
 			max_new_tokens: z.number().int().positive(),
 			stop: z.array(z.string()).optional(),
 		})
 		.passthrough()
 		.optional(),
@@ -98,7 +72,48 @@ const processModel = async (m: z.infer<typeof modelConfig>) => ({
 	parameters: { ...m.parameters, stop_sequences: m.parameters?.stop },
 });
-export const models = await Promise.all(modelsRaw.map(processModel));
 // Models that have been deprecated
 export const oldModels = OLD_MODELS
@@ -114,18 +129,19 @@ export const oldModels = OLD_MODELS
 			.map((m) => ({ ...m, id: m.id || m.name, displayName: m.displayName || m.name }))
 	: [];
-export const defaultModel = models[0];
 export const validateModel = (_models: BackendModel[]) => {
 	// Zod enum function requires 2 parameters
 	return z.enum([_models[0].id, ..._models.slice(1).map((m) => m.id)]);
 };
 // if `TASK_MODEL` is the name of a model we use it, else we try to parse `TASK_MODEL` as a model config itself
 export const smallModel = TASK_MODEL
-	? models.find((m) => m.name === TASK_MODEL) ||
-	  (await processModel(modelConfig.parse(JSON.parse(TASK_MODEL))))
 	: defaultModel;
-export type BackendModel = Optional<(typeof models)[0], "preprompt" | "parameters">;
-export type Endpoint = z.infer<typeof endpoint>;

+import { HF_ACCESS_TOKEN, HF_API_ROOT, MODELS, OLD_MODELS, TASK_MODEL } from "$env/static/private";
 import type { ChatTemplateInput } from "$lib/types/Template";
 import { compileTemplate } from "$lib/utils/template";
 import { z } from "zod";
+import endpoints, { endpointSchema, type Endpoint } from "./endpoints/endpoints";
+import endpointTgi from "./endpoints/tgi/endpointTgi";
+import { sum } from "$lib/utils/sum";
 type Optional<T, K extends keyof T> = Pick<Partial<T>, K> & Omit<T, K>;
 const modelConfig = z.object({
 	/** Used as an identifier in DB */
 	id: z.string().optional(),
 			})
 		)
 		.optional(),
+	endpoints: z.array(endpointSchema).optional(),
 	parameters: z
 		.object({
 			temperature: z.number().min(0).max(1),
 			truncate: z.number().int().positive(),
 			max_new_tokens: z.number().int().positive(),
 			stop: z.array(z.string()).optional(),
+			top_p: z.number().positive().optional(),
+			top_k: z.number().positive().optional(),
+			repetition_penalty: z.number().min(-2).max(2).optional(),
 		})
 		.passthrough()
 		.optional(),
 	parameters: { ...m.parameters, stop_sequences: m.parameters?.stop },
 });
+const addEndpoint = (m: Awaited<ReturnType<typeof processModel>>) => ({
+	...m,
+	getEndpoint: async (): Promise<Endpoint> => {
+		if (!m.endpoints) {
+			return endpointTgi({
+				type: "tgi",
+				url: `${HF_API_ROOT}/${m.name}`,
+				accessToken: HF_ACCESS_TOKEN,
+				weight: 1,
+				model: m,
+			});
+		}
+		const totalWeight = sum(m.endpoints.map((e) => e.weight));
+		let random = Math.random() * totalWeight;
+		for (const endpoint of m.endpoints) {
+			if (random < endpoint.weight) {
+				const args = { ...endpoint, model: m };
+				if (args.type === "tgi") {
+					return endpoints.tgi(args);
+				} else if (args.type === "aws") {
+					return await endpoints.sagemaker(args);
+				} else if (args.type === "openai") {
+					return await endpoints.openai(args);
+				} else if (args.type === "llamacpp") {
+					return await endpoints.llamacpp(args);
+				} else {
+					// fall back to TGI for legacy configs
+					return await endpoints.tgi(args);
+				}
+			}
+			random -= endpoint.weight;
+		}
+		throw new Error(`Failed to select endpoint`);
+	},
+});
+export const models = await Promise.all(modelsRaw.map((e) => processModel(e).then(addEndpoint)));
+export const defaultModel = models[0];
 // Models that have been deprecated
 export const oldModels = OLD_MODELS
 			.map((m) => ({ ...m, id: m.id || m.name, displayName: m.displayName || m.name }))
 	: [];
 export const validateModel = (_models: BackendModel[]) => {
 	// Zod enum function requires 2 parameters
 	return z.enum([_models[0].id, ..._models.slice(1).map((m) => m.id)]);
 };
 // if `TASK_MODEL` is the name of a model we use it, else we try to parse `TASK_MODEL` as a model config itself
 export const smallModel = TASK_MODEL
+	? (models.find((m) => m.name === TASK_MODEL) ||
+			(await processModel(modelConfig.parse(JSON.parse(TASK_MODEL))).then((m) =>
+				addEndpoint(m)
+			))) ??
+	  defaultModel
 	: defaultModel;
+export type BackendModel = Optional<typeof defaultModel, "preprompt" | "parameters">;
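`getEndpoint`, like the deleted `modelEndpoint`, picks an endpoint with probability proportional to its `weight`. It draws a number in `[0, totalWeight)` and then walks the list. Below is a generic sketch of that selection in which `rand` is injectable, so the choice can be tested deterministically. `pickWeighted` and `Weighted` are hypothetical names.

```typescript
interface Weighted {
	weight: number;
}

// Pick one item with probability proportional to its weight. `rand` defaults to
// Math.random and must return a number in [0, 1).
function pickWeighted<T extends Weighted>(items: readonly T[], rand: () => number = Math.random): T {
	const total = items.reduce((acc, item) => acc + item.weight, 0);
	if (items.length === 0 || total <= 0) {
		throw new Error("No endpoint with a positive weight");
	}
	let r = rand() * total;
	for (const item of items) {
		if (r < item.weight) return item;
		r -= item.weight;
	}
	// Defensive fallback for floating-point drift; not expected to be reached.
	return items[items.length - 1];
}
```

With weights 1 and 3, the first endpoint covers `[0, 1)` and the second covers `[1, 4)`, so roughly a quarter of requests go to the first. The original throws `Failed to select endpoint` at the point where this sketch returns the last item.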

src/lib/server/summarize.ts CHANGED

@@ -1,6 +1,5 @@
 import { LLM_SUMMERIZATION } from "$env/static/private";
 import { generateFromDefaultEndpoint } from "$lib/server/generateFromDefaultEndpoint";
-import { smallModel } from "$lib/server/models";
 import type { Message } from "$lib/types/Message";
 export async function summarize(prompt: string) {
@@ -23,17 +22,13 @@ export async function summarize(prompt: string) {
 		{ from: "assistant", content: "🎥 Favorite movie" },
 		{ from: "user", content: "Explain the concept of artificial intelligence in one sentence" },
 		{ from: "assistant", content: "🤖 AI definition" },
-		{ from: "user", content: "Answer all my questions like chewbacca from now ok?" },
-		{ from: "assistant", content: "🐒 Answer as Chewbacca" },
 		{ from: "user", content: prompt },
 	];
-	const summaryPrompt = smallModel.chatPromptRender({
 		messages,
 		preprompt: `You are a summarization AI. You'll never answer a user's question directly, but instead summarize the user's request into a single short sentence of four words or less. Always start your answer with an emoji relevant to the summary.`,
-	});
-	return await generateFromDefaultEndpoint(summaryPrompt)
 		.then((summary) => {
 			// add an emoji if none is found in the first three characters
 			if (!/\p{Emoji}/u.test(summary.slice(0, 3))) {

 import { LLM_SUMMERIZATION } from "$env/static/private";
 import { generateFromDefaultEndpoint } from "$lib/server/generateFromDefaultEndpoint";
 import type { Message } from "$lib/types/Message";
 export async function summarize(prompt: string) {
 		{ from: "assistant", content: "🎥 Favorite movie" },
 		{ from: "user", content: "Explain the concept of artificial intelligence in one sentence" },
 		{ from: "assistant", content: "🤖 AI definition" },
 		{ from: "user", content: prompt },
 	];
+	return await generateFromDefaultEndpoint({
 		messages,
 		preprompt: `You are a summarization AI. You'll never answer a user's question directly, but instead summarize the user's request into a single short sentence of four words or less. Always start your answer with an emoji relevant to the summary.`,
+	})
 		.then((summary) => {
 			// add an emoji if none is found in the first three characters
 			if (!/\p{Emoji}/u.test(summary.slice(0, 3))) {
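The emoji check in `summarize` uses `\p{Emoji}`. That property also matches ASCII digits, `#` and `*`, because those characters can start keycap sequences. A summary such as `1 question` therefore passes without a real emoji. The sketch below uses `\p{Extended_Pictographic}` instead. `ensureLeadingEmoji` and its `"💬"` fallback are hypothetical, because the fallback that `summarize.ts` actually uses is not shown in this hunk.

```typescript
// Matches pictographic emoji but not digits, '#' or '*' (which \p{Emoji} matches).
const PICTOGRAPHIC = /\p{Extended_Pictographic}/u;

// Hypothetical helper: prefix the summary with a fallback emoji when its first
// three UTF-16 code units contain none (most emoji use two code units).
function ensureLeadingEmoji(summary: string, fallback = "💬"): string {
	return PICTOGRAPHIC.test(summary.slice(0, 3)) ? summary : `${fallback} ${summary}`;
}
```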

src/lib/server/websearch/generateQuery.ts CHANGED

@@ -1,7 +1,6 @@
 import type { Message } from "$lib/types/Message";
 import { format } from "date-fns";
 import { generateFromDefaultEndpoint } from "../generateFromDefaultEndpoint";
-import { smallModel } from "../models";
 export async function generateQuery(messages: Message[]) {
 	const currentDate = format(new Date(), "MMMM d, yyyy");
@@ -62,10 +61,8 @@ Current Question: Where is it being hosted ?`,
 		},
 	];
-	const promptQuery = smallModel.chatPromptRender({
-		preprompt: `You are tasked with generating web search queries. Give me an appropriate query to answer my question for google search. Answer with only the query. Today is ${currentDate}`,
 		messages: convQuery,
 	});
-	return await generateFromDefaultEndpoint(promptQuery);
 }

 import type { Message } from "$lib/types/Message";
 import { format } from "date-fns";
 import { generateFromDefaultEndpoint } from "../generateFromDefaultEndpoint";
 export async function generateQuery(messages: Message[]) {
 	const currentDate = format(new Date(), "MMMM d, yyyy");
 		},
 	];
+	return await generateFromDefaultEndpoint({
 		messages: convQuery,
+		preprompt: `You are tasked with generating web search queries. Give me an appropriate query to answer my question for google search. Answer with only the query. Today is ${currentDate}`,
 	});
 }

src/lib/utils/trimPrefix.ts DELETED

@@ -1,6 +0,0 @@
-export function trimPrefix(input: string, prefix: string) {
-	if (input.startsWith(prefix)) {
-		return input.slice(prefix.length);
-	}
-	return input;
-}

src/lib/utils/trimSuffix.ts DELETED

@@ -1,6 +0,0 @@
-export function trimSuffix(input: string, end: string): string {
-	if (input.endsWith(end)) {
-		return input.slice(0, input.length - end.length);
-	}
-	return input;
-}

src/routes/conversation/[id]/+page.svelte CHANGED

@@ -171,6 +171,8 @@
 											convId: $page.params.id,
 										};
 									}
 								}
 							}
 						} catch (parseError) {

 											convId: $page.params.id,
 										};
 									}
+								} else if (update.status === "error") {
+									$error = update.message ?? "An error has occurred";
 								}
 							}
 						} catch (parseError) {
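The new `else if` branch surfaces server-side errors that arrive as `{ type: "status", status: "error", message }`. Below is a sketch of the client-side handling written as a pure reducer. The `MessageUpdate` union here is a hypothetical subset built only from the fields used in this PR. It is not the full type from `$lib/types/MessageUpdate`.

```typescript
// Hypothetical subset of MessageUpdate, limited to the fields this PR uses.
type MessageUpdate =
	| { type: "stream"; token: string }
	| { type: "finalAnswer"; text: string }
	| { type: "status"; status: string; message?: string };

interface ClientState {
	text: string;
	error: string | null;
}

// Fold one update into the client state: tokens are appended, the final answer
// replaces the streamed text, and an error status surfaces its message.
function applyUpdate(state: ClientState, update: MessageUpdate): ClientState {
	switch (update.type) {
		case "stream":
			return { ...state, text: state.text + update.token };
		case "finalAnswer":
			return { ...state, text: update.text };
		case "status":
			return update.status === "error"
				? { ...state, error: update.message ?? "An error has occurred" }
				: state;
		default: {
			// Compile-time exhaustiveness check for new update types.
			const unhandled: never = update;
			return unhandled;
		}
	}
}
```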

src/routes/conversation/[id]/+server.ts CHANGED

@@ -1,26 +1,19 @@
-import { HF_ACCESS_TOKEN, MESSAGES_BEFORE_LOGIN, RATE_LIMIT } from "$env/static/private";
-import { buildPrompt } from "$lib/buildPrompt";
-import { PUBLIC_SEP_TOKEN } from "$lib/constants/publicSepToken";
 import { authCondition, requiresUser } from "$lib/server/auth";
 import { collections } from "$lib/server/database";
-import { modelEndpoint } from "$lib/server/modelEndpoint";
 import { models } from "$lib/server/models";
 import { ERROR_MESSAGES } from "$lib/stores/errors";
 import type { Message } from "$lib/types/Message";
-import { trimPrefix } from "$lib/utils/trimPrefix";
-import { trimSuffix } from "$lib/utils/trimSuffix";
-import { textGenerationStream } from "@huggingface/inference";
 import { error } from "@sveltejs/kit";
 import { ObjectId } from "mongodb";
 import { z } from "zod";
-import { AwsClient } from "aws4fetch";
 import type { MessageUpdate } from "$lib/types/MessageUpdate";
 import { runWebSearch } from "$lib/server/websearch/runWebSearch";
 import type { WebSearch } from "$lib/types/WebSearch";
 import { abortedGenerations } from "$lib/server/abortedGenerations";
 import { summarize } from "$lib/server/summarize";
-export async function POST({ request, fetch, locals, params, getClientAddress }) {
 	const id = z.string().parse(params.id);
 	const convId = new ObjectId(id);
 	const promptedAt = new Date();
@@ -191,138 +184,90 @@ export async function POST({ request, fetch, locals, params, getClientAddress })
 				webSearchResults = await runWebSearch(conv, newPrompt, update);
 			}
-			// we can now build the prompt using the messages
-			const prompt = await buildPrompt({
-				messages,
-				model,
-				webSearch: webSearchResults,
-				preprompt: conv.preprompt ?? model.preprompt,
-				locals: locals,
-			});
-			// fetch the endpoint
-			const randomEndpoint = modelEndpoint(model);
-			let usedFetch = fetch;
-			if (randomEndpoint.host === "sagemaker") {
-				const aws = new AwsClient({
-					accessKeyId: randomEndpoint.accessKey,
-					secretAccessKey: randomEndpoint.secretKey,
-					sessionToken: randomEndpoint.sessionToken,
-					service: "sagemaker",
-				});
-				usedFetch = aws.fetch.bind(aws) as typeof fetch;
-			}
-			async function saveLast(generated_text: string) {
-				if (!conv) {
-					throw error(404, "Conversation not found");
-				}
-				const lastMessage = messages[messages.length - 1];
-				if (lastMessage) {
-					// We could also check if PUBLIC_ASSISTANT_MESSAGE_TOKEN is present and use it to slice the text
-					if (generated_text.startsWith(prompt)) {
-						generated_text = generated_text.slice(prompt.length);
-					}
-					generated_text = trimSuffix(
-						trimPrefix(generated_text, "<|startoftext|>"),
-						PUBLIC_SEP_TOKEN
-					).trimEnd();
-					// remove the stop tokens
-					for (const stop of [...(model?.parameters?.stop ?? []), "<|endoftext|>"]) {
-						if (generated_text.endsWith(stop)) {
-							generated_text = generated_text.slice(0, -stop.length).trimEnd();
 						}
-					}
-					lastMessage.content = generated_text;
-					await collections.conversations.updateOne(
-						{
-							_id: convId,
-						},
-						{
-							$set: {
-								messages,
-								title: conv.title,
 								updatedAt: new Date(),
 							},
-						}
-					);
-					update({
-						type: "finalAnswer",
-						text: generated_text,
-					});
 				}
 			}
-			const tokenStream = textGenerationStream(
 				{
-					parameters: {
-						...models.find((m) => m.id === conv.model)?.parameters,
-						return_full_text: false,
-					},
-					model: randomEndpoint.url,
-					inputs: prompt,
-					accessToken: randomEndpoint.host === "sagemaker" ? undefined : HF_ACCESS_TOKEN,
 				},
 				{
-					use_cache: false,
-					fetch: usedFetch,
 				}
 			);
-			for await (const output of tokenStream) {
-				// if not generated_text is here it means the generation is not done
-				if (!output.generated_text) {
-					// else we get the next token
-					if (!output.token.special) {
-						const lastMessage = messages[messages.length - 1];
-						update({
-							type: "stream",
-							token: output.token.text,
-						});
-						// if the last message is not from assistant, it means this is the first token
-						if (lastMessage?.from !== "assistant") {
-							// so we create a new message
-							messages = [
-								...messages,
-								// id doesn't match the backend id but it's not important for assistant messages
-								// First token has a space at the beginning, trim it
-								{
-									from: "assistant",
-									content: output.token.text.trimStart(),
-									webSearch: webSearchResults,
-									updates: updates,
-									id: (responseId as Message["id"]) || crypto.randomUUID(),
-									createdAt: new Date(),
-									updatedAt: new Date(),
-								},
-							];
-						} else {
-							const date = abortedGenerations.get(convId.toString());
-							if (date && date > promptedAt) {
-								saveLast(lastMessage.content);
-							}
-							if (!output) {
-								break;
-							}
-							// otherwise we just concatenate tokens
-							lastMessage.content += output.token.text;
-						}
-					}
-				} else {
-					saveLast(output.generated_text);
-				}
-			}
 		},
 		async cancel() {
 			await collections.conversations.updateOne(

+import { MESSAGES_BEFORE_LOGIN, RATE_LIMIT } from "$env/static/private";
 import { authCondition, requiresUser } from "$lib/server/auth";
 import { collections } from "$lib/server/database";
 import { models } from "$lib/server/models";
 import { ERROR_MESSAGES } from "$lib/stores/errors";
 import type { Message } from "$lib/types/Message";
 import { error } from "@sveltejs/kit";
 import { ObjectId } from "mongodb";
 import { z } from "zod";
 import type { MessageUpdate } from "$lib/types/MessageUpdate";
 import { runWebSearch } from "$lib/server/websearch/runWebSearch";
 import type { WebSearch } from "$lib/types/WebSearch";
 import { abortedGenerations } from "$lib/server/abortedGenerations";
 import { summarize } from "$lib/server/summarize";
+export async function POST({ request, locals, params, getClientAddress }) {
 	const id = z.string().parse(params.id);
 	const convId = new ObjectId(id);
 	const promptedAt = new Date();
 				webSearchResults = await runWebSearch(conv, newPrompt, update);
 			}
+			messages[messages.length - 1].webSearch = webSearchResults;
+			conv.messages = messages;
+			try {
+				const endpoint = await model.getEndpoint();
+				for await (const output of await endpoint({ conversation: conv })) {
+					// generated_text is only set on the final output; until then the generation is not done
+					if (!output.generated_text) {
+						// else we get the next token
+						if (!output.token.special) {
+							update({
+								type: "stream",
+								token: output.token.text,
+							});
+							// if the last message is not from assistant, it means this is the first token
+							const lastMessage = messages[messages.length - 1];
+							if (lastMessage?.from !== "assistant") {
+								// so we create a new message
+								messages = [
+									...messages,
+									// id doesn't match the backend id but it's not important for assistant messages
+									// First token has a space at the beginning, trim it
+									{
+										from: "assistant",
+										content: output.token.text.trimStart(),
+										webSearch: webSearchResults,
+										updates: updates,
+										id: (responseId as Message["id"]) || crypto.randomUUID(),
+										createdAt: new Date(),
+										updatedAt: new Date(),
+									},
+								];
+							} else {
+								// abort check
+								const date = abortedGenerations.get(convId.toString());
+								if (date && date > promptedAt) {
+									break;
+								}
+								if (!output) {
+									break;
+								}
+								// otherwise we just concatenate tokens
+								lastMessage.content += output.token.text;
+							}
 						}
+					} else {
+						// replace the last message's content with output.generated_text
+						messages = [
+							...messages.slice(0, -1),
+							{
+								...messages[messages.length - 1],
+								content: output.generated_text,
+								updates: updates,
 								updatedAt: new Date(),
 							},
+						];
+					}
 				}
+			} catch (e) {
+				console.error(e);
+				update({ type: "status", status: "error", message: (e as Error).message });
 			}
+			await collections.conversations.updateOne(
 				{
+					_id: convId,
 				},
 				{
+					$set: {
+						messages,
+						title: conv?.title,
+						updatedAt: new Date(),
+					},
 				}
 			);
+			update({
+				type: "finalAnswer",
+				text: messages[messages.length - 1].content,
+			});
 		},
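The rewritten loop in `POST` does three things. It streams tokens into a new assistant message, stops early on abort, and replaces the content with `generated_text` when the final output arrives. The sketch below models those rules as a synchronous function over a list of outputs so they can be checked in isolation. `collectAssistantText` is a hypothetical name. Because the sketch returns the text instead of mutating `messages`, it also sidesteps an edge case that the original appears to have. If the very first output already carries `generated_text`, the final branch overwrites the last message, which at that point is still the user's prompt.

```typescript
// Hypothetical local mirror of the stream output fields read by the loop.
interface StreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Return the assistant text the loop would save, or null if nothing was generated.
function collectAssistantText(
	outputs: Iterable<StreamOutput>,
	isAborted: () => boolean = () => false
): string | null {
	let content: string | null = null; // null until the assistant message exists
	for (const output of outputs) {
		if (output.generated_text) {
			// Final output: the full generated text replaces what was streamed.
			return output.generated_text;
		}
		if (output.token.special) continue; // special tokens are never shown
		if (content === null) {
			// The first visible token creates the message; trim its leading space.
			content = output.token.text.trimStart();
		} else {
			if (isAborted()) break; // keep the partial answer, as the original does
			content += output.token.text;
		}
	}
	return content;
}
```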
 		async cancel() {
 			await collections.conversations.updateOne(