nsarrazin HF staff chenhunghan Henry Chen Mishig coyotte508 HF staff committed on
Commit
9db8ced
·
unverified ·
1 Parent(s): 04a868e

Modular backends & support for openAI & AWS endpoints (#541)

* Fix the response

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Should use /completions

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Use async generator

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Use openai npm

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix generateFromDefaultEndpoint

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix last char become undefined

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Better support for system prompt

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Updates

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Revert

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Update README

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Default system prompt

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* remove sk-

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fixing types

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix lockfile

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Move .optional

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Add try...catch and controller.error(error)

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* baseURL

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Format

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix types

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Fix again

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Better error message

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Update README

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>

* Refactor backend to add support for modular backends

* readme fix

* readme update

* add support for lambda on aws endpoint

* update doc for lambda support

* fix typecheck

* make imports really optional

* readme fixes

* make endpoint creator async

* Update README.md

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* Update README.md

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* Update src/lib/server/endpoints/openai/endpointOai.ts

Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>

* trailing comma

* Update README.md

Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>

* change readme example name

* Update src/lib/server/models.ts

Co-authored-by: Eliott C. <coyotte508@gmail.com>

* fixed preprompt to use conversation.preprompt

* Make openAI endpoint compatible with Azure OpenAI

* surface errors in generation

* Added support for llamacpp endpoint

* fix llamacpp endpoint so it properly stops

* Add llamacpp example to readme

* Add support for legacy configs

---------

Signed-off-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Co-authored-by: Hung-Han (Henry) Chen <chenhungh@gmail.com>
Co-authored-by: Henry Chen <1474479+chenhunghan@users.noreply.github.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Eliott C. <coyotte508@gmail.com>

.env CHANGED
@@ -8,6 +8,7 @@ MONGODB_DIRECT_CONNECTION=false
8
  COOKIE_NAME=hf-chat
9
  HF_ACCESS_TOKEN=#hf_<token> from from https://huggingface.co/settings/token
10
  HF_API_ROOT=https://api-inference.huggingface.co/models
 
11
 
12
  # used to activate search with web functionality. disabled if none are defined. choose one of the following:
13
  YDC_API_KEY=#your docs.you.com api key here
 
8
  COOKIE_NAME=hf-chat
9
  HF_ACCESS_TOKEN=#hf_<token> from from https://huggingface.co/settings/token
10
  HF_API_ROOT=https://api-inference.huggingface.co/models
11
+ OPENAI_API_KEY=#your openai api key here
12
 
13
  # used to activate search with web functionality. disabled if none are defined. choose one of the following:
14
  YDC_API_KEY=#your docs.you.com api key here
README.md CHANGED
@@ -168,6 +168,91 @@ MODELS=`[
168
 
169
  You can change things like the parameters, or customize the preprompt to better suit your needs. You can also add more models by adding more objects to the array, with different preprompts for example.
170
 
171
  #### Custom prompt templates
172
 
173
  By default, the prompt is constructed using `userMessageToken`, `assistantMessageToken`, `userMessageEndToken`, `assistantMessageEndToken`, `preprompt` parameters and a series of default templates.
@@ -258,23 +343,45 @@ You can then add the generated information and the `authorization` parameter to
258
  ]
259
  ```
260
 
261
- ### Amazon SageMaker
 
 
262
 
263
  You can also specify your Amazon SageMaker instance as an endpoint for chat-ui. The config goes like this:
264
 
265
  ```env
266
  "endpoints": [
267
  {
268
- "host" : "sagemaker",
269
- "url": "", // your aws sagemaker url here
 
270
  "accessKey": "",
271
  "secretKey" : "",
272
- "sessionToken": "", // optional
273
  "weight": 1
274
  }
275
  ]
276
  ```
277
 
278
  You can get the `accessKey` and `secretKey` from your AWS user, under programmatic access.
279
 
280
  #### Client Certificate Authentication (mTLS)
 
168
 
169
  You can change things like the parameters, or customize the preprompt to better suit your needs. You can also add more models by adding more objects to the array, with different preprompts for example.
170
 
171
+ #### OpenAI API compatible models
172
+
173
+ Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), and [ialacol](https://github.com/chenhunghan/ialacol).
174
+
175
+ The following example config makes Chat UI work with [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai). `endpoint.baseURL` is the URL of the OpenAI API compatible server; it overrides the base URL used by the OpenAI client. `endpoint.completion` determines which endpoint is used: the default, `chat_completions`, uses `v1/chat/completions`; set `endpoint.completion` to `completions` to use the `v1/completions` endpoint.
176
+
177
+ ```
178
+ MODELS=`[
179
+ {
180
+ "name": "text-generation-webui",
181
+ "id": "text-generation-webui",
182
+ "parameters": {
183
+ "temperature": 0.9,
184
+ "top_p": 0.95,
185
+ "repetition_penalty": 1.2,
186
+ "top_k": 50,
187
+ "truncate": 1000,
188
+ "max_new_tokens": 1024,
189
+ "stop": []
190
+ },
191
+ "endpoints": [{
192
+ "type" : "openai",
193
+ "baseURL": "http://localhost:8000/v1"
194
+ }]
195
+ }
196
+ ]`
197
+
198
+ ```
199
+
200
+ The `openai` type also covers official OpenAI models. You can add, for example, GPT-4 or GPT-3.5 as an "openai" model:
201
+
202
+ ```
203
+ OPENAI_API_KEY=#your openai api key here
204
+ MODELS=`[{
205
+ "name": "gpt-4",
206
+ "displayName": "GPT 4",
207
+ "endpoints" : [{
208
+ "type": "openai"
209
+ }]
210
+ },
211
+ {
212
+ "name": "gpt-3.5-turbo",
213
+ "displayName": "GPT 3.5 Turbo",
214
+ "endpoints" : [{
215
+ "type": "openai"
216
+ }]
217
+ }]`
218
+ ```
219
+
220
+ #### Llama.cpp API server
221
+
222
+ chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the `llamacpp` endpoint type.
223
+
224
+ If you want to run chat-ui with llama.cpp, you can do the following, using Zephyr as an example model:
225
+
226
+ 1. Get [the weights](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/tree/main) from the hub
227
+ 2. Run the server with the following command: `./server -m models/zephyr-7b-beta.Q4_K_M.gguf -c 2048 -np 3`
228
+ 3. Add the following to your `.env.local`:
229
+
230
+ ```env
231
+ MODELS=[
232
+ {
233
+ "name": "Local Zephyr",
234
+ "chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
235
+ "parameters": {
236
+ "temperature": 0.1,
237
+ "top_p": 0.95,
238
+ "repetition_penalty": 1.2,
239
+ "top_k": 50,
240
+ "truncate": 1000,
241
+ "max_new_tokens": 2048,
242
+ "stop": ["</s>"]
243
+ },
244
+ "endpoints": [
245
+ {
246
+ "url": "http://127.0.0.1:8080",
247
+ "type": "llamacpp"
248
+ }
249
+ ]
250
+ }
251
+ ]
252
+ ```
253
+
254
+ Start chat-ui with `npm run dev` and you should be able to chat with Zephyr locally.
255
+
256
  #### Custom prompt templates
257
 
258
  By default, the prompt is constructed using `userMessageToken`, `assistantMessageToken`, `userMessageEndToken`, `assistantMessageEndToken`, `preprompt` parameters and a series of default templates.
 
343
  ]
344
  ```
345
 
346
+ ### Amazon
347
+
348
+ #### SageMaker
349
 
350
  You can also specify your Amazon SageMaker instance as an endpoint for chat-ui. The config goes like this:
351
 
352
  ```env
353
  "endpoints": [
354
  {
355
+ "type" : "aws",
356
+ "service" : "sagemaker",
357
+ "url": "",
358
  "accessKey": "",
359
  "secretKey" : "",
360
+ "sessionToken": "",
361
  "weight": 1
362
  }
363
  ]
364
  ```
365
 
366
+ #### Lambda
367
+
368
+ You can also specify an AWS Lambda function as an endpoint for chat-ui. The config goes like this:
369
+
370
+ ```env
371
+ "endpoints" : [
372
+ {
373
+ "type": "aws",
374
+ "service": "lambda",
375
+ "url": "",
376
+ "accessKey": "",
377
+ "secretKey": "",
378
+ "sessionToken": "",
379
+ "region": "",
380
+ "weight": 1
381
+ }
382
+ ]
383
+ ```
384
+
385
  You can get the `accessKey` and `secretKey` from your AWS user, under programmatic access.
386
 
387
  #### Client Certificate Authentication (mTLS)
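
The README text added in this diff says that `completion` selects between `v1/chat/completions` (the default, `chat_completions`) and `v1/completions`. The mapping can be sketched as below. This is a minimal illustration, not chat-ui's code: `completionUrl` and the `Completion` type are hypothetical names, and the sketch assumes `baseURL` already ends in `/v1`, as in the README example.

```typescript
// Hypothetical helper for illustration; not part of chat-ui or the openai package.
type Completion = "chat_completions" | "completions";

function completionUrl(baseURL: string, completion: Completion = "chat_completions"): string {
  // Drop trailing slashes so the path is joined with exactly one "/".
  const root = baseURL.replace(/\/+$/, "");
  return completion === "completions"
    ? `${root}/completions`
    : `${root}/chat/completions`;
}
```

With the README's `"baseURL": "http://localhost:8000/v1"` and no `completion` set, this resolves to `http://localhost:8000/v1/chat/completions`.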
package-lock.json CHANGED
@@ -12,7 +12,6 @@
12
  "@huggingface/inference": "^2.6.3",
13
  "@xenova/transformers": "^2.6.0",
14
  "autoprefixer": "^10.4.14",
15
- "aws4fetch": "^1.0.17",
16
  "date-fns": "^2.29.3",
17
  "dotenv": "^16.0.3",
18
  "handlebars": "^4.7.8",
@@ -55,6 +54,10 @@
55
  "unplugin-icons": "^0.16.1",
56
  "vite": "^4.3.9",
57
  "vitest": "^0.31.0"
58
  }
59
  },
60
  "node_modules/@ampproject/remapping": {
@@ -1120,6 +1123,16 @@
1120
  "resolved": "https://registry.npmjs.org/@types/node/-/node-18.13.0.tgz",
1121
  "integrity": "sha512-gC3TazRzGoOnoKAhUx+Q0t8S9Tzs74z7m0ipwGpSqQrleP14hKxP4/JUeEQcD3W1/aIpnWl8pHowI7WokuZpXg=="
1122
  },
1123
  "node_modules/@types/node-int64": {
1124
  "version": "0.4.29",
1125
  "resolved": "https://registry.npmjs.org/@types/node-int64/-/node-int64-0.4.29.tgz",
@@ -1478,6 +1491,18 @@
1478
  "resolved": "https://registry.npmjs.org/abab/-/abab-2.0.6.tgz",
1479
  "integrity": "sha512-j2afSsaIENvHZN2B8GOpF566vZ5WVk5opAiMTvWgaQT8DkbOqsTfvNAvHoRGU2zzP8cPoqys+xHTRDWW8L+/BA=="
1480
  },
1481
  "node_modules/acorn": {
1482
  "version": "8.10.0",
1483
  "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.10.0.tgz",
@@ -1519,6 +1544,18 @@
1519
  "node": ">= 6.0.0"
1520
  }
1521
  },
1522
  "node_modules/ajv": {
1523
  "version": "6.12.6",
1524
  "resolved": "https://registry.npmjs.org/ajv/-/ajv-6.12.6.tgz",
@@ -1654,7 +1691,8 @@
1654
  "node_modules/aws4fetch": {
1655
  "version": "1.0.17",
1656
  "resolved": "https://registry.npmjs.org/aws4fetch/-/aws4fetch-1.0.17.tgz",
1657
- "integrity": "sha512-4IbOvsxqxeOSxI4oA+8xEO8SzBMVlzbSTgGy/EF83rHnQ/aKtP6Sc6YV/k0oiW0mqrcxuThlbDosnvetGOuO+g=="
 
1658
  },
1659
  "node_modules/axobject-query": {
1660
  "version": "3.2.1",
@@ -1675,6 +1713,12 @@
1675
  "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
1676
  "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw=="
1677
  },
1678
  "node_modules/base64-js": {
1679
  "version": "1.5.1",
1680
  "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
@@ -1924,6 +1968,15 @@
1924
  "url": "https://github.com/chalk/chalk?sponsor=1"
1925
  }
1926
  },
1927
  "node_modules/check-error": {
1928
  "version": "1.0.2",
1929
  "resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz",
@@ -2112,6 +2165,15 @@
2112
  "node": ">= 8"
2113
  }
2114
  },
2115
  "node_modules/css-tree": {
2116
  "version": "2.3.1",
2117
  "resolved": "https://registry.npmjs.org/css-tree/-/css-tree-2.3.1.tgz",
@@ -2331,6 +2393,16 @@
2331
  "node": ">=0.3.1"
2332
  }
2333
  },
2334
  "node_modules/dir-glob": {
2335
  "version": "3.0.1",
2336
  "resolved": "https://registry.npmjs.org/dir-glob/-/dir-glob-3.0.1.tgz",
@@ -2683,6 +2755,15 @@
2683
  "node": ">=0.10.0"
2684
  }
2685
  },
2686
  "node_modules/execa": {
2687
  "version": "5.1.1",
2688
  "resolved": "https://registry.npmjs.org/execa/-/execa-5.1.1.tgz",
@@ -2853,6 +2934,25 @@
2853
  "node": ">= 6"
2854
  }
2855
  },
2856
  "node_modules/fraction.js": {
2857
  "version": "4.2.0",
2858
  "resolved": "https://registry.npmjs.org/fraction.js/-/fraction.js-4.2.0.tgz",
@@ -3118,6 +3218,15 @@
3118
  "node": ">=10.17.0"
3119
  }
3120
  },
3121
  "node_modules/iconv-lite": {
3122
  "version": "0.6.3",
3123
  "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
@@ -3227,6 +3336,12 @@
3227
  "node": ">=8"
3228
  }
3229
  },
3230
  "node_modules/is-builtin-module": {
3231
  "version": "3.2.1",
3232
  "resolved": "https://registry.npmjs.org/is-builtin-module/-/is-builtin-module-3.2.1.tgz",
@@ -3662,6 +3777,17 @@
3662
  "marked": ">=4 <10"
3663
  }
3664
  },
3665
  "node_modules/md5-hex": {
3666
  "version": "3.0.1",
3667
  "resolved": "https://registry.npmjs.org/md5-hex/-/md5-hex-3.0.1.tgz",
@@ -3939,6 +4065,67 @@
3939
  "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz",
3940
  "integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA=="
3941
  },
3942
  "node_modules/node-gyp-build": {
3943
  "version": "4.6.1",
3944
  "resolved": "https://registry.npmjs.org/node-gyp-build/-/node-gyp-build-4.6.1.tgz",
@@ -4089,6 +4276,35 @@
4089
  "platform": "^1.3.6"
4090
  }
4091
  },
4092
  "node_modules/openid-client": {
4093
  "version": "5.4.2",
4094
  "resolved": "https://registry.npmjs.org/openid-client/-/openid-client-5.4.2.tgz",
@@ -6260,6 +6476,15 @@
6260
  "node": ">=14"
6261
  }
6262
  },
6263
  "node_modules/webidl-conversions": {
6264
  "version": "7.0.0",
6265
  "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-7.0.0.tgz",
 
12
  "@huggingface/inference": "^2.6.3",
13
  "@xenova/transformers": "^2.6.0",
14
  "autoprefixer": "^10.4.14",
 
15
  "date-fns": "^2.29.3",
16
  "dotenv": "^16.0.3",
17
  "handlebars": "^4.7.8",
 
54
  "unplugin-icons": "^0.16.1",
55
  "vite": "^4.3.9",
56
  "vitest": "^0.31.0"
57
+ },
58
+ "optionalDependencies": {
59
+ "aws4fetch": "^1.0.17",
60
+ "openai": "^4.14.2"
61
  }
62
  },
63
  "node_modules/@ampproject/remapping": {
 
1123
  "resolved": "https://registry.npmjs.org/@types/node/-/node-18.13.0.tgz",
1124
  "integrity": "sha512-gC3TazRzGoOnoKAhUx+Q0t8S9Tzs74z7m0ipwGpSqQrleP14hKxP4/JUeEQcD3W1/aIpnWl8pHowI7WokuZpXg=="
1125
  },
1126
+ "node_modules/@types/node-fetch": {
1127
+ "version": "2.6.5",
1128
+ "resolved": "https://registry.npmjs.org/@types/node-fetch/-/node-fetch-2.6.5.tgz",
1129
+ "integrity": "sha512-OZsUlr2nxvkqUFLSaY2ZbA+P1q22q+KrlxWOn/38RX+u5kTkYL2mTujEpzUhGkS+K/QCYp9oagfXG39XOzyySg==",
1130
+ "optional": true,
1131
+ "dependencies": {
1132
+ "@types/node": "*",
1133
+ "form-data": "^4.0.0"
1134
+ }
1135
+ },
1136
  "node_modules/@types/node-int64": {
1137
  "version": "0.4.29",
1138
  "resolved": "https://registry.npmjs.org/@types/node-int64/-/node-int64-0.4.29.tgz",
 
1491
  "resolved": "https://registry.npmjs.org/abab/-/abab-2.0.6.tgz",
1492
  "integrity": "sha512-j2afSsaIENvHZN2B8GOpF566vZ5WVk5opAiMTvWgaQT8DkbOqsTfvNAvHoRGU2zzP8cPoqys+xHTRDWW8L+/BA=="
1493
  },
1494
+ "node_modules/abort-controller": {
1495
+ "version": "3.0.0",
1496
+ "resolved": "https://registry.npmjs.org/abort-controller/-/abort-controller-3.0.0.tgz",
1497
+ "integrity": "sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==",
1498
+ "optional": true,
1499
+ "dependencies": {
1500
+ "event-target-shim": "^5.0.0"
1501
+ },
1502
+ "engines": {
1503
+ "node": ">=6.5"
1504
+ }
1505
+ },
1506
  "node_modules/acorn": {
1507
  "version": "8.10.0",
1508
  "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.10.0.tgz",
 
1544
  "node": ">= 6.0.0"
1545
  }
1546
  },
1547
+ "node_modules/agentkeepalive": {
1548
+ "version": "4.5.0",
1549
+ "resolved": "https://registry.npmjs.org/agentkeepalive/-/agentkeepalive-4.5.0.tgz",
1550
+ "integrity": "sha512-5GG/5IbQQpC9FpkRGsSvZI5QYeSCzlJHdpBQntCsuTOxhKD8lqKhrleg2Yi7yvMIf82Ycmmqln9U8V9qwEiJew==",
1551
+ "optional": true,
1552
+ "dependencies": {
1553
+ "humanize-ms": "^1.2.1"
1554
+ },
1555
+ "engines": {
1556
+ "node": ">= 8.0.0"
1557
+ }
1558
+ },
1559
  "node_modules/ajv": {
1560
  "version": "6.12.6",
1561
  "resolved": "https://registry.npmjs.org/ajv/-/ajv-6.12.6.tgz",
 
1691
  "node_modules/aws4fetch": {
1692
  "version": "1.0.17",
1693
  "resolved": "https://registry.npmjs.org/aws4fetch/-/aws4fetch-1.0.17.tgz",
1694
+ "integrity": "sha512-4IbOvsxqxeOSxI4oA+8xEO8SzBMVlzbSTgGy/EF83rHnQ/aKtP6Sc6YV/k0oiW0mqrcxuThlbDosnvetGOuO+g==",
1695
+ "optional": true
1696
  },
1697
  "node_modules/axobject-query": {
1698
  "version": "3.2.1",
 
1713
  "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
1714
  "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw=="
1715
  },
1716
+ "node_modules/base-64": {
1717
+ "version": "0.1.0",
1718
+ "resolved": "https://registry.npmjs.org/base-64/-/base-64-0.1.0.tgz",
1719
+ "integrity": "sha512-Y5gU45svrR5tI2Vt/X9GPd3L0HNIKzGu202EjxrXMpuc2V2CiKgemAbUUsqYmZJvPtCXoUKjNZwBJzsNScUbXA==",
1720
+ "optional": true
1721
+ },
1722
  "node_modules/base64-js": {
1723
  "version": "1.5.1",
1724
  "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz",
 
1968
  "url": "https://github.com/chalk/chalk?sponsor=1"
1969
  }
1970
  },
1971
+ "node_modules/charenc": {
1972
+ "version": "0.0.2",
1973
+ "resolved": "https://registry.npmjs.org/charenc/-/charenc-0.0.2.tgz",
1974
+ "integrity": "sha512-yrLQ/yVUFXkzg7EDQsPieE/53+0RlaWTs+wBrvW36cyilJ2SaDWfl4Yj7MtLTXleV9uEKefbAGUPv2/iWSooRA==",
1975
+ "optional": true,
1976
+ "engines": {
1977
+ "node": "*"
1978
+ }
1979
+ },
1980
  "node_modules/check-error": {
1981
  "version": "1.0.2",
1982
  "resolved": "https://registry.npmjs.org/check-error/-/check-error-1.0.2.tgz",
 
2165
  "node": ">= 8"
2166
  }
2167
  },
2168
+ "node_modules/crypt": {
2169
+ "version": "0.0.2",
2170
+ "resolved": "https://registry.npmjs.org/crypt/-/crypt-0.0.2.tgz",
2171
+ "integrity": "sha512-mCxBlsHFYh9C+HVpiEacem8FEBnMXgU9gy4zmNC+SXAZNB/1idgp/aulFJ4FgCi7GPEVbfyng092GqL2k2rmow==",
2172
+ "optional": true,
2173
+ "engines": {
2174
+ "node": "*"
2175
+ }
2176
+ },
2177
  "node_modules/css-tree": {
2178
  "version": "2.3.1",
2179
  "resolved": "https://registry.npmjs.org/css-tree/-/css-tree-2.3.1.tgz",
 
2393
  "node": ">=0.3.1"
2394
  }
2395
  },
2396
+ "node_modules/digest-fetch": {
2397
+ "version": "1.3.0",
2398
+ "resolved": "https://registry.npmjs.org/digest-fetch/-/digest-fetch-1.3.0.tgz",
2399
+ "integrity": "sha512-CGJuv6iKNM7QyZlM2T3sPAdZWd/p9zQiRNS9G+9COUCwzWFTs0Xp8NF5iePx7wtvhDykReiRRrSeNb4oMmB8lA==",
2400
+ "optional": true,
2401
+ "dependencies": {
2402
+ "base-64": "^0.1.0",
2403
+ "md5": "^2.3.0"
2404
+ }
2405
+ },
2406
  "node_modules/dir-glob": {
2407
  "version": "3.0.1",
2408
  "resolved": "https://registry.npmjs.org/dir-glob/-/dir-glob-3.0.1.tgz",
 
2755
  "node": ">=0.10.0"
2756
  }
2757
  },
2758
+ "node_modules/event-target-shim": {
2759
+ "version": "5.0.1",
2760
+ "resolved": "https://registry.npmjs.org/event-target-shim/-/event-target-shim-5.0.1.tgz",
2761
+ "integrity": "sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==",
2762
+ "optional": true,
2763
+ "engines": {
2764
+ "node": ">=6"
2765
+ }
2766
+ },
2767
  "node_modules/execa": {
2768
  "version": "5.1.1",
2769
  "resolved": "https://registry.npmjs.org/execa/-/execa-5.1.1.tgz",
 
2934
  "node": ">= 6"
2935
  }
2936
  },
2937
+ "node_modules/form-data-encoder": {
2938
+ "version": "1.7.2",
2939
+ "resolved": "https://registry.npmjs.org/form-data-encoder/-/form-data-encoder-1.7.2.tgz",
2940
+ "integrity": "sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A==",
2941
+ "optional": true
2942
+ },
2943
+ "node_modules/formdata-node": {
2944
+ "version": "4.4.1",
2945
+ "resolved": "https://registry.npmjs.org/formdata-node/-/formdata-node-4.4.1.tgz",
2946
+ "integrity": "sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==",
2947
+ "optional": true,
2948
+ "dependencies": {
2949
+ "node-domexception": "1.0.0",
2950
+ "web-streams-polyfill": "4.0.0-beta.3"
2951
+ },
2952
+ "engines": {
2953
+ "node": ">= 12.20"
2954
+ }
2955
+ },
2956
  "node_modules/fraction.js": {
2957
  "version": "4.2.0",
2958
  "resolved": "https://registry.npmjs.org/fraction.js/-/fraction.js-4.2.0.tgz",
 
3218
  "node": ">=10.17.0"
3219
  }
3220
  },
3221
+ "node_modules/humanize-ms": {
3222
+ "version": "1.2.1",
3223
+ "resolved": "https://registry.npmjs.org/humanize-ms/-/humanize-ms-1.2.1.tgz",
3224
+ "integrity": "sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==",
3225
+ "optional": true,
3226
+ "dependencies": {
3227
+ "ms": "^2.0.0"
3228
+ }
3229
+ },
3230
  "node_modules/iconv-lite": {
3231
  "version": "0.6.3",
3232
  "resolved": "https://registry.npmjs.org/iconv-lite/-/iconv-lite-0.6.3.tgz",
 
3336
  "node": ">=8"
3337
  }
3338
  },
3339
+ "node_modules/is-buffer": {
3340
+ "version": "1.1.6",
3341
+ "resolved": "https://registry.npmjs.org/is-buffer/-/is-buffer-1.1.6.tgz",
3342
+ "integrity": "sha512-NcdALwpXkTm5Zvvbk7owOUSvVvBKDgKP5/ewfXEznmQFfs4ZRmanOeKBTjRVjka3QFoN6XJ+9F3USqfHqTaU5w==",
3343
+ "optional": true
3344
+ },
3345
  "node_modules/is-builtin-module": {
3346
  "version": "3.2.1",
3347
  "resolved": "https://registry.npmjs.org/is-builtin-module/-/is-builtin-module-3.2.1.tgz",
 
3777
  "marked": ">=4 <10"
3778
  }
3779
  },
3780
+ "node_modules/md5": {
3781
+ "version": "2.3.0",
3782
+ "resolved": "https://registry.npmjs.org/md5/-/md5-2.3.0.tgz",
3783
+ "integrity": "sha512-T1GITYmFaKuO91vxyoQMFETst+O71VUPEU3ze5GNzDm0OWdP8v1ziTaAEPUr/3kLsY3Sftgz242A1SetQiDL7g==",
3784
+ "optional": true,
3785
+ "dependencies": {
3786
+ "charenc": "0.0.2",
3787
+ "crypt": "0.0.2",
3788
+ "is-buffer": "~1.1.6"
3789
+ }
3790
+ },
3791
  "node_modules/md5-hex": {
3792
  "version": "3.0.1",
3793
  "resolved": "https://registry.npmjs.org/md5-hex/-/md5-hex-3.0.1.tgz",
 
4065
  "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz",
4066
  "integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA=="
4067
  },
4068
+ "node_modules/node-domexception": {
4069
+ "version": "1.0.0",
4070
+ "resolved": "https://registry.npmjs.org/node-domexception/-/node-domexception-1.0.0.tgz",
4071
+ "integrity": "sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==",
4072
+ "funding": [
4073
+ {
4074
+ "type": "github",
4075
+ "url": "https://github.com/sponsors/jimmywarting"
4076
+ },
4077
+ {
4078
+ "type": "github",
4079
+ "url": "https://paypal.me/jimmywarting"
4080
+ }
4081
+ ],
4082
+ "optional": true,
4083
+ "engines": {
4084
+ "node": ">=10.5.0"
4085
+ }
4086
+ },
4087
+ "node_modules/node-fetch": {
4088
+ "version": "2.7.0",
4089
+ "resolved": "https://registry.npmjs.org/node-fetch/-/node-fetch-2.7.0.tgz",
4090
+ "integrity": "sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==",
4091
+ "optional": true,
4092
+ "dependencies": {
4093
+ "whatwg-url": "^5.0.0"
4094
+ },
4095
+ "engines": {
4096
+ "node": "4.x || >=6.0.0"
4097
+ },
4098
+ "peerDependencies": {
4099
+ "encoding": "^0.1.0"
4100
+ },
4101
+ "peerDependenciesMeta": {
4102
+ "encoding": {
4103
+ "optional": true
4104
+ }
4105
+ }
4106
+ },
4107
+ "node_modules/node-fetch/node_modules/tr46": {
4108
+ "version": "0.0.3",
4109
+ "resolved": "https://registry.npmjs.org/tr46/-/tr46-0.0.3.tgz",
4110
+ "integrity": "sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==",
4111
+ "optional": true
4112
+ },
4113
+ "node_modules/node-fetch/node_modules/webidl-conversions": {
4114
+ "version": "3.0.1",
4115
+ "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-3.0.1.tgz",
4116
+ "integrity": "sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==",
4117
+ "optional": true
4118
+ },
4119
+ "node_modules/node-fetch/node_modules/whatwg-url": {
4120
+ "version": "5.0.0",
4121
+ "resolved": "https://registry.npmjs.org/whatwg-url/-/whatwg-url-5.0.0.tgz",
4122
+ "integrity": "sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==",
4123
+ "optional": true,
4124
+ "dependencies": {
4125
+ "tr46": "~0.0.3",
4126
+ "webidl-conversions": "^3.0.0"
4127
+ }
4128
+ },
4129
  "node_modules/node-gyp-build": {
4130
  "version": "4.6.1",
4131
  "resolved": "https://registry.npmjs.org/node-gyp-build/-/node-gyp-build-4.6.1.tgz",
 
4276
  "platform": "^1.3.6"
4277
  }
4278
  },
4279
+ "node_modules/openai": {
4280
+ "version": "4.14.2",
4281
+ "resolved": "https://registry.npmjs.org/openai/-/openai-4.14.2.tgz",
4282
+ "integrity": "sha512-JGlm7mMC7J+cyQZnQMOH7daD9cBqqWqLtlBsejElEkgoehPrYfdyxSxIGICz5xk4YimbwI5FlLATSVojLtCKXQ==",
4283
+ "optional": true,
4284
+ "dependencies": {
4285
+ "@types/node": "^18.11.18",
4286
+ "@types/node-fetch": "^2.6.4",
4287
+ "abort-controller": "^3.0.0",
4288
+ "agentkeepalive": "^4.2.1",
4289
+ "digest-fetch": "^1.3.0",
4290
+ "form-data-encoder": "1.7.2",
4291
+ "formdata-node": "^4.3.2",
4292
+ "node-fetch": "^2.6.7",
4293
+ "web-streams-polyfill": "^3.2.1"
4294
+ },
4295
+ "bin": {
4296
+ "openai": "bin/cli"
4297
+ }
4298
+ },
4299
+ "node_modules/openai/node_modules/web-streams-polyfill": {
4300
+ "version": "3.2.1",
4301
+ "resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-3.2.1.tgz",
4302
+ "integrity": "sha512-e0MO3wdXWKrLbL0DgGnUV7WHVuw9OUvL4hjgnPkIeEvESk74gAITi5G606JtZPp39cd8HA9VQzCIvA49LpPN5Q==",
4303
+ "optional": true,
4304
+ "engines": {
4305
+ "node": ">= 8"
4306
+ }
4307
+ },
4308
  "node_modules/openid-client": {
4309
  "version": "5.4.2",
4310
  "resolved": "https://registry.npmjs.org/openid-client/-/openid-client-5.4.2.tgz",
 
6476
  "node": ">=14"
6477
  }
6478
  },
6479
+ "node_modules/web-streams-polyfill": {
6480
+ "version": "4.0.0-beta.3",
6481
+ "resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-4.0.0-beta.3.tgz",
6482
+ "integrity": "sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==",
6483
+ "optional": true,
6484
+ "engines": {
6485
+ "node": ">= 14"
6486
+ }
6487
+ },
6488
  "node_modules/webidl-conversions": {
6489
  "version": "7.0.0",
6490
  "resolved": "https://registry.npmjs.org/webidl-conversions/-/webidl-conversions-7.0.0.tgz",
package.json CHANGED
@@ -48,7 +48,6 @@
48
  "@huggingface/inference": "^2.6.3",
49
  "@xenova/transformers": "^2.6.0",
50
  "autoprefixer": "^10.4.14",
51
- "aws4fetch": "^1.0.17",
52
  "date-fns": "^2.29.3",
53
  "dotenv": "^16.0.3",
54
  "handlebars": "^4.7.8",
@@ -64,5 +63,9 @@
64
  "tailwind-scrollbar": "^3.0.0",
65
  "tailwindcss": "^3.3.1",
66
  "zod": "^3.22.3"
67
  }
68
  }
 
48
  "@huggingface/inference": "^2.6.3",
49
  "@xenova/transformers": "^2.6.0",
50
  "autoprefixer": "^10.4.14",
 
51
  "date-fns": "^2.29.3",
52
  "dotenv": "^16.0.3",
53
  "handlebars": "^4.7.8",
 
63
  "tailwind-scrollbar": "^3.0.0",
64
  "tailwindcss": "^3.3.1",
65
  "zod": "^3.22.3"
66
+ },
67
+ "optionalDependencies": {
68
+ "aws4fetch": "^1.0.17",
69
+ "openai": "^4.14.2"
70
  }
71
  }
src/lib/server/endpoints/aws/endpointAws.ts ADDED
@@ -0,0 +1,64 @@
1
+ import { buildPrompt } from "$lib/buildPrompt";
2
+ import { textGenerationStream } from "@huggingface/inference";
3
+ import { z } from "zod";
4
+ import type { Endpoint } from "../endpoints";
5
+
6
+ export const endpointAwsParametersSchema = z.object({
7
+ weight: z.number().int().positive().default(1),
8
+ model: z.any(),
9
+ type: z.literal("aws"),
10
+ url: z.string().url(),
11
+ accessKey: z.string().min(1),
12
+ secretKey: z.string().min(1),
13
+ sessionToken: z.string().optional(),
14
+ service: z.union([z.literal("sagemaker"), z.literal("lambda")]).default("sagemaker"),
15
+ region: z.string().optional(),
16
+ });
17
+
18
+ export async function endpointAws({
19
+ url,
20
+ accessKey,
21
+ secretKey,
22
+ sessionToken,
23
+ model,
24
+ region,
25
+ service,
26
+ }: z.infer<typeof endpointAwsParametersSchema>): Promise<Endpoint> {
27
+ let AwsClient;
28
+ try {
29
+ AwsClient = (await import("aws4fetch")).AwsClient;
30
+ } catch (e) {
31
+ throw new Error("Failed to import aws4fetch");
32
+ }
33
+
34
+ const aws = new AwsClient({
35
+ accessKeyId: accessKey,
36
+ secretAccessKey: secretKey,
37
+ sessionToken,
38
+ service,
39
+ region,
40
+ });
41
+
42
+ return async ({ conversation }) => {
43
+ const prompt = await buildPrompt({
44
+ messages: conversation.messages,
45
+ webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
46
+ preprompt: conversation.preprompt,
47
+ model,
48
+ });
49
+
50
+ return textGenerationStream(
51
+ {
52
+ parameters: { ...model.parameters, return_full_text: false },
53
+ model: url,
54
+ inputs: prompt,
55
+ },
56
+ {
57
+ use_cache: false,
58
+ fetch: aws.fetch.bind(aws) as typeof fetch,
59
+ }
60
+ );
61
+ };
62
+ }
63
+
64
+ export default endpointAws;
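
The `endpointAwsParametersSchema` above replaces the old `"host": "sagemaker"` config with `"type": "aws"` plus a `service` field that defaults to `"sagemaker"`, and `weight` defaults to `1`. The sketch below mirrors those defaults without depending on zod, to make the accepted shapes easier to see. It is a simplified stand-in under stated assumptions: `parseAwsEndpoint` and its helpers are hypothetical names, the `model` field is left out, and URL syntax checking (done by `z.string().url()` in the real schema) is omitted.

```typescript
// Dependency-free stand-in for the zod schema in this diff; all names are illustrative.
type AwsService = "sagemaker" | "lambda";

interface AwsEndpointConfig {
  type: "aws";
  url: string;
  accessKey: string;
  secretKey: string;
  sessionToken?: string;
  service: AwsService;
  region?: string;
  weight: number;
}

function requiredString(raw: Record<string, unknown>, key: string): string {
  const value = raw[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`${key} must be a non-empty string`);
  }
  return value;
}

function optionalString(raw: Record<string, unknown>, key: string): string | undefined {
  const value = raw[key];
  if (value === undefined) return undefined;
  if (typeof value !== "string") throw new Error(`${key} must be a string`);
  return value;
}

function parseService(value: unknown): AwsService {
  // A missing service falls back to "sagemaker", as in the schema's .default("sagemaker").
  if (value === undefined || value === "sagemaker") return "sagemaker";
  if (value === "lambda") return "lambda";
  throw new Error('service must be "sagemaker" or "lambda"');
}

function parseWeight(value: unknown): number {
  // A missing weight falls back to 1, as in the schema's .default(1).
  if (value === undefined) return 1;
  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
    throw new Error("weight must be a positive integer");
  }
  return value;
}

function parseAwsEndpoint(raw: Record<string, unknown>): AwsEndpointConfig {
  if (raw["type"] !== "aws") throw new Error('type must be "aws"');
  return {
    type: "aws",
    url: requiredString(raw, "url"), // URL syntax is not validated in this sketch
    accessKey: requiredString(raw, "accessKey"),
    secretKey: requiredString(raw, "secretKey"),
    sessionToken: optionalString(raw, "sessionToken"),
    service: parseService(raw["service"]),
    region: optionalString(raw, "region"),
    weight: parseWeight(raw["weight"]),
  };
}
```

A config that names only `type`, `url`, `accessKey`, and `secretKey` therefore resolves to a SageMaker endpoint with weight 1, while `"service": "lambda"` must be set explicitly.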
src/lib/server/endpoints/endpoints.ts ADDED
@@ -0,0 +1,42 @@
1
+ import type { Conversation } from "$lib/types/Conversation";
2
+ import type { TextGenerationStreamOutput } from "@huggingface/inference";
3
+ import { endpointTgi, endpointTgiParametersSchema } from "./tgi/endpointTgi";
4
+ import { z } from "zod";
5
+ import endpointAws, { endpointAwsParametersSchema } from "./aws/endpointAws";
6
+ import { endpointOAIParametersSchema, endpointOai } from "./openai/endpointOai";
7
+ import endpointLlamacpp, { endpointLlamacppParametersSchema } from "./llamacpp/endpointLlamacpp";
8
+
9
+ // parameters passed when generating text
10
+ interface EndpointParameters {
11
+ conversation: {
12
+ messages: Omit<Conversation["messages"][0], "id">[];
13
+ preprompt?: Conversation["preprompt"];
14
+ };
15
+ }
16
+
17
+ interface CommonEndpoint {
18
+ weight: number;
19
+ }
20
+ // type signature for the endpoint
21
+ export type Endpoint = (
22
+ params: EndpointParameters
23
+ ) => Promise<AsyncGenerator<TextGenerationStreamOutput, void, void>>;
24
+
25
+ // generator function that takes the parameters defining an endpoint and returns the endpoint
26
+ export type EndpointGenerator<T extends CommonEndpoint> = (parameters: T) => Endpoint;
27
+
28
+ // list of all endpoint generators
29
+ export const endpoints = {
30
+ tgi: endpointTgi,
31
+ sagemaker: endpointAws,
32
+ openai: endpointOai,
33
+ llamacpp: endpointLlamacpp,
34
+ };
35
+
36
+ export const endpointSchema = z.discriminatedUnion("type", [
37
+ endpointAwsParametersSchema,
38
+ endpointOAIParametersSchema,
39
+ endpointTgiParametersSchema,
40
+ endpointLlamacppParametersSchema,
41
+ ]);
42
+ export default endpoints;
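
The `type` field is what lets `z.discriminatedUnion` (and TypeScript) tell the endpoint configs apart. The following is a minimal sketch of that idea in plain TypeScript, without zod. `EndpointConfig` and `describeEndpoint` are hypothetical names used only for illustration, and the config shapes are trimmed to a few fields taken from the schemas in this commit.

```typescript
// Hypothetical, trimmed-down config shapes; the real schemas carry more fields.
type EndpointConfig =
	| { type: "tgi"; url: string }
	| { type: "aws"; url: string; region?: string }
	| { type: "openai"; baseURL: string }
	| { type: "llamacpp"; url: string };

// Switching on the discriminant narrows `config` to one variant per branch.
function describeEndpoint(config: EndpointConfig): string {
	switch (config.type) {
		case "tgi":
			return `tgi endpoint at ${config.url}`;
		case "aws":
			return `aws endpoint at ${config.url} (${config.region ?? "default region"})`;
		case "openai":
			return `openai endpoint at ${config.baseURL}`;
		case "llamacpp":
			return `llamacpp endpoint at ${config.url}`;
		default: {
			// If a new variant is added and not handled above, this assignment
			// stops compiling, which flags the missing branch.
			const unreachable: never = config;
			throw new Error(`Unknown endpoint type: ${JSON.stringify(unreachable)}`);
		}
	}
}
```

Compared with an `if`/`else if` chain, the `never` check in the default branch makes a forgotten backend a compile-time error rather than a silent fallback.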
src/lib/server/endpoints/llamacpp/endpointLlamacpp.ts ADDED
@@ -0,0 +1,100 @@
1
+ import { HF_ACCESS_TOKEN } from "$env/static/private";
2
+ import { buildPrompt } from "$lib/buildPrompt";
3
+ import type { TextGenerationStreamOutput } from "@huggingface/inference";
4
+ import type { Endpoint } from "../endpoints";
5
+ import { z } from "zod";
6
+
7
+ export const endpointLlamacppParametersSchema = z.object({
8
+ weight: z.number().int().positive().default(1),
9
+ model: z.any(),
10
+ type: z.literal("llamacpp"),
11
+ url: z.string().url(),
12
+ accessToken: z.string().min(1).default(HF_ACCESS_TOKEN),
13
+ });
14
+
15
+ export function endpointLlamacpp({
16
+ url,
17
+ model,
18
+ }: z.infer<typeof endpointLlamacppParametersSchema>): Endpoint {
19
+ return async ({ conversation }) => {
20
+ const prompt = await buildPrompt({
21
+ messages: conversation.messages,
22
+ webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
23
+ preprompt: conversation.preprompt,
24
+ model,
25
+ });
26
+
27
+ const r = await fetch(`${url}/completion`, {
28
+ method: "POST",
29
+ headers: {
30
+ "Content-Type": "application/json",
31
+ },
32
+ body: JSON.stringify({
33
+ prompt,
34
+ stream: true,
35
+ temperature: model.parameters.temperature,
36
+ top_p: model.parameters.top_p,
37
+ top_k: model.parameters.top_k,
38
+ stop: model.parameters.stop,
39
+ repeat_penalty: model.parameters.repetition_penalty,
40
+ n_predict: model.parameters.max_new_tokens,
41
+ }),
42
+ });
43
+
44
+ if (!r.ok) {
45
+ throw new Error(`Failed to generate text: ${await r.text()}`);
46
+ }
47
+
48
+ const decoder = new TextDecoderStream();
49
+ const reader = r.body?.pipeThrough(decoder).getReader();
50
+
51
+ return (async function* () {
52
+ let stop = false;
53
+ let generatedText = "";
54
+ let tokenId = 0;
55
+ while (!stop) {
56
+ // read the next decoded chunk from the stream
57
+ const out = (await reader?.read()) ?? { done: false, value: undefined };
58
+ // if the stream is done, cancel the reader and stop
59
+ if (out.done) {
60
+ reader?.cancel();
61
+ return;
62
+ }
63
+
64
+ if (!out.value) {
65
+ return;
66
+ }
67
+
68
+ if (out.value.startsWith("data: ")) {
69
+ let data = null;
70
+ try {
71
+ data = JSON.parse(out.value.slice(6));
72
+ } catch (e) {
73
+ return;
74
+ }
75
+ if (data.content || data.stop) {
76
+ generatedText += data.content ?? "";
77
+ const output: TextGenerationStreamOutput = {
78
+ token: {
79
+ id: tokenId++,
80
+ text: data.content ?? "",
81
+ logprob: 0,
82
+ special: false,
83
+ },
84
+ generated_text: data.stop ? generatedText : null,
85
+ details: null,
86
+ };
87
+ if (data.stop) {
88
+ stop = true;
89
+ reader?.cancel();
90
+ }
91
+ yield output;
92
+ // take the data.content value and yield it
93
+ }
94
+ }
95
+ }
96
+ })();
97
+ };
98
+ }
99
+
100
+ export default endpointLlamacpp;
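
The read loop above treats each chunk from the reader as exactly one complete `data: ` event. Stream chunks are not guaranteed to line up with event boundaries, so here is a minimal sketch, under that assumption, of buffering by line before parsing. `SseBuffer` and `feedSse` are hypothetical names and are not part of this commit.

```typescript
// Hypothetical helper: keeps any unterminated tail between calls.
interface SseBuffer {
	pending: string; // text received so far that has not been closed by a newline
}

// Feed one decoded chunk; returns the parsed JSON payload of every complete
// `data: ` line that is now available.
function feedSse(buffer: SseBuffer, chunk: string): unknown[] {
	const lines = (buffer.pending + chunk).split("\n");
	// The last element is "" if the chunk ended on a newline, else a partial line.
	buffer.pending = lines.pop() ?? "";

	const events: unknown[] = [];
	for (const line of lines) {
		const trimmed = line.trim();
		if (!trimmed.startsWith("data: ")) continue; // blank separators, other fields
		try {
			events.push(JSON.parse(trimmed.slice("data: ".length)));
		} catch {
			// Skip a malformed payload instead of ending the whole stream.
		}
	}
	return events;
}
```

With a helper like this, the generator could loop over the returned events and build one `TextGenerationStreamOutput` per event, instead of returning when a chunk fails to parse.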
src/lib/server/endpoints/openai/endpointOai.ts ADDED
@@ -0,0 +1,82 @@
1
+ import { z } from "zod";
2
+ import { openAICompletionToTextGenerationStream } from "./openAICompletionToTextGenerationStream";
3
+ import { openAIChatToTextGenerationStream } from "./openAIChatToTextGenerationStream";
4
+ import { buildPrompt } from "$lib/buildPrompt";
5
+ import { OPENAI_API_KEY } from "$env/static/private";
6
+ import type { Endpoint } from "../endpoints";
7
+
8
+ export const endpointOAIParametersSchema = z.object({
9
+ weight: z.number().int().positive().default(1),
10
+ model: z.any(),
11
+ type: z.literal("openai"),
12
+ baseURL: z.string().url().default("https://api.openai.com/v1"),
13
+ apiKey: z.string().default(OPENAI_API_KEY ?? "sk-"),
14
+ completion: z
15
+ .union([z.literal("completions"), z.literal("chat_completions")])
16
+ .default("chat_completions"),
17
+ });
18
+
19
+ export async function endpointOai({
20
+ baseURL,
21
+ apiKey,
22
+ completion,
23
+ model,
24
+ }: z.infer<typeof endpointOAIParametersSchema>): Promise<Endpoint> {
25
+ let OpenAI;
26
+ try {
27
+ OpenAI = (await import("openai")).OpenAI;
28
+ } catch (e) {
29
+ throw new Error("Failed to import OpenAI", { cause: e });
30
+ }
31
+
32
+ const openai = new OpenAI({
33
+ apiKey: apiKey ?? "sk-",
34
+ baseURL: baseURL,
35
+ });
36
+
37
+ if (completion === "completions") {
38
+ return async ({ conversation }) => {
39
+ return openAICompletionToTextGenerationStream(
40
+ await openai.completions.create({
41
+ model: model.id ?? model.name,
42
+ prompt: await buildPrompt({
43
+ messages: conversation.messages,
44
+ webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
45
+ preprompt: conversation.preprompt,
46
+ model,
47
+ }),
48
+ stream: true,
49
+ max_tokens: model.parameters?.max_new_tokens,
50
+ stop: model.parameters?.stop,
51
+ temperature: model.parameters?.temperature,
52
+ top_p: model.parameters?.top_p,
53
+ frequency_penalty: model.parameters?.repetition_penalty,
54
+ })
55
+ );
56
+ };
57
+ } else if (completion === "chat_completions") {
58
+ return async ({ conversation }) => {
59
+ const messages = conversation.messages.map((message) => ({
60
+ role: message.from,
61
+ content: message.content,
62
+ }));
63
+
64
+ return openAIChatToTextGenerationStream(
65
+ await openai.chat.completions.create({
66
+ model: model.id ?? model.name,
67
+ messages: conversation.preprompt
68
+ ? [{ role: "system", content: conversation.preprompt }, ...messages]
69
+ : messages,
70
+ stream: true,
71
+ max_tokens: model.parameters?.max_new_tokens,
72
+ stop: model.parameters?.stop,
73
+ temperature: model.parameters?.temperature,
74
+ top_p: model.parameters?.top_p,
75
+ frequency_penalty: model.parameters?.repetition_penalty,
76
+ })
77
+ );
78
+ };
79
+ } else {
80
+ throw new Error("Invalid completion type");
81
+ }
82
+ }
src/lib/server/endpoints/openai/openAIChatToTextGenerationStream.ts ADDED
@@ -0,0 +1,32 @@
1
+ import type { TextGenerationStreamOutput } from "@huggingface/inference";
2
+ import type OpenAI from "openai";
3
+ import type { Stream } from "openai/streaming";
4
+
5
+ /**
6
+ * Transform a stream of OpenAI.Chat.Completions.ChatCompletionChunk into a stream of TextGenerationStreamOutput
7
+ */
8
+ export async function* openAIChatToTextGenerationStream(
9
+ completionStream: Stream<OpenAI.Chat.Completions.ChatCompletionChunk>
10
+ ) {
11
+ let generatedText = "";
12
+ let tokenId = 0;
13
+ for await (const completion of completionStream) {
14
+ const { choices } = completion;
15
+ const content = choices[0]?.delta?.content ?? "";
16
+ const last = choices[0]?.finish_reason === "stop";
17
+ if (content) {
18
+ generatedText = generatedText + content;
19
+ }
20
+ const output: TextGenerationStreamOutput = {
21
+ token: {
22
+ id: tokenId++,
23
+ text: content ?? "",
24
+ logprob: 0,
25
+ special: false,
26
+ },
27
+ generated_text: last ? generatedText : null,
28
+ details: null,
29
+ };
30
+ yield output;
31
+ }
32
+ }
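
To make the adapter's contract easier to see, here is a synchronous sketch of the same per-chunk logic. `StreamOutput`, `ChunkLike`, and `chunksToOutputs` are local stand-ins introduced for illustration; the real function is an async generator over the `openai` package's stream type.

```typescript
// Local stand-ins for the library types, trimmed to the fields used here.
interface StreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
	details: null;
}

interface ChunkLike {
	choices: { delta?: { content?: string | null }; finish_reason?: string | null }[];
}

// Synchronous mirror of the adapter's loop (hypothetical helper).
function* chunksToOutputs(chunks: Iterable<ChunkLike>): Generator<StreamOutput> {
	let generatedText = "";
	let tokenId = 0;
	for (const chunk of chunks) {
		const content = chunk.choices[0]?.delta?.content ?? "";
		const last = chunk.choices[0]?.finish_reason === "stop";
		generatedText += content;
		yield {
			token: { id: tokenId++, text: content, logprob: 0, special: false },
			// Only the chunk that finishes with "stop" carries the full text.
			generated_text: last ? generatedText : null,
			details: null,
		};
	}
}
```

One consequence worth noting: as written, a stream that ends with a different finish reason (for example `length`) never sets `generated_text`, so a consumer that waits for it, such as `generateFromDefaultEndpoint`, would fall through to its error path.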
src/lib/server/endpoints/openai/openAICompletionToTextGenerationStream.ts ADDED
@@ -0,0 +1,32 @@
1
+ import type { TextGenerationStreamOutput } from "@huggingface/inference";
2
+ import type OpenAI from "openai";
3
+ import type { Stream } from "openai/streaming";
4
+
5
+ /**
6
+ * Transform a stream of OpenAI.Completions.Completion into a stream of TextGenerationStreamOutput
7
+ */
8
+ export async function* openAICompletionToTextGenerationStream(
9
+ completionStream: Stream<OpenAI.Completions.Completion>
10
+ ) {
11
+ let generatedText = "";
12
+ let tokenId = 0;
13
+ for await (const completion of completionStream) {
14
+ const { choices } = completion;
15
+ const text = choices[0]?.text ?? "";
16
+ const last = choices[0]?.finish_reason === "stop";
17
+ if (text) {
18
+ generatedText = generatedText + text;
19
+ }
20
+ const output: TextGenerationStreamOutput = {
21
+ token: {
22
+ id: tokenId++,
23
+ text,
24
+ logprob: 0,
25
+ special: false,
26
+ },
27
+ generated_text: last ? generatedText : null,
28
+ details: null,
29
+ };
30
+ yield output;
31
+ }
32
+ }
src/lib/server/endpoints/tgi/endpointTgi.ts ADDED
@@ -0,0 +1,37 @@
1
+ import { HF_ACCESS_TOKEN } from "$env/static/private";
2
+ import { buildPrompt } from "$lib/buildPrompt";
3
+ import { textGenerationStream } from "@huggingface/inference";
4
+ import type { Endpoint } from "../endpoints";
5
+ import { z } from "zod";
6
+
7
+ export const endpointTgiParametersSchema = z.object({
8
+ weight: z.number().int().positive().default(1),
9
+ model: z.any(),
10
+ type: z.literal("tgi"),
11
+ url: z.string().url(),
12
+ accessToken: z.string().min(1).default(HF_ACCESS_TOKEN),
13
+ });
14
+
15
+ export function endpointTgi({
16
+ url,
17
+ accessToken,
18
+ model,
19
+ }: z.infer<typeof endpointTgiParametersSchema>): Endpoint {
20
+ return async ({ conversation }) => {
21
+ const prompt = await buildPrompt({
22
+ messages: conversation.messages,
23
+ webSearch: conversation.messages[conversation.messages.length - 1].webSearch,
24
+ preprompt: conversation.preprompt,
25
+ model,
26
+ });
27
+
28
+ return textGenerationStream({
29
+ parameters: { ...model.parameters, return_full_text: false },
30
+ model: url,
31
+ inputs: prompt,
32
+ accessToken,
33
+ });
34
+ };
35
+ }
36
+
37
+ export default endpointTgi;
src/lib/server/generateFromDefaultEndpoint.ts CHANGED
@@ -1,110 +1,28 @@
1
  import { smallModel } from "$lib/server/models";
2
- import { modelEndpoint } from "./modelEndpoint";
3
- import { trimSuffix } from "$lib/utils/trimSuffix";
4
- import { trimPrefix } from "$lib/utils/trimPrefix";
5
- import { PUBLIC_SEP_TOKEN } from "$lib/constants/publicSepToken";
6
- import { AwsClient } from "aws4fetch";
7
-
8
- interface Parameters {
9
- temperature: number;
10
- truncate: number;
11
- max_new_tokens: number;
12
- stop: string[];
13
- }
14
- export async function generateFromDefaultEndpoint(
15
- prompt: string,
16
- parameters?: Partial<Parameters>
17
- ): Promise<string> {
18
- const newParameters = {
19
- ...smallModel.parameters,
20
- ...parameters,
21
- return_full_text: false,
22
- wait_for_model: true,
23
- };
24
-
25
- const randomEndpoint = modelEndpoint(smallModel);
26
-
27
- const abortController = new AbortController();
28
-
29
- let resp: Response;
30
-
31
- if (randomEndpoint.host === "sagemaker") {
32
- const requestParams = JSON.stringify({
33
- parameters: newParameters,
34
- inputs: prompt,
35
- });
36
-
37
- const aws = new AwsClient({
38
- accessKeyId: randomEndpoint.accessKey,
39
- secretAccessKey: randomEndpoint.secretKey,
40
- sessionToken: randomEndpoint.sessionToken,
41
- service: "sagemaker",
42
- });
43
-
44
- resp = await aws.fetch(randomEndpoint.url, {
45
- method: "POST",
46
- body: requestParams,
47
- signal: abortController.signal,
48
- headers: {
49
- "Content-Type": "application/json",
50
- },
51
- });
52
- } else {
53
- resp = await fetch(randomEndpoint.url, {
54
- headers: {
55
- "Content-Type": "application/json",
56
- Authorization: randomEndpoint.authorization,
57
- },
58
- method: "POST",
59
- body: JSON.stringify({
60
- parameters: newParameters,
61
- inputs: prompt,
62
- }),
63
- signal: abortController.signal,
64
- });
65
- }
66
-
67
- if (!resp.ok) {
68
- throw new Error(await resp.text());
69
- }
70
-
71
- if (!resp.body) {
72
- throw new Error("Body is empty");
73
- }
74
-
75
- const decoder = new TextDecoder();
76
- const reader = resp.body.getReader();
77
-
78
- let isDone = false;
79
- let result = "";
80
-
81
- while (!isDone) {
82
- const { done, value } = await reader.read();
83
-
84
- isDone = done;
85
- result += decoder.decode(value, { stream: true }); // Convert current chunk to text
86
- }
87
-
88
- // Close the reader when done
89
- reader.releaseLock();
90
-
91
- let results;
92
- if (result.startsWith("data:")) {
93
- results = [JSON.parse(result.split("data:")?.pop() ?? "")];
94
- } else {
95
- results = JSON.parse(result);
96
- }
97
-
98
- let generated_text = trimSuffix(
99
- trimPrefix(trimPrefix(results[0].generated_text, "<|startoftext|>"), prompt),
100
- PUBLIC_SEP_TOKEN
101
- ).trimEnd();
102
-
103
- for (const stop of [...(newParameters?.stop ?? []), "<|endoftext|>"]) {
104
- if (generated_text.endsWith(stop)) {
105
- generated_text = generated_text.slice(0, -stop.length).trimEnd();
106
  }
107
  }
108
-
109
- return generated_text;
110
  }
 
1
  import { smallModel } from "$lib/server/models";
2
+ import type { Conversation } from "$lib/types/Conversation";
3
+
4
+ export async function generateFromDefaultEndpoint({
5
+ messages,
6
+ preprompt,
7
+ }: {
8
+ messages: Omit<Conversation["messages"][0], "id">[];
9
+ preprompt?: string;
10
+ }): Promise<string> {
11
+ const endpoint = await smallModel.getEndpoint();
12
+
13
+ const tokenStream = await endpoint({ conversation: { messages, preprompt } });
14
+
15
+ for await (const output of tokenStream) {
16
+ // generated_text is only set on the final output; until then the generation is not done
17
+ if (output.generated_text) {
18
+ let generated_text = output.generated_text;
19
+ for (const stop of [...(smallModel.parameters?.stop ?? []), "<|endoftext|>"]) {
20
+ if (generated_text.endsWith(stop)) {
21
+ generated_text = generated_text.slice(0, -stop.length).trimEnd();
22
+ }
23
+ }
24
+ return generated_text;
25
  }
26
  }
27
+ throw new Error("Generation failed");
 
28
  }
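
The stop-token cleanup in the new `generateFromDefaultEndpoint` can be read as a small pure function. The sketch below pulls it out under a hypothetical name, `trimStopSequences`, and adds one guard that is not in the commit: an empty stop string is skipped, because `slice(0, -0)` would otherwise return an empty string and discard the whole text.

```typescript
// Hypothetical extraction of the stop-token cleanup; not part of the commit.
function trimStopSequences(text: string, stops: string[]): string {
	let result = text;
	// "<|endoftext|>" is always checked last, as in the original loop.
	for (const stop of [...stops, "<|endoftext|>"]) {
		// Added guard (assumption): ignore empty stop strings.
		if (stop.length > 0 && result.endsWith(stop)) {
			result = result.slice(0, -stop.length).trimEnd();
		}
	}
	return result;
}
```

A call such as `trimStopSequences(output.generated_text, smallModel.parameters?.stop ?? [])` would then stand in for the inline loop.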
src/lib/server/modelEndpoint.ts DELETED
@@ -1,50 +0,0 @@
1
- import {
2
- HF_ACCESS_TOKEN,
3
- HF_API_ROOT,
4
- USE_CLIENT_CERTIFICATE,
5
- CERT_PATH,
6
- KEY_PATH,
7
- CA_PATH,
8
- CLIENT_KEY_PASSWORD,
9
- REJECT_UNAUTHORIZED,
10
- } from "$env/static/private";
11
- import { sum } from "$lib/utils/sum";
12
- import type { BackendModel, Endpoint } from "./models";
13
-
14
- import { loadClientCertificates } from "$lib/utils/loadClientCerts";
15
-
16
- if (USE_CLIENT_CERTIFICATE === "true") {
17
- loadClientCertificates(
18
- CERT_PATH,
19
- KEY_PATH,
20
- CA_PATH,
21
- CLIENT_KEY_PASSWORD,
22
- REJECT_UNAUTHORIZED === "true"
23
- );
24
- }
25
-
26
- /**
27
- * Find a random load-balanced endpoint
28
- */
29
- export function modelEndpoint(model: BackendModel): Endpoint {
30
- if (!model.endpoints) {
31
- return {
32
- host: "tgi",
33
- url: `${HF_API_ROOT}/${model.name}`,
34
- authorization: `Bearer ${HF_ACCESS_TOKEN}`,
35
- weight: 1,
36
- };
37
- }
38
- const endpoints = model.endpoints;
39
- const totalWeight = sum(endpoints.map((e) => e.weight));
40
-
41
- let random = Math.random() * totalWeight;
42
- for (const endpoint of endpoints) {
43
- if (random < endpoint.weight) {
44
- return endpoint;
45
- }
46
- random -= endpoint.weight;
47
- }
48
-
49
- throw new Error("Invalid config, no endpoint found");
50
- }
src/lib/server/models.ts CHANGED
@@ -1,42 +1,13 @@
1
- import { HF_ACCESS_TOKEN, MODELS, OLD_MODELS, TASK_MODEL } from "$env/static/private";
2
  import type { ChatTemplateInput } from "$lib/types/Template";
3
  import { compileTemplate } from "$lib/utils/template";
4
  import { z } from "zod";
 
 
 
5
 
6
  type Optional<T, K extends keyof T> = Pick<Partial<T>, K> & Omit<T, K>;
7
 
8
- const sagemakerEndpoint = z.object({
9
- host: z.literal("sagemaker"),
10
- url: z.string().url(),
11
- accessKey: z.string().min(1),
12
- secretKey: z.string().min(1),
13
- sessionToken: z.string().optional(),
14
- });
15
-
16
- const tgiEndpoint = z.object({
17
- host: z.union([z.literal("tgi"), z.undefined()]),
18
- url: z.string().url(),
19
- authorization: z.string().min(1).default(`Bearer ${HF_ACCESS_TOKEN}`),
20
- });
21
-
22
- const commonEndpoint = z.object({
23
- weight: z.number().int().positive().default(1),
24
- });
25
-
26
- const endpoint = z.lazy(() =>
27
- z.union([sagemakerEndpoint.merge(commonEndpoint), tgiEndpoint.merge(commonEndpoint)])
28
- );
29
-
30
- const combinedEndpoint = endpoint.transform((data) => {
31
- if (data.host === "tgi" || data.host === undefined) {
32
- return tgiEndpoint.merge(commonEndpoint).parse(data);
33
- } else if (data.host === "sagemaker") {
34
- return sagemakerEndpoint.merge(commonEndpoint).parse(data);
35
- } else {
36
- throw new Error(`Invalid host: ${data.host}`);
37
- }
38
- });
39
-
40
  const modelConfig = z.object({
41
  /** Used as an identifier in DB */
42
  id: z.string().optional(),
@@ -73,13 +44,16 @@ const modelConfig = z.object({
73
  })
74
  )
75
  .optional(),
76
- endpoints: z.array(combinedEndpoint).optional(),
77
  parameters: z
78
  .object({
79
  temperature: z.number().min(0).max(1),
80
  truncate: z.number().int().positive(),
81
  max_new_tokens: z.number().int().positive(),
82
  stop: z.array(z.string()).optional(),
 
 
 
83
  })
84
  .passthrough()
85
  .optional(),
@@ -98,7 +72,48 @@ const processModel = async (m: z.infer<typeof modelConfig>) => ({
98
  parameters: { ...m.parameters, stop_sequences: m.parameters?.stop },
99
  });
100
 
101
- export const models = await Promise.all(modelsRaw.map(processModel));
102
 
103
  // Models that have been deprecated
104
  export const oldModels = OLD_MODELS
@@ -114,18 +129,19 @@ export const oldModels = OLD_MODELS
114
  .map((m) => ({ ...m, id: m.id || m.name, displayName: m.displayName || m.name }))
115
  : [];
116
 
117
- export const defaultModel = models[0];
118
-
119
  export const validateModel = (_models: BackendModel[]) => {
120
  // Zod enum function requires 2 parameters
121
  return z.enum([_models[0].id, ..._models.slice(1).map((m) => m.id)]);
122
  };
123
 
124
  // if `TASK_MODEL` is the name of a model we use it, else we try to parse `TASK_MODEL` as a model config itself
 
125
  export const smallModel = TASK_MODEL
126
- ? models.find((m) => m.name === TASK_MODEL) ||
127
- (await processModel(modelConfig.parse(JSON.parse(TASK_MODEL))))
 
 
 
128
  : defaultModel;
129
 
130
- export type BackendModel = Optional<(typeof models)[0], "preprompt" | "parameters">;
131
- export type Endpoint = z.infer<typeof endpoint>;
 
1
+ import { HF_ACCESS_TOKEN, HF_API_ROOT, MODELS, OLD_MODELS, TASK_MODEL } from "$env/static/private";
2
  import type { ChatTemplateInput } from "$lib/types/Template";
3
  import { compileTemplate } from "$lib/utils/template";
4
  import { z } from "zod";
5
+ import endpoints, { endpointSchema, type Endpoint } from "./endpoints/endpoints";
6
+ import endpointTgi from "./endpoints/tgi/endpointTgi";
7
+ import { sum } from "$lib/utils/sum";
8
 
9
  type Optional<T, K extends keyof T> = Pick<Partial<T>, K> & Omit<T, K>;
10
 
11
  const modelConfig = z.object({
12
  /** Used as an identifier in DB */
13
  id: z.string().optional(),
 
44
  })
45
  )
46
  .optional(),
47
+ endpoints: z.array(endpointSchema).optional(),
48
  parameters: z
49
  .object({
50
  temperature: z.number().min(0).max(1),
51
  truncate: z.number().int().positive(),
52
  max_new_tokens: z.number().int().positive(),
53
  stop: z.array(z.string()).optional(),
54
+ top_p: z.number().positive().optional(),
55
+ top_k: z.number().positive().optional(),
56
+ repetition_penalty: z.number().min(-2).max(2).optional(),
57
  })
58
  .passthrough()
59
  .optional(),
 
72
  parameters: { ...m.parameters, stop_sequences: m.parameters?.stop },
73
  });
74
 
75
+ const addEndpoint = (m: Awaited<ReturnType<typeof processModel>>) => ({
76
+ ...m,
77
+ getEndpoint: async (): Promise<Endpoint> => {
78
+ if (!m.endpoints) {
79
+ return endpointTgi({
80
+ type: "tgi",
81
+ url: `${HF_API_ROOT}/${m.name}`,
82
+ accessToken: HF_ACCESS_TOKEN,
83
+ weight: 1,
84
+ model: m,
85
+ });
86
+ }
87
+ const totalWeight = sum(m.endpoints.map((e) => e.weight));
88
+
89
+ let random = Math.random() * totalWeight;
90
+
91
+ for (const endpoint of m.endpoints) {
92
+ if (random < endpoint.weight) {
93
+ const args = { ...endpoint, model: m };
94
+ if (args.type === "tgi") {
95
+ return endpoints.tgi(args);
96
+ } else if (args.type === "aws") {
97
+ return await endpoints.sagemaker(args);
98
+ } else if (args.type === "openai") {
99
+ return await endpoints.openai(args);
100
+ } else if (args.type === "llamacpp") {
101
+ return await endpoints.llamacpp(args);
102
+ } else {
103
+ // fall back to TGI for legacy configs that do not match a known type
104
+ return await endpoints.tgi(args);
105
+ }
106
+ }
107
+ random -= endpoint.weight;
108
+ }
109
+
110
+ throw new Error(`Failed to select endpoint`);
111
+ },
112
+ });
113
+
114
+ export const models = await Promise.all(modelsRaw.map((e) => processModel(e).then(addEndpoint)));
115
+
116
+ export const defaultModel = models[0];
117
 
118
  // Models that have been deprecated
119
  export const oldModels = OLD_MODELS
 
129
  .map((m) => ({ ...m, id: m.id || m.name, displayName: m.displayName || m.name }))
130
  : [];
131
 
 
 
132
  export const validateModel = (_models: BackendModel[]) => {
133
  // Zod enum function requires 2 parameters
134
  return z.enum([_models[0].id, ..._models.slice(1).map((m) => m.id)]);
135
  };
136
 
137
  // if `TASK_MODEL` is the name of a model we use it, else we try to parse `TASK_MODEL` as a model config itself
138
+
139
  export const smallModel = TASK_MODEL
140
+ ? (models.find((m) => m.name === TASK_MODEL) ||
141
+ (await processModel(modelConfig.parse(JSON.parse(TASK_MODEL))).then((m) =>
142
+ addEndpoint(m)
143
+ ))) ??
144
+ defaultModel
145
  : defaultModel;
146
 
147
+ export type BackendModel = Optional<typeof defaultModel, "preprompt" | "parameters">;
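
`getEndpoint` keeps the weighted random selection that `modelEndpoint` used before it was deleted. A minimal standalone sketch of that selection follows; `Weighted` and `pickWeighted` are hypothetical names, and the injectable `random` parameter is an assumption added here so the behaviour can be checked deterministically.

```typescript
interface Weighted {
	weight: number;
}

// Pick one item with probability proportional to its weight.
// `random` defaults to Math.random and must return a value in [0, 1).
function pickWeighted<T extends Weighted>(items: T[], random: () => number = Math.random): T {
	const totalWeight = items.reduce((acc, item) => acc + item.weight, 0);
	let r = random() * totalWeight;
	for (const item of items) {
		// Each item owns a slice of [0, totalWeight) as wide as its weight.
		if (r < item.weight) {
			return item;
		}
		r -= item.weight;
	}
	// Reached only for an empty list or non-positive weights.
	throw new Error("Failed to select endpoint");
}
```

With weights 1 and 3, the first item is chosen for draws below 0.25 and the second for everything else, so the second endpoint receives roughly three quarters of the requests.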
 
src/lib/server/summarize.ts CHANGED
@@ -1,6 +1,5 @@
1
  import { LLM_SUMMERIZATION } from "$env/static/private";
2
  import { generateFromDefaultEndpoint } from "$lib/server/generateFromDefaultEndpoint";
3
- import { smallModel } from "$lib/server/models";
4
  import type { Message } from "$lib/types/Message";
5
 
6
  export async function summarize(prompt: string) {
@@ -23,17 +22,13 @@ export async function summarize(prompt: string) {
23
  { from: "assistant", content: "🎥 Favorite movie" },
24
  { from: "user", content: "Explain the concept of artificial intelligence in one sentence" },
25
  { from: "assistant", content: "🤖 AI definition" },
26
- { from: "user", content: "Answer all my questions like chewbacca from now ok?" },
27
- { from: "assistant", content: "🐒 Answer as Chewbacca" },
28
  { from: "user", content: prompt },
29
  ];
30
 
31
- const summaryPrompt = smallModel.chatPromptRender({
32
  messages,
33
  preprompt: `You are a summarization AI. You'll never answer a user's question directly, but instead summarize the user's request into a single short sentence of four words or less. Always start your answer with an emoji relevant to the summary.`,
34
- });
35
-
36
- return await generateFromDefaultEndpoint(summaryPrompt)
37
  .then((summary) => {
38
  // add an emoji if none is found in the first three characters
39
  if (!/\p{Emoji}/u.test(summary.slice(0, 3))) {
 
1
  import { LLM_SUMMERIZATION } from "$env/static/private";
2
  import { generateFromDefaultEndpoint } from "$lib/server/generateFromDefaultEndpoint";
 
3
  import type { Message } from "$lib/types/Message";
4
 
5
  export async function summarize(prompt: string) {
 
22
  { from: "assistant", content: "🎥 Favorite movie" },
23
  { from: "user", content: "Explain the concept of artificial intelligence in one sentence" },
24
  { from: "assistant", content: "🤖 AI definition" },
 
 
25
  { from: "user", content: prompt },
26
  ];
27
 
28
+ return await generateFromDefaultEndpoint({
29
  messages,
30
  preprompt: `You are a summarization AI. You'll never answer a user's question directly, but instead summarize the user's request into a single short sentence of four words or less. Always start your answer with an emoji relevant to the summary.`,
31
+ })
 
 
32
  .then((summary) => {
33
  // add an emoji if none is found in the first three characters
34
  if (!/\p{Emoji}/u.test(summary.slice(0, 3))) {
src/lib/server/websearch/generateQuery.ts CHANGED
@@ -1,7 +1,6 @@
1
  import type { Message } from "$lib/types/Message";
2
  import { format } from "date-fns";
3
  import { generateFromDefaultEndpoint } from "../generateFromDefaultEndpoint";
4
- import { smallModel } from "../models";
5
 
6
  export async function generateQuery(messages: Message[]) {
7
  const currentDate = format(new Date(), "MMMM d, yyyy");
@@ -62,10 +61,8 @@ Current Question: Where is it being hosted ?`,
62
  },
63
  ];
64
 
65
- const promptQuery = smallModel.chatPromptRender({
66
- preprompt: `You are tasked with generating web search queries. Give me an appropriate query to answer my question for google search. Answer with only the query. Today is ${currentDate}`,
67
  messages: convQuery,
 
68
  });
69
-
70
- return await generateFromDefaultEndpoint(promptQuery);
71
  }
 
1
  import type { Message } from "$lib/types/Message";
2
  import { format } from "date-fns";
3
  import { generateFromDefaultEndpoint } from "../generateFromDefaultEndpoint";
 
4
 
5
  export async function generateQuery(messages: Message[]) {
6
  const currentDate = format(new Date(), "MMMM d, yyyy");
 
61
  },
62
  ];
63
 
64
+ return await generateFromDefaultEndpoint({
 
65
  messages: convQuery,
66
+ preprompt: `You are tasked with generating web search queries. Give me an appropriate query to answer my question for google search. Answer with only the query. Today is ${currentDate}`,
67
  });
 
 
68
  }
src/lib/utils/trimPrefix.ts DELETED
@@ -1,6 +0,0 @@
1
- export function trimPrefix(input: string, prefix: string) {
2
- if (input.startsWith(prefix)) {
3
- return input.slice(prefix.length);
4
- }
5
- return input;
6
- }
src/lib/utils/trimSuffix.ts DELETED
@@ -1,6 +0,0 @@
1
- export function trimSuffix(input: string, end: string): string {
2
- if (input.endsWith(end)) {
3
- return input.slice(0, input.length - end.length);
4
- }
5
- return input;
6
- }
src/routes/conversation/[id]/+page.svelte CHANGED
@@ -171,6 +171,8 @@
171
  convId: $page.params.id,
172
  };
173
  }
 
 
174
  }
175
  }
176
  } catch (parseError) {
 
171
  convId: $page.params.id,
172
  };
173
  }
174
+ } else if (update.status === "error") {
175
+ $error = update.message ?? "An error has occurred";
176
  }
177
  }
178
  } catch (parseError) {
src/routes/conversation/[id]/+server.ts CHANGED
@@ -1,26 +1,19 @@
1
- import { HF_ACCESS_TOKEN, MESSAGES_BEFORE_LOGIN, RATE_LIMIT } from "$env/static/private";
2
- import { buildPrompt } from "$lib/buildPrompt";
3
- import { PUBLIC_SEP_TOKEN } from "$lib/constants/publicSepToken";
4
  import { authCondition, requiresUser } from "$lib/server/auth";
5
  import { collections } from "$lib/server/database";
6
- import { modelEndpoint } from "$lib/server/modelEndpoint";
7
  import { models } from "$lib/server/models";
8
  import { ERROR_MESSAGES } from "$lib/stores/errors";
9
  import type { Message } from "$lib/types/Message";
10
- import { trimPrefix } from "$lib/utils/trimPrefix";
11
- import { trimSuffix } from "$lib/utils/trimSuffix";
12
- import { textGenerationStream } from "@huggingface/inference";
13
  import { error } from "@sveltejs/kit";
14
  import { ObjectId } from "mongodb";
15
  import { z } from "zod";
16
- import { AwsClient } from "aws4fetch";
17
  import type { MessageUpdate } from "$lib/types/MessageUpdate";
18
  import { runWebSearch } from "$lib/server/websearch/runWebSearch";
19
  import type { WebSearch } from "$lib/types/WebSearch";
20
  import { abortedGenerations } from "$lib/server/abortedGenerations";
21
  import { summarize } from "$lib/server/summarize";
22
 
23
- export async function POST({ request, fetch, locals, params, getClientAddress }) {
24
  const id = z.string().parse(params.id);
25
  const convId = new ObjectId(id);
26
  const promptedAt = new Date();
@@ -191,138 +184,90 @@ export async function POST({ request, fetch, locals, params, getClientAddress })
191
  webSearchResults = await runWebSearch(conv, newPrompt, update);
192
  }
193
 
194
- // we can now build the prompt using the messages
195
- const prompt = await buildPrompt({
196
- messages,
197
- model,
198
- webSearch: webSearchResults,
199
- preprompt: conv.preprompt ?? model.preprompt,
200
- locals: locals,
201
- });
202
-
203
- // fetch the endpoint
204
- const randomEndpoint = modelEndpoint(model);
205
-
206
- let usedFetch = fetch;
207
-
208
- if (randomEndpoint.host === "sagemaker") {
209
- const aws = new AwsClient({
210
- accessKeyId: randomEndpoint.accessKey,
211
- secretAccessKey: randomEndpoint.secretKey,
212
- sessionToken: randomEndpoint.sessionToken,
213
- service: "sagemaker",
214
- });
215
-
216
- usedFetch = aws.fetch.bind(aws) as typeof fetch;
217
- }
218
-
219
- async function saveLast(generated_text: string) {
220
- if (!conv) {
221
- throw error(404, "Conversation not found");
222
- }
223
-
224
- const lastMessage = messages[messages.length - 1];
225
-
226
- if (lastMessage) {
227
- // We could also check if PUBLIC_ASSISTANT_MESSAGE_TOKEN is present and use it to slice the text
228
- if (generated_text.startsWith(prompt)) {
229
- generated_text = generated_text.slice(prompt.length);
230
- }
231
-
232
- generated_text = trimSuffix(
233
- trimPrefix(generated_text, "<|startoftext|>"),
234
- PUBLIC_SEP_TOKEN
235
- ).trimEnd();
236
-
237
- // remove the stop tokens
238
- for (const stop of [...(model?.parameters?.stop ?? []), "<|endoftext|>"]) {
239
- if (generated_text.endsWith(stop)) {
240
- generated_text = generated_text.slice(0, -stop.length).trimEnd();
 
 
241
  }
242
- }
243
- lastMessage.content = generated_text;
244
-
245
- await collections.conversations.updateOne(
246
- {
247
- _id: convId,
248
- },
249
- {
250
- $set: {
251
- messages,
252
- title: conv.title,
253
  updatedAt: new Date(),
254
  },
255
- }
256
- );
257
-
258
- update({
259
- type: "finalAnswer",
260
- text: generated_text,
261
- });
262
  }
 
 
 
263
  }
264
-
265
- const tokenStream = textGenerationStream(
266
  {
267
- parameters: {
268
- ...models.find((m) => m.id === conv.model)?.parameters,
269
- return_full_text: false,
270
- },
271
- model: randomEndpoint.url,
272
- inputs: prompt,
273
- accessToken: randomEndpoint.host === "sagemaker" ? undefined : HF_ACCESS_TOKEN,
274
  },
275
  {
276
- use_cache: false,
277
- fetch: usedFetch,
 
 
 
278
  }
279
  );
280
 
-			for await (const output of tokenStream) {
-				// if not generated_text is here it means the generation is not done
-				if (!output.generated_text) {
-					// else we get the next token
-					if (!output.token.special) {
-						const lastMessage = messages[messages.length - 1];
-						update({
-							type: "stream",
-							token: output.token.text,
-						});
-
-						// if the last message is not from assistant, it means this is the first token
-						if (lastMessage?.from !== "assistant") {
-							// so we create a new message
-							messages = [
-								...messages,
-								// id doesn't match the backend id but it's not important for assistant messages
-								// First token has a space at the beginning, trim it
-								{
-									from: "assistant",
-									content: output.token.text.trimStart(),
-									webSearch: webSearchResults,
-									updates: updates,
-									id: (responseId as Message["id"]) || crypto.randomUUID(),
-									createdAt: new Date(),
-									updatedAt: new Date(),
-								},
-							];
-						} else {
-							const date = abortedGenerations.get(convId.toString());
-							if (date && date > promptedAt) {
-								saveLast(lastMessage.content);
-							}
-							if (!output) {
-								break;
-							}
-
-							// otherwise we just concatenate tokens
-							lastMessage.content += output.token.text;
-						}
-					}
-				} else {
-					saveLast(output.generated_text);
-				}
-			}
 		},
 		async cancel() {
  await collections.conversations.updateOne(
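
The removed loop above read tokens from `textGenerationStream` directly, while the added code that follows asks the model for an endpoint with `model.getEndpoint()` and iterates over whatever that endpoint yields. The sketch below illustrates that contract in isolation. It is not the code from this commit: the `Message`, `Conversation`, `StreamOutput` and `Endpoint` shapes are simplified assumptions inferred from how the diff uses them, and `applyOutput`, `runGeneration` and `stubEndpoint` are hypothetical helpers.

```typescript
// Hypothetical, simplified shapes inferred from how the diff uses them.
interface Message {
	from: "user" | "assistant";
	content: string;
}

interface Conversation {
	messages: Message[];
}

interface StreamOutput {
	token: { text: string; special: boolean };
	// Set only on the final output, once generation is complete.
	generated_text: string | null;
}

// Assumed contract: an endpoint takes a conversation and yields outputs asynchronously.
type Endpoint = (input: { conversation: Conversation }) => AsyncIterable<StreamOutput>;

// Fold one streamed output into the message list, mirroring the branches of the loop.
function applyOutput(messages: Message[], output: StreamOutput): Message[] {
	const last = messages[messages.length - 1];
	if (output.generated_text) {
		// Final output: overwrite the last message with the full generated text.
		return [...messages.slice(0, -1), { ...last, content: output.generated_text }];
	}
	if (output.token.special) {
		// Special tokens are not added to the message.
		return messages;
	}
	if (last?.from !== "assistant") {
		// First token: open a new assistant message and trim the leading space.
		return [...messages, { from: "assistant", content: output.token.text.trimStart() }];
	}
	// Later tokens: append to the existing assistant message.
	return [...messages.slice(0, -1), { ...last, content: last.content + output.token.text }];
}

// Drive any endpoint to completion and return the updated messages.
async function runGeneration(endpoint: Endpoint, conversation: Conversation): Promise<Message[]> {
	let messages = conversation.messages;
	for await (const output of endpoint({ conversation })) {
		messages = applyOutput(messages, output);
	}
	return messages;
}

// Stub endpoint standing in for a real backend (hypothetical).
async function* stubEndpoint(): AsyncGenerator<StreamOutput> {
	yield { token: { text: " Hello", special: false }, generated_text: null };
	yield { token: { text: " world", special: false }, generated_text: null };
	yield { token: { text: "</s>", special: true }, generated_text: "Hello world" };
}
```

Under these assumptions, any backend that can be wrapped as an async iterable of such outputs can be passed to `runGeneration`, which is the kind of decoupling the route handler gains from calling `model.getEndpoint()` instead of a specific client. The sketch leaves out persistence, abort handling and `update` events.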
 
+import { MESSAGES_BEFORE_LOGIN, RATE_LIMIT } from "$env/static/private";
 import { authCondition, requiresUser } from "$lib/server/auth";
 import { collections } from "$lib/server/database";
 import { models } from "$lib/server/models";
 import { ERROR_MESSAGES } from "$lib/stores/errors";
 import type { Message } from "$lib/types/Message";
 import { error } from "@sveltejs/kit";
 import { ObjectId } from "mongodb";
 import { z } from "zod";
 import type { MessageUpdate } from "$lib/types/MessageUpdate";
 import { runWebSearch } from "$lib/server/websearch/runWebSearch";
 import type { WebSearch } from "$lib/types/WebSearch";
 import { abortedGenerations } from "$lib/server/abortedGenerations";
 import { summarize } from "$lib/server/summarize";
 
+export async function POST({ request, locals, params, getClientAddress }) {
 	const id = z.string().parse(params.id);
 	const convId = new ObjectId(id);
 	const promptedAt = new Date();
 
 				webSearchResults = await runWebSearch(conv, newPrompt, update);
 			}
 
+			messages[messages.length - 1].webSearch = webSearchResults;
+
+			conv.messages = messages;
+
+			try {
+				const endpoint = await model.getEndpoint();
+				for await (const output of await endpoint({ conversation: conv })) {
+					// if not generated_text is here it means the generation is not done
+					if (!output.generated_text) {
+						// else we get the next token
+						if (!output.token.special) {
+							update({
+								type: "stream",
+								token: output.token.text,
+							});
+
+							// if the last message is not from assistant, it means this is the first token
+							const lastMessage = messages[messages.length - 1];
+
+							if (lastMessage?.from !== "assistant") {
+								// so we create a new message
+								messages = [
+									...messages,
+									// id doesn't match the backend id but it's not important for assistant messages
+									// First token has a space at the beginning, trim it
+									{
+										from: "assistant",
+										content: output.token.text.trimStart(),
+										webSearch: webSearchResults,
+										updates: updates,
+										id: (responseId as Message["id"]) || crypto.randomUUID(),
+										createdAt: new Date(),
+										updatedAt: new Date(),
+									},
+								];
+							} else {
+								// abort check
+								const date = abortedGenerations.get(convId.toString());
+								if (date && date > promptedAt) {
+									break;
+								}
+
+								if (!output) {
+									break;
+								}
+
+								// otherwise we just concatenate tokens
+								lastMessage.content += output.token.text;
+							}
 						}
+					} else {
+						// add output.generated text to the last message
+						messages = [
+							...messages.slice(0, -1),
+							{
+								...messages[messages.length - 1],
+								content: output.generated_text,
+								updates: updates,
 								updatedAt: new Date(),
 							},
+						];
+					}
 				}
+			} catch (e) {
+				console.error(e);
+				update({ type: "status", status: "error", message: (e as Error).message });
 			}
+			await collections.conversations.updateOne(
 				{
+					_id: convId,
 				},
 				{
+					$set: {
+						messages,
+						title: conv?.title,
+						updatedAt: new Date(),
+					},
 				}
 			);
 
+			update({
+				type: "finalAnswer",
+				text: messages[messages.length - 1].content,
+			});
 		},
 		async cancel() {
 			await collections.conversations.updateOne(