Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
Fedir Zadniprovskyi
committed on
Commit
·
43cc67a
1
Parent(s):
9922993
rename to `speaches`
Browse files
- Dockerfile +4 -4
- README.md +13 -9
- Taskfile.yaml +2 -2
- compose.cpu.yaml +3 -3
- compose.cuda-cdi.yaml +2 -2
- compose.cuda.yaml +3 -3
- compose.observability.yaml +1 -1
- compose.yaml +2 -2
- docs/configuration.md +2 -2
- docs/installation.md +19 -19
- docs/introduction.md +5 -4
- docs/openapi.json +1 -1
- docs/usage/open-webui-integration.md +4 -4
- docs/usage/text-to-speech.md +4 -5
- examples/javascript/index.js +1 -1
- examples/live-audio/script.sh +3 -3
- examples/youtube/script.sh +3 -3
- mkdocs.yml +3 -3
- pyproject.toml +1 -1
- src/{faster_whisper_server → speaches}/__init__.py +0 -0
- src/{faster_whisper_server → speaches}/api_models.py +2 -2
- src/{faster_whisper_server → speaches}/asr.py +3 -3
- src/{faster_whisper_server → speaches}/audio.py +1 -1
- src/{faster_whisper_server → speaches}/config.py +0 -0
- src/{faster_whisper_server → speaches}/dependencies.py +4 -4
- src/{faster_whisper_server → speaches}/gradio_app.py +5 -5
- src/{faster_whisper_server → speaches}/hf_utils.py +1 -1
- src/{faster_whisper_server → speaches}/logger.py +0 -0
- src/{faster_whisper_server → speaches}/main.py +7 -7
- src/{faster_whisper_server → speaches}/model_manager.py +2 -2
- src/{faster_whisper_server → speaches}/routers/__init__.py +0 -0
- src/{faster_whisper_server → speaches}/routers/misc.py +2 -2
- src/{faster_whisper_server → speaches}/routers/models.py +2 -2
- src/{faster_whisper_server → speaches}/routers/speech.py +2 -2
- src/{faster_whisper_server → speaches}/routers/stt.py +8 -8
- src/{faster_whisper_server → speaches}/text_utils.py +2 -2
- src/{faster_whisper_server → speaches}/text_utils_test.py +2 -2
- src/{faster_whisper_server → speaches}/transcriber.py +4 -4
- tests/api_timestamp_granularities_test.py +1 -1
- tests/conftest.py +6 -6
- tests/model_manager_test.py +1 -1
- tests/openai_timestamp_granularities_test.py +1 -1
- tests/speech_test.py +1 -1
- tests/sse_test.py +1 -1
- uv.lock +109 -109
Dockerfile
CHANGED
@@ -1,7 +1,7 @@
|
|
1 |
ARG BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04
|
2 |
# hadolint ignore=DL3006
|
3 |
FROM ${BASE_IMAGE}
|
4 |
-
LABEL org.opencontainers.image.source="https://github.com/
|
5 |
LABEL org.opencontainers.image.licenses="MIT"
|
6 |
# `ffmpeg` is installed because without it `gradio` won't work with mp3(possible others as well) files
|
7 |
# hadolint ignore=DL3008
|
@@ -15,7 +15,7 @@ RUN apt-get update && \
|
|
15 |
USER ubuntu
|
16 |
ENV HOME=/home/ubuntu \
|
17 |
PATH=/home/ubuntu/.local/bin:$PATH
|
18 |
-
WORKDIR $HOME/
|
19 |
# https://docs.astral.sh/uv/guides/integration/docker/#installing-uv
|
20 |
COPY --chown=ubuntu --from=ghcr.io/astral-sh/uv:0.5.14 /uv /bin/uv
|
21 |
# https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
|
@@ -35,7 +35,7 @@ RUN mkdir -p $HOME/.cache/huggingface/hub
|
|
35 |
ENV WHISPER__MODEL=Systran/faster-whisper-large-v3
|
36 |
ENV UVICORN_HOST=0.0.0.0
|
37 |
ENV UVICORN_PORT=8000
|
38 |
-
ENV PATH="$HOME/
|
39 |
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubenablehftransfer
|
40 |
# NOTE: I've disabled this because it doesn't inside of Docker container. I couldn't pinpoint the exact reason. This doesn't happen when running the server locally.
|
41 |
# RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
|
@@ -44,4 +44,4 @@ ENV HF_HUB_ENABLE_HF_TRANSFER=0
|
|
44 |
# https://www.reddit.com/r/StableDiffusion/comments/1f6asvd/gradio_sends_ip_address_telemetry_by_default/
|
45 |
ENV DO_NOT_TRACK=1
|
46 |
EXPOSE 8000
|
47 |
-
CMD ["uvicorn", "--factory", "
|
|
|
1 |
ARG BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04
|
2 |
# hadolint ignore=DL3006
|
3 |
FROM ${BASE_IMAGE}
|
4 |
+
LABEL org.opencontainers.image.source="https://github.com/speaches-ai/speaches"
|
5 |
LABEL org.opencontainers.image.licenses="MIT"
|
6 |
# `ffmpeg` is installed because without it `gradio` won't work with mp3(possible others as well) files
|
7 |
# hadolint ignore=DL3008
|
|
|
15 |
USER ubuntu
|
16 |
ENV HOME=/home/ubuntu \
|
17 |
PATH=/home/ubuntu/.local/bin:$PATH
|
18 |
+
WORKDIR $HOME/speaches
|
19 |
# https://docs.astral.sh/uv/guides/integration/docker/#installing-uv
|
20 |
COPY --chown=ubuntu --from=ghcr.io/astral-sh/uv:0.5.14 /uv /bin/uv
|
21 |
# https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
|
|
|
35 |
ENV WHISPER__MODEL=Systran/faster-whisper-large-v3
|
36 |
ENV UVICORN_HOST=0.0.0.0
|
37 |
ENV UVICORN_PORT=8000
|
38 |
+
ENV PATH="$HOME/speaches/.venv/bin:$PATH"
|
39 |
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubenablehftransfer
|
40 |
# NOTE: I've disabled this because it doesn't inside of Docker container. I couldn't pinpoint the exact reason. This doesn't happen when running the server locally.
|
41 |
# RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
|
|
|
44 |
# https://www.reddit.com/r/StableDiffusion/comments/1f6asvd/gradio_sends_ip_address_telemetry_by_default/
|
45 |
ENV DO_NOT_TRACK=1
|
46 |
EXPOSE 8000
|
47 |
+
CMD ["uvicorn", "--factory", "speaches.main:create_app"]
|
README.md
CHANGED
@@ -1,11 +1,15 @@
|
|
1 |
-
|
|
|
|
|
|
|
|
|
|
|
2 |
|
3 |
-
`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
|
4 |
Features:
|
5 |
|
6 |
- GPU and CPU support.
|
7 |
- Easily deployable using Docker.
|
8 |
-
- **Configurable through environment variables (see [config.py](./src/
|
9 |
- OpenAI API compatible.
|
10 |
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
|
11 |
- Live transcription support (audio is sent via websocket as it's generated).
|
@@ -18,7 +22,7 @@ Please create an issue if you find a bug, have a question, or a feature suggesti
|
|
18 |
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
|
19 |
|
20 |
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
|
21 |
-
- Unlike OpenAI's API, `
|
22 |
- Audio file translation via `POST /v1/audio/translations` endpoint.
|
23 |
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
|
24 |
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
|
@@ -35,13 +39,13 @@ See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio)
|
|
35 |
NOTE: I'm using newer Docker Compsose features. If you are using an older version of Docker Compose, you may need need to update.
|
36 |
|
37 |
```bash
|
38 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
39 |
|
40 |
# for GPU support
|
41 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
42 |
docker compose --file compose.cuda.yaml up --detach
|
43 |
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
|
44 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
45 |
docker compose --file compose.cpu.yaml up --detach
|
46 |
```
|
47 |
|
@@ -49,9 +53,9 @@ docker compose --file compose.cpu.yaml up --detach
|
|
49 |
|
50 |
```bash
|
51 |
# for GPU support
|
52 |
-
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach
|
53 |
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
|
54 |
-
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach
|
55 |
```
|
56 |
|
57 |
### Using Kubernetes
|
|
|
1 |
+
> [!NOTE]
|
2 |
+
> This project was previously named `faster-whisper-server`. I've decided to change the name from `faster-whisper-server`, as the project has evolved to support more than just transcription.
|
3 |
+
|
4 |
+
# Speaches
|
5 |
+
|
6 |
+
`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
|
7 |
|
|
|
8 |
Features:
|
9 |
|
10 |
- GPU and CPU support.
|
11 |
- Easily deployable using Docker.
|
12 |
+
- **Configurable through environment variables (see [config.py](./src/speaches/config.py))**.
|
13 |
- OpenAI API compatible.
|
14 |
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
|
15 |
- Live transcription support (audio is sent via websocket as it's generated).
|
|
|
22 |
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
|
23 |
|
24 |
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
|
25 |
+
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
|
26 |
- Audio file translation via `POST /v1/audio/translations` endpoint.
|
27 |
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
|
28 |
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
|
|
|
39 |
NOTE: I'm using newer Docker Compsose features. If you are using an older version of Docker Compose, you may need need to update.
|
40 |
|
41 |
```bash
|
42 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
|
43 |
|
44 |
# for GPU support
|
45 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
|
46 |
docker compose --file compose.cuda.yaml up --detach
|
47 |
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
|
48 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
|
49 |
docker compose --file compose.cpu.yaml up --detach
|
50 |
```
|
51 |
|
|
|
53 |
|
54 |
```bash
|
55 |
# for GPU support
|
56 |
+
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach ghcr.io/speaches-ai/speaches:latest-cuda
|
57 |
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
|
58 |
+
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach ghcr.io/speaches-ai/speaches:latest-cpu
|
59 |
```
|
60 |
|
61 |
### Using Kubernetes
|
Taskfile.yaml
CHANGED
@@ -2,8 +2,8 @@ version: "3"
|
|
2 |
tasks:
|
3 |
server:
|
4 |
cmds:
|
5 |
-
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0
|
6 |
-
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0
|
7 |
sources:
|
8 |
- src/**/*.py
|
9 |
test:
|
|
|
2 |
tasks:
|
3 |
server:
|
4 |
cmds:
|
5 |
+
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0 speaches.main:create_app' || true
|
6 |
+
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0 speaches.main:create_app {{.CLI_ARGS}}
|
7 |
sources:
|
8 |
- src/**/*.py
|
9 |
test:
|
compose.cpu.yaml
CHANGED
@@ -1,11 +1,11 @@
|
|
1 |
# include:
|
2 |
# - compose.observability.yaml
|
3 |
services:
|
4 |
-
|
5 |
extends:
|
6 |
file: compose.yaml
|
7 |
-
service:
|
8 |
-
image:
|
9 |
build:
|
10 |
args:
|
11 |
BASE_IMAGE: ubuntu:24.04
|
|
|
1 |
# include:
|
2 |
# - compose.observability.yaml
|
3 |
services:
|
4 |
+
speaches:
|
5 |
extends:
|
6 |
file: compose.yaml
|
7 |
+
service: speaches
|
8 |
+
image: ghcr.io/speaches-ai/speaches:latest-cpu
|
9 |
build:
|
10 |
args:
|
11 |
BASE_IMAGE: ubuntu:24.04
|
compose.cuda-cdi.yaml
CHANGED
@@ -4,10 +4,10 @@
|
|
4 |
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
|
5 |
# https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
|
6 |
services:
|
7 |
-
|
8 |
extends:
|
9 |
file: compose.cuda.yaml
|
10 |
-
service:
|
11 |
volumes:
|
12 |
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
|
13 |
deploy:
|
|
|
4 |
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
|
5 |
# https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
|
6 |
services:
|
7 |
+
speaches:
|
8 |
extends:
|
9 |
file: compose.cuda.yaml
|
10 |
+
service: speaches
|
11 |
volumes:
|
12 |
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
|
13 |
deploy:
|
compose.cuda.yaml
CHANGED
@@ -1,11 +1,11 @@
|
|
1 |
# include:
|
2 |
# - compose.observability.yaml
|
3 |
services:
|
4 |
-
|
5 |
extends:
|
6 |
file: compose.yaml
|
7 |
-
service:
|
8 |
-
image:
|
9 |
build:
|
10 |
args:
|
11 |
BASE_IMAGE: nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04
|
|
|
1 |
# include:
|
2 |
# - compose.observability.yaml
|
3 |
services:
|
4 |
+
speaches:
|
5 |
extends:
|
6 |
file: compose.yaml
|
7 |
+
service: speaches
|
8 |
+
image: ghcr.io/speaches-ai/speaches:latest-cuda
|
9 |
build:
|
10 |
args:
|
11 |
BASE_IMAGE: nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04
|
compose.observability.yaml
CHANGED
@@ -5,7 +5,7 @@ services:
|
|
5 |
volumes:
|
6 |
- ./configuration/opentelemetry-collector.yaml:/etc/opentelemetry-collector.yaml
|
7 |
ports:
|
8 |
-
# NOTE: when `
|
9 |
- 4317:4317 # OTLP gRPC receiver
|
10 |
# - 4318:4318 # OTLP HTTP receiver
|
11 |
# - 8888:8888 # Prometheus metrics exposed by the Collector
|
|
|
5 |
volumes:
|
6 |
- ./configuration/opentelemetry-collector.yaml:/etc/opentelemetry-collector.yaml
|
7 |
ports:
|
8 |
+
# NOTE: when `speaches` is also running as a Docker Compose service, this doesn't need to be exposed.
|
9 |
- 4317:4317 # OTLP gRPC receiver
|
10 |
# - 4318:4318 # OTLP HTTP receiver
|
11 |
# - 8888:8888 # Prometheus metrics exposed by the Collector
|
compose.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1 |
# TODO: https://docs.astral.sh/uv/guides/integration/docker/#configuring-watch-with-docker-compose
|
2 |
services:
|
3 |
-
|
4 |
-
container_name:
|
5 |
build:
|
6 |
dockerfile: Dockerfile
|
7 |
context: .
|
|
|
1 |
# TODO: https://docs.astral.sh/uv/guides/integration/docker/#configuring-watch-with-docker-compose
|
2 |
services:
|
3 |
+
speaches:
|
4 |
+
container_name: speaches
|
5 |
build:
|
6 |
dockerfile: Dockerfile
|
7 |
context: .
|
docs/configuration.md
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
|
2 |
-
:::
|
3 |
options:
|
4 |
show_bases: true
|
5 |
show_if_no_docstring: true
|
@@ -16,7 +16,7 @@
|
|
16 |
- "!speech_*"
|
17 |
- "!transcription_*"
|
18 |
|
19 |
-
:::
|
20 |
|
21 |
<!-- TODO: nested model `whisper` -->
|
22 |
<!-- TODO: Insert new lines for multi-line docstrings -->
|
|
|
1 |
<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
|
2 |
+
::: speaches.config.Config
|
3 |
options:
|
4 |
show_bases: true
|
5 |
show_if_no_docstring: true
|
|
|
16 |
- "!speech_*"
|
17 |
- "!transcription_*"
|
18 |
|
19 |
+
::: speaches.config.WhisperConfig
|
20 |
|
21 |
<!-- TODO: nested model `whisper` -->
|
22 |
<!-- TODO: Insert new lines for multi-line docstrings -->
|
docs/installation.md
CHANGED
@@ -9,25 +9,25 @@ Download the necessary Docker Compose files
|
|
9 |
=== "CUDA"
|
10 |
|
11 |
```bash
|
12 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
13 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
14 |
export COMPOSE_FILE=compose.cuda.yaml
|
15 |
```
|
16 |
|
17 |
=== "CUDA (with CDI feature enabled)"
|
18 |
|
19 |
```bash
|
20 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
21 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
22 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
23 |
export COMPOSE_FILE=compose.cuda-cdi.yaml
|
24 |
```
|
25 |
|
26 |
=== "CPU"
|
27 |
|
28 |
```bash
|
29 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
30 |
-
curl --silent --remote-name https://raw.githubusercontent.com/
|
31 |
export COMPOSE_FILE=compose.cpu.yaml
|
32 |
```
|
33 |
|
@@ -58,10 +58,10 @@ docker compose up --detach
|
|
58 |
--rm \
|
59 |
--detach \
|
60 |
--publish 8000:8000 \
|
61 |
-
--name
|
62 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
63 |
--gpus=all \
|
64 |
-
|
65 |
```
|
66 |
|
67 |
=== "CUDA (with CDI feature enabled)"
|
@@ -71,10 +71,10 @@ docker compose up --detach
|
|
71 |
--rm \
|
72 |
--detach \
|
73 |
--publish 8000:8000 \
|
74 |
-
--name
|
75 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
76 |
--device=nvidia.com/gpu=all \
|
77 |
-
|
78 |
```
|
79 |
|
80 |
=== "CPU"
|
@@ -84,31 +84,31 @@ docker compose up --detach
|
|
84 |
--rm \
|
85 |
--detach \
|
86 |
--publish 8000:8000 \
|
87 |
-
--name
|
88 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
89 |
-
|
90 |
```
|
91 |
|
92 |
??? note "Build from source"
|
93 |
|
94 |
```bash
|
95 |
-
docker build --tag
|
96 |
|
97 |
# NOTE: you need to install and enable [buildx](https://github.com/docker/buildx) for multi-platform builds
|
98 |
# Build image for both amd64 and arm64
|
99 |
-
docker buildx build --tag
|
100 |
|
101 |
# Build image without CUDA support
|
102 |
-
docker build --tag
|
103 |
```
|
104 |
|
105 |
## Python (requires Python 3.12+ and `uv` package manager)
|
106 |
|
107 |
```bash
|
108 |
-
git clone https://github.com/
|
109 |
-
cd
|
110 |
uv venv
|
111 |
sourve .venv/bin/activate
|
112 |
uv sync --all-extras
|
113 |
-
uvicorn --factory --host 0.0.0.0
|
114 |
```
|
|
|
9 |
=== "CUDA"
|
10 |
|
11 |
```bash
|
12 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
|
13 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
|
14 |
export COMPOSE_FILE=compose.cuda.yaml
|
15 |
```
|
16 |
|
17 |
=== "CUDA (with CDI feature enabled)"
|
18 |
|
19 |
```bash
|
20 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
|
21 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
|
22 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda-cdi.yaml
|
23 |
export COMPOSE_FILE=compose.cuda-cdi.yaml
|
24 |
```
|
25 |
|
26 |
=== "CPU"
|
27 |
|
28 |
```bash
|
29 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
|
30 |
+
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
|
31 |
export COMPOSE_FILE=compose.cpu.yaml
|
32 |
```
|
33 |
|
|
|
58 |
--rm \
|
59 |
--detach \
|
60 |
--publish 8000:8000 \
|
61 |
+
--name speaches \
|
62 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
63 |
--gpus=all \
|
64 |
+
ghcr.io/speaches-ai/speaches:latest-cuda
|
65 |
```
|
66 |
|
67 |
=== "CUDA (with CDI feature enabled)"
|
|
|
71 |
--rm \
|
72 |
--detach \
|
73 |
--publish 8000:8000 \
|
74 |
+
--name speaches \
|
75 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
76 |
--device=nvidia.com/gpu=all \
|
77 |
+
ghcr.io/speaches-ai/speaches:latest-cuda
|
78 |
```
|
79 |
|
80 |
=== "CPU"
|
|
|
84 |
--rm \
|
85 |
--detach \
|
86 |
--publish 8000:8000 \
|
87 |
+
--name speaches \
|
88 |
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
|
89 |
+
ghcr.io/speaches-ai/speaches:latest-cpu
|
90 |
```
|
91 |
|
92 |
??? note "Build from source"
|
93 |
|
94 |
```bash
|
95 |
+
docker build --tag speaches .
|
96 |
|
97 |
# NOTE: you need to install and enable [buildx](https://github.com/docker/buildx) for multi-platform builds
|
98 |
# Build image for both amd64 and arm64
|
99 |
+
docker buildx build --tag speaches --platform linux/amd64,linux/arm64 .
|
100 |
|
101 |
# Build image without CUDA support
|
102 |
+
docker build --tag speaches --build-arg BASE_IMAGE=ubuntu:24.04 .
|
103 |
```
|
104 |
|
105 |
## Python (requires Python 3.12+ and `uv` package manager)
|
106 |
|
107 |
```bash
|
108 |
+
git clone https://github.com/speaches-ai/speaches.git
|
109 |
+
cd speaches
|
110 |
uv venv
|
111 |
sourve .venv/bin/activate
|
112 |
uv sync --all-extras
|
113 |
+
uvicorn --factory --host 0.0.0.0 speaches.main:create_app
|
114 |
```
|
docs/introduction.md
CHANGED
@@ -8,19 +8,20 @@
|
|
8 |
|
9 |
TODO: add HuggingFace Space URL
|
10 |
|
11 |
-
#
|
12 |
|
13 |
-
`
|
14 |
|
15 |
## Features:
|
16 |
|
17 |
- GPU and CPU support.
|
18 |
- [Deployable via Docker Compose / Docker](./installation.md)
|
19 |
- [Highly configurable](./configuration.md)
|
20 |
-
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `
|
21 |
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
|
22 |
- Live transcription support (audio is sent via websocket as it's generated).
|
23 |
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
|
|
|
24 |
- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
|
25 |
- Generate a spoken audio summary of a body of text (text in, audio out)
|
26 |
- Perform sentiment analysis on a recording (audio in, text out)
|
@@ -34,7 +35,7 @@ Please create an issue if you find a bug, have a question, or a feature suggesti
|
|
34 |
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
|
35 |
|
36 |
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
|
37 |
-
- Unlike OpenAI's API, `
|
38 |
- Audio file translation via `POST /v1/audio/translations` endpoint.
|
39 |
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
|
40 |
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
|
|
|
8 |
|
9 |
TODO: add HuggingFace Space URL
|
10 |
|
11 |
+
# Speaches
|
12 |
|
13 |
+
`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
|
14 |
|
15 |
## Features:
|
16 |
|
17 |
- GPU and CPU support.
|
18 |
- [Deployable via Docker Compose / Docker](./installation.md)
|
19 |
- [Highly configurable](./configuration.md)
|
20 |
+
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
|
21 |
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
|
22 |
- Live transcription support (audio is sent via websocket as it's generated).
|
23 |
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
|
24 |
+
- [Text-to-speech (TTS) via `piper`]
|
25 |
- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
|
26 |
- Generate a spoken audio summary of a body of text (text in, audio out)
|
27 |
- Perform sentiment analysis on a recording (audio in, text out)
|
|
|
35 |
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
|
36 |
|
37 |
- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
|
38 |
+
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
|
39 |
- Audio file translation via `POST /v1/audio/translations` endpoint.
|
40 |
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
|
41 |
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
|
docs/openapi.json
CHANGED
@@ -1 +1 @@
|
|
1 |
-
{"openapi":"3.1.0","info":{"title":"FastAPI","version":"0.1.0"},"paths":{"/v1/audio/translations":{"post":{"tags":["automatic-speech-recognition"],"summary":"Translate File","operationId":"translate_file_v1_audio_translations_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_translate_file_v1_audio_translations_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"anyOf":[{"type":"string"},{"$ref":"#/components/schemas/CreateTranscriptionResponseJson"},{"$ref":"#/components/schemas/CreateTranscriptionResponseVerboseJson"}],"title":"Response Translate File V1 Audio Translations Post"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/transcriptions":{"post":{"tags":["automatic-speech-recognition"],"summary":"Transcribe File","operationId":"transcribe_file_v1_audio_transcriptions_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_transcribe_file_v1_audio_transcriptions_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"anyOf":[{"type":"string"},{"$ref":"#/components/schemas/CreateTranscriptionResponseJson"},{"$ref":"#/components/schemas/CreateTranscriptionResponseVerboseJson"}],"title":"Response Transcribe File V1 Audio Transcriptions Post"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/models":{"get":{"tags":["models"],"summary":"Get Models","operationId":"get_models_v1_models_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListModelsResponse"}}}}}}},"/v1/models/{model_name}":{"get":{"tags":["models"],"summary":"Get 
Model","operationId":"get_model_v1_models__model_name__get","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"},"example":"Systran/faster-distil-whisper-large-v3"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/Model"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/health":{"get":{"tags":["diagnostic"],"summary":"Health","operationId":"health_health_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/api/pull/{model_name}":{"post":{"tags":["experimental"],"summary":"Download a model from Hugging Face.","operationId":"pull_model_api_pull__model_name__post","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/api/ps":{"get":{"tags":["experimental"],"summary":"Get a list of loaded models.","operationId":"get_running_models_api_ps_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"additionalProperties":{"items":{"type":"string"},"type":"array"},"type":"object","title":"Response Get Running Models Api Ps Get"}}}}}}},"/api/ps/{model_name}":{"post":{"tags":["experimental"],"summary":"Load a model into memory.","operationId":"load_model_route_api_ps__model_name__post","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}},"delete":{"tags":["experimental"],"summary":"Unload a model from memory.","operationId":"stop_running_model_api_ps__model_name__delete","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/speech":{"post":{"tags":["speech-to-text"],"summary":"Synthesize","operationId":"synthesize_v1_audio_speech_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/CreateSpeechRequestBody"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/speech/voices":{"get":{"tags":["speech-to-text"],"summary":"List Voices","operationId":"list_voices_v1_audio_speech_voices_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"items":{"$ref":"#/components/schemas/PiperModel"},"type":"array","title":"Response List Voices V1 Audio Speech Voices Get"}}}}}}}},"components":{"schemas":{"Body_transcribe_file_v1_audio_transcriptions_post":{"properties":{"model":{"anyOf":[{"type":"string","description":"The ID of the model. 
You can get a list of available models by calling `/v1/models`.","examples":["Systran/faster-distil-whisper-large-v3","bofenghuang/whisper-large-v2-cv11-french-ct2"]},{"type":"null"}],"title":"Model"},"language":{"anyOf":[{"$ref":"#/components/schemas/Language"},{"type":"null"}]},"prompt":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Prompt"},"response_format":{"anyOf":[{"$ref":"#/components/schemas/faster_whisper_server__config__ResponseFormat"},{"type":"null"}]},"temperature":{"type":"number","title":"Temperature","default":0.0},"timestamp_granularities":{"items":{"type":"string","enum":["segment","word"]},"type":"array","title":"Timestamp Granularities","default":["segment"]},"stream":{"type":"boolean","title":"Stream","default":false},"hotwords":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Hotwords"},"vad_filter":{"type":"boolean","title":"Vad Filter","default":false},"file":{"type":"string","format":"binary","title":"File"}},"type":"object","required":["file"],"title":"Body_transcribe_file_v1_audio_transcriptions_post"},"Body_translate_file_v1_audio_translations_post":{"properties":{"model":{"anyOf":[{"type":"string","description":"The ID of the model. 
You can get a list of available models by calling `/v1/models`.","examples":["Systran/faster-distil-whisper-large-v3","bofenghuang/whisper-large-v2-cv11-french-ct2"]},{"type":"null"}],"title":"Model"},"prompt":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Prompt"},"response_format":{"anyOf":[{"$ref":"#/components/schemas/faster_whisper_server__config__ResponseFormat"},{"type":"null"}]},"temperature":{"type":"number","title":"Temperature","default":0.0},"stream":{"type":"boolean","title":"Stream","default":false},"vad_filter":{"type":"boolean","title":"Vad Filter","default":false},"file":{"type":"string","format":"binary","title":"File"}},"type":"object","required":["file"],"title":"Body_translate_file_v1_audio_translations_post"},"CreateSpeechRequestBody":{"properties":{"model":{"type":"string","enum":["piper"],"const":"piper","title":"Model","description":"The ID of the model. The only supported model is 'piper'.","default":"piper","examples":["piper"]},"input":{"type":"string","title":"Input","description":"The text to generate audio for. ","examples":["A rainbow is an optical phenomenon caused by refraction, internal reflection and dispersion of light in water droplets resulting in a continuous spectrum of light appearing in the sky. The rainbow takes the form of a multicoloured circular arc. Rainbows caused by sunlight always appear in the section of sky directly opposite the Sun. Rainbows can be caused by many forms of airborne water. These include not only rain, but also mist, spray, and airborne dew."]},"voice":{"type":"string","title":"Voice","default":"en_US-amy-medium"},"response_format":{"$ref":"#/components/schemas/faster_whisper_server__routers__speech__ResponseFormat","description":"The format to audio in. Supported formats are mp3, flac, wav, pcm. 
opus, aac are not supported","default":"mp3","examples":["mp3","flac","wav","pcm"]},"speed":{"type":"number","maximum":4.0,"minimum":0.25,"title":"Speed","default":1.0},"sample_rate":{"anyOf":[{"type":"integer","maximum":48000.0,"minimum":8000.0},{"type":"null"}],"title":"Sample Rate"}},"type":"object","required":["input"],"title":"CreateSpeechRequestBody"},"CreateTranscriptionResponseJson":{"properties":{"text":{"type":"string","title":"Text"}},"type":"object","required":["text"],"title":"CreateTranscriptionResponseJson"},"CreateTranscriptionResponseVerboseJson":{"properties":{"task":{"type":"string","title":"Task","default":"transcribe"},"language":{"type":"string","title":"Language"},"duration":{"type":"number","title":"Duration"},"text":{"type":"string","title":"Text"},"words":{"anyOf":[{"items":{"$ref":"#/components/schemas/TranscriptionWord"},"type":"array"},{"type":"null"}],"title":"Words"},"segments":{"items":{"$ref":"#/components/schemas/TranscriptionSegment"},"type":"array","title":"Segments"}},"type":"object","required":["language","duration","text","words","segments"],"title":"CreateTranscriptionResponseVerboseJson"},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"Language":{"type":"string","enum":["af","am","ar","as","az","ba","be","bg","bn","bo","br","bs","ca","cs","cy","da","de","el","en","es","et","eu","fa","fi","fo","fr","gl","gu","ha","haw","he","hi","hr","ht","hu","hy","id","is","it","ja","jw","ka","kk","km","kn","ko","la","lb","ln","lo","lt","lv","mg","mi","mk","ml","mn","mr","ms","mt","my","ne","nl","nn","no","oc","pa","pl","ps","pt","ro","ru","sa","sd","si","sk","sl","sn","so","sq","sr","su","sv","sw","ta","te","tg","th","tk","tl","tr","tt","uk","ur","uz","vi","yi","yo","yue","zh"],"title":"Language"},"ListModelsResponse":{"properties":{"data":{"items":{"$ref":"#/components/schemas/Model"},"type":"array","titl
e":"Data"},"object":{"type":"string","enum":["list"],"const":"list","title":"Object","default":"list"}},"type":"object","required":["data"],"title":"ListModelsResponse"},"Model":{"properties":{"id":{"type":"string","title":"Id"},"created":{"type":"integer","title":"Created"},"object":{"type":"string","enum":["model"],"const":"model","title":"Object"},"owned_by":{"type":"string","title":"Owned By"},"language":{"items":{"type":"string"},"type":"array","title":"Language"}},"type":"object","required":["id","created","object","owned_by"],"title":"Model","examples":[{"created":1700732060,"id":"Systran/faster-whisper-large-v3","object":"model","owned_by":"Systran"},{"created":1711378296,"id":"Systran/faster-distil-whisper-large-v3","object":"model","owned_by":"Systran"},{"created":1687968011,"id":"bofenghuang/whisper-large-v2-cv11-french-ct2","object":"model","owned_by":"bofenghuang"}]},"PiperModel":{"properties":{"object":{"type":"string","enum":["model"],"const":"model","title":"Object","default":"model"},"created":{"type":"integer","title":"Created"},"owned_by":{"type":"string","enum":["rhasspy"],"const":"rhasspy","title":"Owned By","default":"rhasspy"},"model_path":{"type":"string","format":"path","title":"Model Path","examples":["/home/nixos/.cache/huggingface/hub/models--rhasspy--piper-voices/snapshots/3d796cc2f2c884b3517c527507e084f7bb245aea/en/en_US/amy/medium/en_US-amy-medium.onnx"]},"id":{"type":"string","title":"Id","readOnly":true,"examples":["rhasspy/piper-voices/en_US-amy-medium"]},"voice":{"type":"string","title":"Voice","readOnly":true,"examples":["rhasspy/piper-voices/en_US-amy-medium"]},"config_path":{"type":"string","format":"path","title":"Config Path","readOnly":true},"quality":{"type":"string","enum":["x_low","low","medium","high"],"title":"Quality","readOnly":true},"sample_rate":{"type":"integer","title":"Sample 
Rate","readOnly":true}},"type":"object","required":["created","model_path","id","voice","config_path","quality","sample_rate"],"title":"PiperModel","description":"Similar structure to the GET /v1/models response but with extra fields."},"TranscriptionSegment":{"properties":{"id":{"type":"integer","title":"Id"},"seek":{"type":"integer","title":"Seek"},"start":{"type":"number","title":"Start"},"end":{"type":"number","title":"End"},"text":{"type":"string","title":"Text"},"tokens":{"items":{"type":"integer"},"type":"array","title":"Tokens"},"temperature":{"type":"number","title":"Temperature"},"avg_logprob":{"type":"number","title":"Avg Logprob"},"compression_ratio":{"type":"number","title":"Compression Ratio"},"no_speech_prob":{"type":"number","title":"No Speech Prob"},"words":{"anyOf":[{"items":{"$ref":"#/components/schemas/TranscriptionWord"},"type":"array"},{"type":"null"}],"title":"Words"}},"type":"object","required":["id","seek","start","end","text","tokens","temperature","avg_logprob","compression_ratio","no_speech_prob","words"],"title":"TranscriptionSegment"},"TranscriptionWord":{"properties":{"start":{"type":"number","title":"Start"},"end":{"type":"number","title":"End"},"word":{"type":"string","title":"Word"},"probability":{"type":"number","title":"Probability"}},"type":"object","required":["start","end","word","probability"],"title":"TranscriptionWord"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error 
Type"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"},"faster_whisper_server__config__ResponseFormat":{"type":"string","enum":["text","json","verbose_json","srt","vtt"],"title":"ResponseFormat"},"faster_whisper_server__routers__speech__ResponseFormat":{"type":"string","enum":["mp3","flac","wav","pcm"]}}},"tags":[{"name":"automatic-speech-recognition"},{"name":"speech-to-text"},{"name":"models"},{"name":"diagnostic"},{"name":"experimental","description":"Not meant for public use yet. May change or be removed at any time."}]}
|
|
|
1 |
+
{"openapi":"3.1.0","info":{"title":"FastAPI","version":"0.1.0"},"paths":{"/v1/audio/translations":{"post":{"tags":["automatic-speech-recognition"],"summary":"Translate File","operationId":"translate_file_v1_audio_translations_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_translate_file_v1_audio_translations_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"anyOf":[{"type":"string"},{"$ref":"#/components/schemas/CreateTranscriptionResponseJson"},{"$ref":"#/components/schemas/CreateTranscriptionResponseVerboseJson"}],"title":"Response Translate File V1 Audio Translations Post"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/transcriptions":{"post":{"tags":["automatic-speech-recognition"],"summary":"Transcribe File","operationId":"transcribe_file_v1_audio_transcriptions_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_transcribe_file_v1_audio_transcriptions_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"anyOf":[{"type":"string"},{"$ref":"#/components/schemas/CreateTranscriptionResponseJson"},{"$ref":"#/components/schemas/CreateTranscriptionResponseVerboseJson"}],"title":"Response Transcribe File V1 Audio Transcriptions Post"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/models":{"get":{"tags":["models"],"summary":"Get Models","operationId":"get_models_v1_models_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ListModelsResponse"}}}}}}},"/v1/models/{model_name}":{"get":{"tags":["models"],"summary":"Get 
Model","operationId":"get_model_v1_models__model_name__get","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"},"example":"Systran/faster-distil-whisper-large-v3"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/Model"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/health":{"get":{"tags":["diagnostic"],"summary":"Health","operationId":"health_health_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/api/pull/{model_name}":{"post":{"tags":["experimental"],"summary":"Download a model from Hugging Face.","operationId":"pull_model_api_pull__model_name__post","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/api/ps":{"get":{"tags":["experimental"],"summary":"Get a list of loaded models.","operationId":"get_running_models_api_ps_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"additionalProperties":{"items":{"type":"string"},"type":"array"},"type":"object","title":"Response Get Running Models Api Ps Get"}}}}}}},"/api/ps/{model_name}":{"post":{"tags":["experimental"],"summary":"Load a model into memory.","operationId":"load_model_route_api_ps__model_name__post","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}},"delete":{"tags":["experimental"],"summary":"Unload a model from memory.","operationId":"stop_running_model_api_ps__model_name__delete","parameters":[{"name":"model_name","in":"path","required":true,"schema":{"type":"string","title":"Model Name"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/speech":{"post":{"tags":["speech-to-text"],"summary":"Synthesize","operationId":"synthesize_v1_audio_speech_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/CreateSpeechRequestBody"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v1/audio/speech/voices":{"get":{"tags":["speech-to-text"],"summary":"List Voices","operationId":"list_voices_v1_audio_speech_voices_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"items":{"$ref":"#/components/schemas/PiperModel"},"type":"array","title":"Response List Voices V1 Audio Speech Voices Get"}}}}}}}},"components":{"schemas":{"Body_transcribe_file_v1_audio_transcriptions_post":{"properties":{"model":{"anyOf":[{"type":"string","description":"The ID of the model. 
You can get a list of available models by calling `/v1/models`.","examples":["Systran/faster-distil-whisper-large-v3","bofenghuang/whisper-large-v2-cv11-french-ct2"]},{"type":"null"}],"title":"Model"},"language":{"anyOf":[{"$ref":"#/components/schemas/Language"},{"type":"null"}]},"prompt":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Prompt"},"response_format":{"anyOf":[{"$ref":"#/components/schemas/speaches__config__ResponseFormat"},{"type":"null"}]},"temperature":{"type":"number","title":"Temperature","default":0.0},"timestamp_granularities":{"items":{"type":"string","enum":["segment","word"]},"type":"array","title":"Timestamp Granularities","default":["segment"]},"stream":{"type":"boolean","title":"Stream","default":false},"hotwords":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Hotwords"},"vad_filter":{"type":"boolean","title":"Vad Filter","default":false},"file":{"type":"string","format":"binary","title":"File"}},"type":"object","required":["file"],"title":"Body_transcribe_file_v1_audio_transcriptions_post"},"Body_translate_file_v1_audio_translations_post":{"properties":{"model":{"anyOf":[{"type":"string","description":"The ID of the model. 
You can get a list of available models by calling `/v1/models`.","examples":["Systran/faster-distil-whisper-large-v3","bofenghuang/whisper-large-v2-cv11-french-ct2"]},{"type":"null"}],"title":"Model"},"prompt":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Prompt"},"response_format":{"anyOf":[{"$ref":"#/components/schemas/speaches__config__ResponseFormat"},{"type":"null"}]},"temperature":{"type":"number","title":"Temperature","default":0.0},"stream":{"type":"boolean","title":"Stream","default":false},"vad_filter":{"type":"boolean","title":"Vad Filter","default":false},"file":{"type":"string","format":"binary","title":"File"}},"type":"object","required":["file"],"title":"Body_translate_file_v1_audio_translations_post"},"CreateSpeechRequestBody":{"properties":{"model":{"type":"string","enum":["piper"],"const":"piper","title":"Model","description":"The ID of the model. The only supported model is 'piper'.","default":"piper","examples":["piper"]},"input":{"type":"string","title":"Input","description":"The text to generate audio for. ","examples":["A rainbow is an optical phenomenon caused by refraction, internal reflection and dispersion of light in water droplets resulting in a continuous spectrum of light appearing in the sky. The rainbow takes the form of a multicoloured circular arc. Rainbows caused by sunlight always appear in the section of sky directly opposite the Sun. Rainbows can be caused by many forms of airborne water. These include not only rain, but also mist, spray, and airborne dew."]},"voice":{"type":"string","title":"Voice","default":"en_US-amy-medium"},"response_format":{"$ref":"#/components/schemas/speaches__routers__speech__ResponseFormat","description":"The format to audio in. Supported formats are mp3, flac, wav, pcm. 
opus, aac are not supported","default":"mp3","examples":["mp3","flac","wav","pcm"]},"speed":{"type":"number","maximum":4.0,"minimum":0.25,"title":"Speed","default":1.0},"sample_rate":{"anyOf":[{"type":"integer","maximum":48000.0,"minimum":8000.0},{"type":"null"}],"title":"Sample Rate"}},"type":"object","required":["input"],"title":"CreateSpeechRequestBody"},"CreateTranscriptionResponseJson":{"properties":{"text":{"type":"string","title":"Text"}},"type":"object","required":["text"],"title":"CreateTranscriptionResponseJson"},"CreateTranscriptionResponseVerboseJson":{"properties":{"task":{"type":"string","title":"Task","default":"transcribe"},"language":{"type":"string","title":"Language"},"duration":{"type":"number","title":"Duration"},"text":{"type":"string","title":"Text"},"words":{"anyOf":[{"items":{"$ref":"#/components/schemas/TranscriptionWord"},"type":"array"},{"type":"null"}],"title":"Words"},"segments":{"items":{"$ref":"#/components/schemas/TranscriptionSegment"},"type":"array","title":"Segments"}},"type":"object","required":["language","duration","text","words","segments"],"title":"CreateTranscriptionResponseVerboseJson"},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"Language":{"type":"string","enum":["af","am","ar","as","az","ba","be","bg","bn","bo","br","bs","ca","cs","cy","da","de","el","en","es","et","eu","fa","fi","fo","fr","gl","gu","ha","haw","he","hi","hr","ht","hu","hy","id","is","it","ja","jw","ka","kk","km","kn","ko","la","lb","ln","lo","lt","lv","mg","mi","mk","ml","mn","mr","ms","mt","my","ne","nl","nn","no","oc","pa","pl","ps","pt","ro","ru","sa","sd","si","sk","sl","sn","so","sq","sr","su","sv","sw","ta","te","tg","th","tk","tl","tr","tt","uk","ur","uz","vi","yi","yo","yue","zh"],"title":"Language"},"ListModelsResponse":{"properties":{"data":{"items":{"$ref":"#/components/schemas/Model"},"type":"array","titl
e":"Data"},"object":{"type":"string","enum":["list"],"const":"list","title":"Object","default":"list"}},"type":"object","required":["data"],"title":"ListModelsResponse"},"Model":{"properties":{"id":{"type":"string","title":"Id"},"created":{"type":"integer","title":"Created"},"object":{"type":"string","enum":["model"],"const":"model","title":"Object"},"owned_by":{"type":"string","title":"Owned By"},"language":{"items":{"type":"string"},"type":"array","title":"Language"}},"type":"object","required":["id","created","object","owned_by"],"title":"Model","examples":[{"created":1700732060,"id":"Systran/faster-whisper-large-v3","object":"model","owned_by":"Systran"},{"created":1711378296,"id":"Systran/faster-distil-whisper-large-v3","object":"model","owned_by":"Systran"},{"created":1687968011,"id":"bofenghuang/whisper-large-v2-cv11-french-ct2","object":"model","owned_by":"bofenghuang"}]},"PiperModel":{"properties":{"object":{"type":"string","enum":["model"],"const":"model","title":"Object","default":"model"},"created":{"type":"integer","title":"Created"},"owned_by":{"type":"string","enum":["rhasspy"],"const":"rhasspy","title":"Owned By","default":"rhasspy"},"model_path":{"type":"string","format":"path","title":"Model Path","examples":["/home/nixos/.cache/huggingface/hub/models--rhasspy--piper-voices/snapshots/3d796cc2f2c884b3517c527507e084f7bb245aea/en/en_US/amy/medium/en_US-amy-medium.onnx"]},"id":{"type":"string","title":"Id","readOnly":true,"examples":["rhasspy/piper-voices/en_US-amy-medium"]},"voice":{"type":"string","title":"Voice","readOnly":true,"examples":["rhasspy/piper-voices/en_US-amy-medium"]},"config_path":{"type":"string","format":"path","title":"Config Path","readOnly":true},"quality":{"type":"string","enum":["x_low","low","medium","high"],"title":"Quality","readOnly":true},"sample_rate":{"type":"integer","title":"Sample 
Rate","readOnly":true}},"type":"object","required":["created","model_path","id","voice","config_path","quality","sample_rate"],"title":"PiperModel","description":"Similar structure to the GET /v1/models response but with extra fields."},"TranscriptionSegment":{"properties":{"id":{"type":"integer","title":"Id"},"seek":{"type":"integer","title":"Seek"},"start":{"type":"number","title":"Start"},"end":{"type":"number","title":"End"},"text":{"type":"string","title":"Text"},"tokens":{"items":{"type":"integer"},"type":"array","title":"Tokens"},"temperature":{"type":"number","title":"Temperature"},"avg_logprob":{"type":"number","title":"Avg Logprob"},"compression_ratio":{"type":"number","title":"Compression Ratio"},"no_speech_prob":{"type":"number","title":"No Speech Prob"},"words":{"anyOf":[{"items":{"$ref":"#/components/schemas/TranscriptionWord"},"type":"array"},{"type":"null"}],"title":"Words"}},"type":"object","required":["id","seek","start","end","text","tokens","temperature","avg_logprob","compression_ratio","no_speech_prob","words"],"title":"TranscriptionSegment"},"TranscriptionWord":{"properties":{"start":{"type":"number","title":"Start"},"end":{"type":"number","title":"End"},"word":{"type":"string","title":"Word"},"probability":{"type":"number","title":"Probability"}},"type":"object","required":["start","end","word","probability"],"title":"TranscriptionWord"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error 
Type"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"},"speaches__config__ResponseFormat":{"type":"string","enum":["text","json","verbose_json","srt","vtt"],"title":"ResponseFormat"},"speaches__routers__speech__ResponseFormat":{"type":"string","enum":["mp3","flac","wav","pcm"]}}},"tags":[{"name":"automatic-speech-recognition"},{"name":"speech-to-text"},{"name":"models"},{"name":"diagnostic"},{"name":"experimental","description":"Not meant for public use yet. May change or be removed at any time."}]}
|
docs/usage/open-webui-integration.md
CHANGED
@@ -6,7 +6,7 @@
|
|
6 |
2. Click on the "Audio" tab
|
7 |
3. Update settings
|
8 |
- Speech-to-Text Engine: OpenAI
|
9 |
-
- API Base URL: http://
|
10 |
- API Key: does-not-matter-what-you-put-but-should-not-be-empty
|
11 |
- Model: Systran/faster-distil-whisper-large-v3
|
12 |
4. Click "Save"
|
@@ -27,10 +27,10 @@ services:
|
|
27 |
...
|
28 |
# Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
|
29 |
AUDIO_STT_ENGINE: "openai"
|
30 |
-
AUDIO_STT_OPENAI_API_BASE_URL: "http://
|
31 |
AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
|
32 |
AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
|
33 |
-
|
34 |
-
image:
|
35 |
...
|
36 |
```
|
|
|
6 |
2. Click on the "Audio" tab
|
7 |
3. Update settings
|
8 |
- Speech-to-Text Engine: OpenAI
|
9 |
+
- API Base URL: http://speaches:8000/v1
|
10 |
- API Key: does-not-matter-what-you-put-but-should-not-be-empty
|
11 |
- Model: Systran/faster-distil-whisper-large-v3
|
12 |
4. Click "Save"
|
|
|
27 |
...
|
28 |
# Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
|
29 |
AUDIO_STT_ENGINE: "openai"
|
30 |
+
AUDIO_STT_OPENAI_API_BASE_URL: "http://speaches:8000/v1"
|
31 |
AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
|
32 |
AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
|
33 |
+
speaches:
|
34 |
+
image: ghcr.io/speaches-ai/speaches:latest-cuda
|
35 |
...
|
36 |
```
|
docs/usage/text-to-speech.md
CHANGED
@@ -2,7 +2,6 @@
|
|
2 |
|
3 |
This feature is not supported on ARM devices, only x86_64. I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize))
|
4 |
|
5 |
-
http://localhost:8001/faster-whisper-server/api/
|
6 |
TODO: add a note about automatic downloads
|
7 |
TODO: add a demo
|
8 |
TODO: add a note about tts only running on cpu
|
@@ -19,13 +18,13 @@ Download the piper voices from [HuggingFace model repository](https://huggingfac
|
|
19 |
|
20 |
```bash
|
21 |
# Download all voices (~15 minutes / 7.7 Gbs)
|
22 |
-
docker exec -it
|
23 |
# Download all English voices (~4.5 minutes)
|
24 |
-
docker exec -it
|
25 |
# Download all qualities of a specific voice (~4 seconds)
|
26 |
-
docker exec -it
|
27 |
# Download specific quality of a specific voice (~2 seconds)
|
28 |
-
docker exec -it
|
29 |
```
|
30 |
|
31 |
!!! note
|
|
|
2 |
|
3 |
This feature is not supported on ARM devices, only x86_64. I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize))
|
4 |
|
|
|
5 |
TODO: add a note about automatic downloads
|
6 |
TODO: add a demo
|
7 |
TODO: add a note about tts only running on cpu
|
|
|
18 |
|
19 |
```bash
|
20 |
# Download all voices (~15 minutes / 7.7 Gbs)
|
21 |
+
docker exec -it speaches huggingface-cli download rhasspy/piper-voices
|
22 |
# Download all English voices (~4.5 minutes)
|
23 |
+
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
|
24 |
# Download all qualities of a specific voice (~4 seconds)
|
25 |
+
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
|
26 |
# Download specific quality of a specific voice (~2 seconds)
|
27 |
+
docker exec -it speaches huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
|
28 |
```
|
29 |
|
30 |
!!! note
|
examples/javascript/index.js
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
/**
|
2 |
-
* Example provided by https://github.com/Gan-Xing in https://github.com/
|
3 |
*/
|
4 |
import 'dotenv/config';
|
5 |
import fs from 'node:fs';
|
|
|
1 |
/**
|
2 |
+
* Example provided by https://github.com/Gan-Xing in https://github.com/speaches-ai/speaches/issues/26
|
3 |
*/
|
4 |
import 'dotenv/config';
|
5 |
import fs from 'node:fs';
|
examples/live-audio/script.sh
CHANGED
@@ -9,10 +9,10 @@ set -e
|
|
9 |
|
10 |
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en if you are running on a CPU for a faster inference.
|
11 |
|
12 |
-
# Ensure you have `
|
13 |
-
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL
|
14 |
# or you can run it on a CPU
|
15 |
-
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL
|
16 |
|
17 |
# `pv` is used to limit the rate at which the audio is streamed to the server. Audio is being streamed at a rate of 32kb/s(16000 sample rate * 16-bit sample / 8 bits per byte = 32000 bytes per second). This emulutes live audio input from a microphone: `ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le -`
|
18 |
# shellcheck disable=SC2002
|
|
|
9 |
|
10 |
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en if you are running on a CPU for a faster inference.
|
11 |
|
12 |
+
# Ensure you have `speaches` running. If this is your first time running it expect to wait up-to a minute for the model to be downloaded and loaded into memory. You can run `curl localhost:8000/health` to check if the server is ready or watch the logs with `docker logs -f <container_id>`.
|
13 |
+
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL ghcr.io/speaches-ai/speaches:latest-cuda
|
14 |
# or you can run it on a CPU
|
15 |
+
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL ghcr.io/speaches-ai/speaches:latest-cpu
|
16 |
|
17 |
# `pv` is used to limit the rate at which the audio is streamed to the server. Audio is being streamed at a rate of 32kb/s(16000 sample rate * 16-bit sample / 8 bits per byte = 32000 bytes per second). This emulutes live audio input from a microphone: `ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le -`
|
18 |
# shellcheck disable=SC2002
|
examples/youtube/script.sh
CHANGED
@@ -5,10 +5,10 @@ set -e
|
|
5 |
# NOTE: do not use any distil-* model other than the large ones as they don't work on long audio files for some reason.
|
6 |
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en if you are running on a CPU for a faster inference.
|
7 |
|
8 |
-
# Ensure you have `
|
9 |
-
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL
|
10 |
# or you can run it on a CPU
|
11 |
-
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL
|
12 |
|
13 |
# Download the audio from a YouTube video. In this example I'm downloading "The Evolution of the Operating System" by Asionometry YouTube channel. I highly checking this channel out, the guy produces very high content. If you don't have `youtube-dl`, you'll have to install it. https://github.com/ytdl-org/youtube-dl
|
14 |
youtube-dl --extract-audio --audio-format mp3 -o the-evolution-of-the-operating-system.mp3 'https://www.youtube.com/watch?v=1lG7lFLXBIs'
|
|
|
5 |
# NOTE: do not use any distil-* model other than the large ones as they don't work on long audio files for some reason.
|
6 |
export WHISPER__MODEL=Systran/faster-distil-whisper-large-v3 # or Systran/faster-whisper-tiny.en if you are running on a CPU for a faster inference.
|
7 |
|
8 |
+
# Ensure you have `speaches` running. If this is your first time running it expect to wait up-to a minute for the model to be downloaded and loaded into memory. You can run `curl localhost:8000/health` to check if the server is ready or watch the logs with `docker logs -f <container_id>`.
|
9 |
+
docker run --detach --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL ghcr.io/speaches-ai/speaches:latest-cuda
|
10 |
# or you can run it on a CPU
|
11 |
+
# docker run --detach --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=$WHISPER__MODEL ghcr.io/speaches-ai/speaches:latest-cpu
|
12 |
|
13 |
# Download the audio from a YouTube video. In this example I'm downloading "The Evolution of the Operating System" by Asionometry YouTube channel. I highly checking this channel out, the guy produces very high content. If you don't have `youtube-dl`, you'll have to install it. https://github.com/ytdl-org/youtube-dl
|
14 |
youtube-dl --extract-audio --audio-format mp3 -o the-evolution-of-the-operating-system.mp3 'https://www.youtube.com/watch?v=1lG7lFLXBIs'
|
mkdocs.yml
CHANGED
@@ -1,8 +1,8 @@
|
|
1 |
# yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
|
2 |
# https://www.mkdocs.org/user-guide/configuration/#configuration
|
3 |
-
site_name:
|
4 |
-
site_url: https://
|
5 |
-
repo_url: https://github.com/
|
6 |
edit_uri: edit/master/docs/
|
7 |
docs_dir: docs
|
8 |
theme:
|
|
|
1 |
# yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
|
2 |
# https://www.mkdocs.org/user-guide/configuration/#configuration
|
3 |
+
site_name: Speaches Documentation
|
4 |
+
site_url: https://speaches-ai.github.io/speaches/
|
5 |
+
repo_url: https://github.com/speaches-ai/speaches/
|
6 |
edit_uri: edit/master/docs/
|
7 |
docs_dir: docs
|
8 |
theme:
|
pyproject.toml
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
[project]
|
2 |
-
name = "
|
3 |
version = "0.1.0"
|
4 |
requires-python = ">=3.12,<3.13"
|
5 |
# https://packaging.python.org/en/latest/specifications/version-specifiers/#id5
|
|
|
1 |
[project]
|
2 |
+
name = "speaches"
|
3 |
version = "0.1.0"
|
4 |
requires-python = ">=3.12,<3.13"
|
5 |
# https://packaging.python.org/en/latest/specifications/version-specifiers/#id5
|
src/{faster_whisper_server β speaches}/__init__.py
RENAMED
File without changes
|
src/{faster_whisper_server β speaches}/api_models.py
RENAMED
@@ -4,7 +4,7 @@ from typing import TYPE_CHECKING, Literal
|
|
4 |
|
5 |
from pydantic import BaseModel, ConfigDict, Field
|
6 |
|
7 |
-
from
|
8 |
|
9 |
if TYPE_CHECKING:
|
10 |
from collections.abc import Iterable
|
@@ -23,7 +23,7 @@ class TranscriptionWord(BaseModel):
|
|
23 |
def from_segments(cls, segments: Iterable[TranscriptionSegment]) -> list[TranscriptionWord]:
|
24 |
words: list[TranscriptionWord] = []
|
25 |
for segment in segments:
|
26 |
-
# NOTE: a temporary "fix" for https://github.com/
|
27 |
# TODO: properly address the issue
|
28 |
assert (
|
29 |
segment.words is not None
|
|
|
4 |
|
5 |
from pydantic import BaseModel, ConfigDict, Field
|
6 |
|
7 |
+
from speaches.text_utils import Transcription, canonicalize_word, segments_to_text
|
8 |
|
9 |
if TYPE_CHECKING:
|
10 |
from collections.abc import Iterable
|
|
|
23 |
def from_segments(cls, segments: Iterable[TranscriptionSegment]) -> list[TranscriptionWord]:
|
24 |
words: list[TranscriptionWord] = []
|
25 |
for segment in segments:
|
26 |
+
# NOTE: a temporary "fix" for https://github.com/speaches-ai/speaches/issues/58.
|
27 |
# TODO: properly address the issue
|
28 |
assert (
|
29 |
segment.words is not None
|
src/{faster_whisper_server β speaches}/asr.py
RENAMED
@@ -5,13 +5,13 @@ import logging
|
|
5 |
import time
|
6 |
from typing import TYPE_CHECKING
|
7 |
|
8 |
-
from
|
9 |
-
from
|
10 |
|
11 |
if TYPE_CHECKING:
|
12 |
from faster_whisper import transcribe
|
13 |
|
14 |
-
from
|
15 |
|
16 |
logger = logging.getLogger(__name__)
|
17 |
|
|
|
5 |
import time
|
6 |
from typing import TYPE_CHECKING
|
7 |
|
8 |
+
from speaches.api_models import TranscriptionSegment, TranscriptionWord
|
9 |
+
from speaches.text_utils import Transcription
|
10 |
|
11 |
if TYPE_CHECKING:
|
12 |
from faster_whisper import transcribe
|
13 |
|
14 |
+
from speaches.audio import Audio
|
15 |
|
16 |
logger = logging.getLogger(__name__)
|
17 |
|
src/{faster_whisper_server β speaches}/audio.py
RENAMED
@@ -7,7 +7,7 @@ from typing import TYPE_CHECKING, BinaryIO
|
|
7 |
import numpy as np
|
8 |
import soundfile as sf
|
9 |
|
10 |
-
from
|
11 |
|
12 |
if TYPE_CHECKING:
|
13 |
from collections.abc import AsyncGenerator
|
|
|
7 |
import numpy as np
|
8 |
import soundfile as sf
|
9 |
|
10 |
+
from speaches.config import SAMPLES_PER_SECOND
|
11 |
|
12 |
if TYPE_CHECKING:
|
13 |
from collections.abc import AsyncGenerator
|
src/{faster_whisper_server β speaches}/config.py
RENAMED
File without changes
|
src/{faster_whisper_server β speaches}/dependencies.py
RENAMED
@@ -9,8 +9,8 @@ from openai import AsyncOpenAI
|
|
9 |
from openai.resources.audio import AsyncSpeech, AsyncTranscriptions
|
10 |
from openai.resources.chat.completions import AsyncCompletions
|
11 |
|
12 |
-
from
|
13 |
-
from
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
@@ -73,7 +73,7 @@ def get_speech_client() -> AsyncSpeech:
|
|
73 |
config = get_config()
|
74 |
if config.speech_base_url is None:
|
75 |
# this might not work as expected if `speech_router` won't have shared state (access to the same `model_manager`) with the main FastAPI `app`. TODO: verify # noqa: E501
|
76 |
-
from
|
77 |
router as speech_router,
|
78 |
)
|
79 |
|
@@ -94,7 +94,7 @@ def get_transcription_client() -> AsyncTranscriptions:
|
|
94 |
config = get_config()
|
95 |
if config.transcription_base_url is None:
|
96 |
# this might not work as expected if `transcription_router` won't have shared state (access to the same `model_manager`) with the main FastAPI `app`. TODO: verify # noqa: E501
|
97 |
-
from
|
98 |
router as stt_router,
|
99 |
)
|
100 |
|
|
|
9 |
from openai.resources.audio import AsyncSpeech, AsyncTranscriptions
|
10 |
from openai.resources.chat.completions import AsyncCompletions
|
11 |
|
12 |
+
from speaches.config import Config
|
13 |
+
from speaches.model_manager import PiperModelManager, WhisperModelManager
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
|
|
73 |
config = get_config()
|
74 |
if config.speech_base_url is None:
|
75 |
# this might not work as expected if `speech_router` won't have shared state (access to the same `model_manager`) with the main FastAPI `app`. TODO: verify # noqa: E501
|
76 |
+
from speaches.routers.speech import (
|
77 |
router as speech_router,
|
78 |
)
|
79 |
|
|
|
94 |
config = get_config()
|
95 |
if config.transcription_base_url is None:
|
96 |
# this might not work as expected if `transcription_router` won't have shared state (access to the same `model_manager`) with the main FastAPI `app`. TODO: verify # noqa: E501
|
97 |
+
from speaches.routers.stt import (
|
98 |
router as stt_router,
|
99 |
)
|
100 |
|
src/{faster_whisper_server β speaches}/gradio_app.py
RENAMED
@@ -7,8 +7,8 @@ import httpx
|
|
7 |
from httpx_sse import aconnect_sse
|
8 |
from openai import AsyncOpenAI
|
9 |
|
10 |
-
from
|
11 |
-
from
|
12 |
|
13 |
TRANSCRIPTION_ENDPOINT = "/v1/audio/transcriptions"
|
14 |
TRANSLATION_ENDPOINT = "/v1/audio/translations"
|
@@ -128,9 +128,9 @@ def create_gradio_demo(config: Config) -> gr.Blocks: # noqa: C901, PLR0915
|
|
128 |
file.write(audio_bytes)
|
129 |
return file_path
|
130 |
|
131 |
-
with gr.Blocks(title="
|
132 |
gr.Markdown(
|
133 |
-
"### Consider supporting the project by starring the [repository on GitHub](https://github.com/
|
134 |
)
|
135 |
with gr.Tab(label="Transcribe/Translate"):
|
136 |
audio = gr.Audio(type="filepath")
|
@@ -157,7 +157,7 @@ def create_gradio_demo(config: Config) -> gr.Blocks: # noqa: C901, PLR0915
|
|
157 |
|
158 |
with gr.Tab(label="Speech Generation"):
|
159 |
if platform.machine() != "x86_64":
|
160 |
-
from
|
161 |
DEFAULT_VOICE,
|
162 |
MAX_SAMPLE_RATE,
|
163 |
MIN_SAMPLE_RATE,
|
|
|
7 |
from httpx_sse import aconnect_sse
|
8 |
from openai import AsyncOpenAI
|
9 |
|
10 |
+
from speaches.config import Config, Task
|
11 |
+
from speaches.hf_utils import PiperModel
|
12 |
|
13 |
TRANSCRIPTION_ENDPOINT = "/v1/audio/transcriptions"
|
14 |
TRANSLATION_ENDPOINT = "/v1/audio/translations"
|
|
|
128 |
file.write(audio_bytes)
|
129 |
return file_path
|
130 |
|
131 |
+
with gr.Blocks(title="Speaches Playground") as demo:
|
132 |
gr.Markdown(
|
133 |
+
"### Consider supporting the project by starring the [repository on GitHub](https://github.com/speaches-ai/speaches)."
|
134 |
)
|
135 |
with gr.Tab(label="Transcribe/Translate"):
|
136 |
audio = gr.Audio(type="filepath")
|
|
|
157 |
|
158 |
with gr.Tab(label="Speech Generation"):
|
159 |
if platform.machine() != "x86_64":
|
160 |
+
from speaches.routers.speech import (
|
161 |
DEFAULT_VOICE,
|
162 |
MAX_SAMPLE_RATE,
|
163 |
MIN_SAMPLE_RATE,
|
src/{faster_whisper_server β speaches}/hf_utils.py
RENAMED
@@ -10,7 +10,7 @@ import huggingface_hub
|
|
10 |
from huggingface_hub.constants import HF_HUB_CACHE
|
11 |
from pydantic import BaseModel, Field, computed_field
|
12 |
|
13 |
-
from
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
|
|
10 |
from huggingface_hub.constants import HF_HUB_CACHE
|
11 |
from pydantic import BaseModel, Field, computed_field
|
12 |
|
13 |
+
from speaches.api_models import Model
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
src/{faster_whisper_server β speaches}/logger.py
RENAMED
File without changes
|
src/{faster_whisper_server β speaches}/main.py
RENAMED
@@ -10,15 +10,15 @@ from fastapi import (
|
|
10 |
)
|
11 |
from fastapi.middleware.cors import CORSMiddleware
|
12 |
|
13 |
-
from
|
14 |
-
from
|
15 |
-
from
|
16 |
router as misc_router,
|
17 |
)
|
18 |
-
from
|
19 |
router as models_router,
|
20 |
)
|
21 |
-
from
|
22 |
router as stt_router,
|
23 |
)
|
24 |
|
@@ -47,7 +47,7 @@ def create_app() -> FastAPI:
|
|
47 |
logger.debug(f"Config: {config}")
|
48 |
|
49 |
if platform.machine() == "x86_64":
|
50 |
-
from
|
51 |
router as speech_router,
|
52 |
)
|
53 |
else:
|
@@ -86,7 +86,7 @@ def create_app() -> FastAPI:
|
|
86 |
if config.enable_ui:
|
87 |
import gradio as gr
|
88 |
|
89 |
-
from
|
90 |
|
91 |
app = gr.mount_gradio_app(app, create_gradio_demo(config), path="/")
|
92 |
|
|
|
10 |
)
|
11 |
from fastapi.middleware.cors import CORSMiddleware
|
12 |
|
13 |
+
from speaches.dependencies import ApiKeyDependency, get_config, get_model_manager
|
14 |
+
from speaches.logger import setup_logger
|
15 |
+
from speaches.routers.misc import (
|
16 |
router as misc_router,
|
17 |
)
|
18 |
+
from speaches.routers.models import (
|
19 |
router as models_router,
|
20 |
)
|
21 |
+
from speaches.routers.stt import (
|
22 |
router as stt_router,
|
23 |
)
|
24 |
|
|
|
47 |
logger.debug(f"Config: {config}")
|
48 |
|
49 |
if platform.machine() == "x86_64":
|
50 |
+
from speaches.routers.speech import (
|
51 |
router as speech_router,
|
52 |
)
|
53 |
else:
|
|
|
86 |
if config.enable_ui:
|
87 |
import gradio as gr
|
88 |
|
89 |
+
from speaches.gradio_app import create_gradio_demo
|
90 |
|
91 |
app = gr.mount_gradio_app(app, create_gradio_demo(config), path="/")
|
92 |
|
src/{faster_whisper_server β speaches}/model_manager.py
RENAMED
@@ -9,14 +9,14 @@ from typing import TYPE_CHECKING
|
|
9 |
|
10 |
from faster_whisper import WhisperModel
|
11 |
|
12 |
-
from
|
13 |
|
14 |
if TYPE_CHECKING:
|
15 |
from collections.abc import Callable
|
16 |
|
17 |
from piper.voice import PiperVoice
|
18 |
|
19 |
-
from
|
20 |
WhisperConfig,
|
21 |
)
|
22 |
|
|
|
9 |
|
10 |
from faster_whisper import WhisperModel
|
11 |
|
12 |
+
from speaches.hf_utils import get_piper_voice_model_file
|
13 |
|
14 |
if TYPE_CHECKING:
|
15 |
from collections.abc import Callable
|
16 |
|
17 |
from piper.voice import PiperVoice
|
18 |
|
19 |
+
from speaches.config import (
|
20 |
WhisperConfig,
|
21 |
)
|
22 |
|
src/{faster_whisper_server β speaches}/routers/__init__.py
RENAMED
File without changes
|
src/{faster_whisper_server β speaches}/routers/misc.py
RENAMED
@@ -7,8 +7,8 @@ from fastapi import (
|
|
7 |
import huggingface_hub
|
8 |
from huggingface_hub.hf_api import RepositoryNotFoundError
|
9 |
|
10 |
-
from
|
11 |
-
from
|
12 |
|
13 |
router = APIRouter()
|
14 |
|
|
|
7 |
import huggingface_hub
|
8 |
from huggingface_hub.hf_api import RepositoryNotFoundError
|
9 |
|
10 |
+
from speaches import hf_utils
|
11 |
+
from speaches.dependencies import ModelManagerDependency # noqa: TC001
|
12 |
|
13 |
router = APIRouter()
|
14 |
|
src/{faster_whisper_server β speaches}/routers/models.py
RENAMED
@@ -9,11 +9,11 @@ from fastapi import (
|
|
9 |
)
|
10 |
import huggingface_hub
|
11 |
|
12 |
-
from
|
13 |
ListModelsResponse,
|
14 |
Model,
|
15 |
)
|
16 |
-
from
|
17 |
|
18 |
if TYPE_CHECKING:
|
19 |
from huggingface_hub.hf_api import ModelInfo
|
|
|
9 |
)
|
10 |
import huggingface_hub
|
11 |
|
12 |
+
from speaches.api_models import (
|
13 |
ListModelsResponse,
|
14 |
Model,
|
15 |
)
|
16 |
+
from speaches.hf_utils import list_whisper_models
|
17 |
|
18 |
if TYPE_CHECKING:
|
19 |
from huggingface_hub.hf_api import ModelInfo
|
src/{faster_whisper_server β speaches}/routers/speech.py
RENAMED
@@ -11,8 +11,8 @@ from piper.voice import PiperVoice
|
|
11 |
from pydantic import BaseModel, BeforeValidator, Field, ValidationError, model_validator
|
12 |
import soundfile as sf
|
13 |
|
14 |
-
from
|
15 |
-
from
|
16 |
PiperModel,
|
17 |
list_piper_models,
|
18 |
read_piper_voices_config,
|
|
|
11 |
from pydantic import BaseModel, BeforeValidator, Field, ValidationError, model_validator
|
12 |
import soundfile as sf
|
13 |
|
14 |
+
from speaches.dependencies import PiperModelManagerDependency
|
15 |
+
from speaches.hf_utils import (
|
16 |
PiperModel,
|
17 |
list_piper_models,
|
18 |
read_piper_voices_config,
|
src/{faster_whisper_server β speaches}/routers/stt.py
RENAMED
@@ -27,7 +27,7 @@ from numpy import float32
|
|
27 |
from numpy.typing import NDArray
|
28 |
from pydantic import AfterValidator, Field
|
29 |
|
30 |
-
from
|
31 |
DEFAULT_TIMESTAMP_GRANULARITIES,
|
32 |
TIMESTAMP_GRANULARITIES_COMBINATIONS,
|
33 |
CreateTranscriptionResponseJson,
|
@@ -35,17 +35,17 @@ from faster_whisper_server.api_models import (
|
|
35 |
TimestampGranularities,
|
36 |
TranscriptionSegment,
|
37 |
)
|
38 |
-
from
|
39 |
-
from
|
40 |
-
from
|
41 |
SAMPLES_PER_SECOND,
|
42 |
Language,
|
43 |
ResponseFormat,
|
44 |
Task,
|
45 |
)
|
46 |
-
from
|
47 |
-
from
|
48 |
-
from
|
49 |
|
50 |
if TYPE_CHECKING:
|
51 |
from collections.abc import Generator, Iterable
|
@@ -77,7 +77,7 @@ def audio_file_dependency(
|
|
77 |
) from e
|
78 |
except Exception as e:
|
79 |
logger.exception(
|
80 |
-
"Failed to decode audio. This is likely a bug. Please create an issue at https://github.com/
|
81 |
)
|
82 |
raise HTTPException(status_code=500, detail="Failed to decode audio.") from e
|
83 |
else:
|
|
|
27 |
from numpy.typing import NDArray
|
28 |
from pydantic import AfterValidator, Field
|
29 |
|
30 |
+
from speaches.api_models import (
|
31 |
DEFAULT_TIMESTAMP_GRANULARITIES,
|
32 |
TIMESTAMP_GRANULARITIES_COMBINATIONS,
|
33 |
CreateTranscriptionResponseJson,
|
|
|
35 |
TimestampGranularities,
|
36 |
TranscriptionSegment,
|
37 |
)
|
38 |
+
from speaches.asr import FasterWhisperASR
|
39 |
+
from speaches.audio import AudioStream, audio_samples_from_file
|
40 |
+
from speaches.config import (
|
41 |
SAMPLES_PER_SECOND,
|
42 |
Language,
|
43 |
ResponseFormat,
|
44 |
Task,
|
45 |
)
|
46 |
+
from speaches.dependencies import ConfigDependency, ModelManagerDependency, get_config
|
47 |
+
from speaches.text_utils import segments_to_srt, segments_to_text, segments_to_vtt
|
48 |
+
from speaches.transcriber import audio_transcriber
|
49 |
|
50 |
if TYPE_CHECKING:
|
51 |
from collections.abc import Generator, Iterable
|
|
|
77 |
) from e
|
78 |
except Exception as e:
|
79 |
logger.exception(
|
80 |
+
"Failed to decode audio. This is likely a bug. Please create an issue at https://github.com/speaches-ai/speaches/issues/new."
|
81 |
)
|
82 |
raise HTTPException(status_code=500, detail="Failed to decode audio.") from e
|
83 |
else:
|
src/{faster_whisper_server β speaches}/text_utils.py
RENAMED
@@ -6,7 +6,7 @@ from typing import TYPE_CHECKING
|
|
6 |
if TYPE_CHECKING:
|
7 |
from collections.abc import Iterable
|
8 |
|
9 |
-
from
|
10 |
|
11 |
|
12 |
class Transcription:
|
@@ -38,7 +38,7 @@ class Transcription:
|
|
38 |
self.words.extend(words)
|
39 |
|
40 |
def _ensure_no_word_overlap(self, words: list[TranscriptionWord]) -> None:
|
41 |
-
from
|
42 |
|
43 |
config = get_config() # HACK
|
44 |
if len(self.words) > 0 and len(words) > 0:
|
|
|
6 |
if TYPE_CHECKING:
|
7 |
from collections.abc import Iterable
|
8 |
|
9 |
+
from speaches.api_models import TranscriptionSegment, TranscriptionWord
|
10 |
|
11 |
|
12 |
class Transcription:
|
|
|
38 |
self.words.extend(words)
|
39 |
|
40 |
def _ensure_no_word_overlap(self, words: list[TranscriptionWord]) -> None:
|
41 |
+
from speaches.dependencies import get_config # HACK: avoid circular import
|
42 |
|
43 |
config = get_config() # HACK
|
44 |
if len(self.words) > 0 and len(words) > 0:
|
src/{faster_whisper_server β speaches}/text_utils_test.py
RENAMED
@@ -1,5 +1,5 @@
|
|
1 |
-
from
|
2 |
-
from
|
3 |
canonicalize_word,
|
4 |
common_prefix,
|
5 |
is_eos,
|
|
|
1 |
+
from speaches.api_models import TranscriptionWord
|
2 |
+
from speaches.text_utils import (
|
3 |
canonicalize_word,
|
4 |
common_prefix,
|
5 |
is_eos,
|
src/{faster_whisper_server β speaches}/transcriber.py
RENAMED
@@ -3,14 +3,14 @@ from __future__ import annotations
|
|
3 |
import logging
|
4 |
from typing import TYPE_CHECKING
|
5 |
|
6 |
-
from
|
7 |
-
from
|
8 |
|
9 |
if TYPE_CHECKING:
|
10 |
from collections.abc import AsyncGenerator
|
11 |
|
12 |
-
from
|
13 |
-
from
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
|
|
3 |
import logging
|
4 |
from typing import TYPE_CHECKING
|
5 |
|
6 |
+
from speaches.audio import Audio, AudioStream
|
7 |
+
from speaches.text_utils import Transcription, common_prefix, to_full_sentences, word_to_text
|
8 |
|
9 |
if TYPE_CHECKING:
|
10 |
from collections.abc import AsyncGenerator
|
11 |
|
12 |
+
from speaches.api_models import TranscriptionWord
|
13 |
+
from speaches.asr import FasterWhisperASR
|
14 |
|
15 |
logger = logging.getLogger(__name__)
|
16 |
|
tests/api_timestamp_granularities_test.py
CHANGED
@@ -5,7 +5,7 @@ from pathlib import Path
|
|
5 |
from openai import AsyncOpenAI
|
6 |
import pytest
|
7 |
|
8 |
-
from
|
9 |
|
10 |
|
11 |
@pytest.mark.asyncio
|
|
|
5 |
from openai import AsyncOpenAI
|
6 |
import pytest
|
7 |
|
8 |
+
from speaches.api_models import TIMESTAMP_GRANULARITIES_COMBINATIONS, TimestampGranularities
|
9 |
|
10 |
|
11 |
@pytest.mark.asyncio
|
tests/conftest.py
CHANGED
@@ -12,9 +12,9 @@ import pytest
|
|
12 |
import pytest_asyncio
|
13 |
from pytest_mock import MockerFixture
|
14 |
|
15 |
-
from
|
16 |
-
from
|
17 |
-
from
|
18 |
|
19 |
DISABLE_LOGGERS = ["multipart.multipart", "faster_whisper"]
|
20 |
OPENAI_BASE_URL = "https://api.openai.com/v1"
|
@@ -54,11 +54,11 @@ async def aclient_factory(mocker: MockerFixture) -> AclientFactory:
|
|
54 |
@asynccontextmanager
|
55 |
async def inner(config: Config = DEFAULT_CONFIG) -> AsyncGenerator[AsyncClient, None]:
|
56 |
# NOTE: all calls to `get_config` should be patched. One way to test that this works is to update the original `get_config` to raise an exception and see if the tests fail # noqa: E501
|
57 |
-
mocker.patch("
|
58 |
-
mocker.patch("
|
59 |
# NOTE: I couldn't get the following to work but it shouldn't matter
|
60 |
# mocker.patch(
|
61 |
-
# "
|
62 |
# )
|
63 |
|
64 |
app = create_app()
|
|
|
12 |
import pytest_asyncio
|
13 |
from pytest_mock import MockerFixture
|
14 |
|
15 |
+
from speaches.config import Config, WhisperConfig
|
16 |
+
from speaches.dependencies import get_config
|
17 |
+
from speaches.main import create_app
|
18 |
|
19 |
DISABLE_LOGGERS = ["multipart.multipart", "faster_whisper"]
|
20 |
OPENAI_BASE_URL = "https://api.openai.com/v1"
|
|
|
54 |
@asynccontextmanager
|
55 |
async def inner(config: Config = DEFAULT_CONFIG) -> AsyncGenerator[AsyncClient, None]:
|
56 |
# NOTE: all calls to `get_config` should be patched. One way to test that this works is to update the original `get_config` to raise an exception and see if the tests fail # noqa: E501
|
57 |
+
mocker.patch("speaches.dependencies.get_config", return_value=config)
|
58 |
+
mocker.patch("speaches.main.get_config", return_value=config)
|
59 |
# NOTE: I couldn't get the following to work but it shouldn't matter
|
60 |
# mocker.patch(
|
61 |
+
# "speaches.text_utils.Transcription._ensure_no_word_overlap.get_config", return_value=config
|
62 |
# )
|
63 |
|
64 |
app = create_app()
|
tests/model_manager_test.py
CHANGED
@@ -3,7 +3,7 @@ import asyncio
|
|
3 |
import anyio
|
4 |
import pytest
|
5 |
|
6 |
-
from
|
7 |
from tests.conftest import DEFAULT_WHISPER_MODEL, AclientFactory
|
8 |
|
9 |
MODEL = DEFAULT_WHISPER_MODEL # just to make the test more readable
|
|
|
3 |
import anyio
|
4 |
import pytest
|
5 |
|
6 |
+
from speaches.config import Config, WhisperConfig
|
7 |
from tests.conftest import DEFAULT_WHISPER_MODEL, AclientFactory
|
8 |
|
9 |
MODEL = DEFAULT_WHISPER_MODEL # just to make the test more readable
|
tests/openai_timestamp_granularities_test.py
CHANGED
@@ -5,7 +5,7 @@ from pathlib import Path
|
|
5 |
from openai import AsyncOpenAI, BadRequestError
|
6 |
import pytest
|
7 |
|
8 |
-
from
|
9 |
|
10 |
|
11 |
@pytest.mark.asyncio
|
|
|
5 |
from openai import AsyncOpenAI, BadRequestError
|
6 |
import pytest
|
7 |
|
8 |
+
from speaches.api_models import TIMESTAMP_GRANULARITIES_COMBINATIONS, TimestampGranularities
|
9 |
|
10 |
|
11 |
@pytest.mark.asyncio
|
tests/speech_test.py
CHANGED
@@ -9,7 +9,7 @@ platform_machine = platform.machine()
|
|
9 |
if platform_machine != "x86_64":
|
10 |
pytest.skip("Only supported on x86_64", allow_module_level=True)
|
11 |
|
12 |
-
from
|
13 |
DEFAULT_MODEL,
|
14 |
DEFAULT_RESPONSE_FORMAT,
|
15 |
DEFAULT_VOICE,
|
|
|
9 |
if platform_machine != "x86_64":
|
10 |
pytest.skip("Only supported on x86_64", allow_module_level=True)
|
11 |
|
12 |
+
from speaches.routers.speech import ( # noqa: E402
|
13 |
DEFAULT_MODEL,
|
14 |
DEFAULT_RESPONSE_FORMAT,
|
15 |
DEFAULT_VOICE,
|
tests/sse_test.py
CHANGED
@@ -9,7 +9,7 @@ import srt
|
|
9 |
import webvtt
|
10 |
import webvtt.vtt
|
11 |
|
12 |
-
from
|
13 |
CreateTranscriptionResponseJson,
|
14 |
CreateTranscriptionResponseVerboseJson,
|
15 |
)
|
|
|
9 |
import webvtt
|
10 |
import webvtt.vtt
|
11 |
|
12 |
+
from speaches.api_models import (
|
13 |
CreateTranscriptionResponseJson,
|
14 |
CreateTranscriptionResponseVerboseJson,
|
15 |
)
|
uv.lock
CHANGED
@@ -266,115 +266,6 @@ wheels = [
|
|
266 |
{ url = "https://files.pythonhosted.org/packages/7b/03/ab118cb743dcf671da01ad0cfd7564465dda115db32976fdc95e21ce8feb/faster_whisper-1.1.0-py3-none-any.whl", hash = "sha256:0f2d025676bbff1e46c4108b6f9a82578d6e33826c174af2990e45b33fab6182", size = 1118168 },
|
267 |
]
|
268 |
|
269 |
-
[[package]]
|
270 |
-
name = "faster-whisper-server"
|
271 |
-
version = "0.1.0"
|
272 |
-
source = { editable = "." }
|
273 |
-
dependencies = [
|
274 |
-
{ name = "ctranslate2" },
|
275 |
-
{ name = "fastapi" },
|
276 |
-
{ name = "faster-whisper" },
|
277 |
-
{ name = "huggingface-hub", extra = ["hf-transfer"] },
|
278 |
-
{ name = "numpy" },
|
279 |
-
{ name = "piper-phonemize", marker = "platform_machine == 'x86_64'" },
|
280 |
-
{ name = "piper-tts", marker = "platform_machine == 'x86_64'" },
|
281 |
-
{ name = "pydantic" },
|
282 |
-
{ name = "pydantic-settings" },
|
283 |
-
{ name = "python-multipart" },
|
284 |
-
{ name = "sounddevice" },
|
285 |
-
{ name = "soundfile" },
|
286 |
-
{ name = "uvicorn" },
|
287 |
-
]
|
288 |
-
|
289 |
-
[package.optional-dependencies]
|
290 |
-
client = [
|
291 |
-
{ name = "keyboard" },
|
292 |
-
]
|
293 |
-
dev = [
|
294 |
-
{ name = "anyio" },
|
295 |
-
{ name = "basedpyright" },
|
296 |
-
{ name = "mdx-truly-sane-lists" },
|
297 |
-
{ name = "mkdocs-material" },
|
298 |
-
{ name = "mkdocs-render-swagger-plugin" },
|
299 |
-
{ name = "mkdocstrings", extra = ["python"] },
|
300 |
-
{ name = "pre-commit" },
|
301 |
-
{ name = "pytest" },
|
302 |
-
{ name = "pytest-antilru" },
|
303 |
-
{ name = "pytest-asyncio" },
|
304 |
-
{ name = "pytest-mock" },
|
305 |
-
{ name = "pytest-xdist" },
|
306 |
-
{ name = "ruff" },
|
307 |
-
{ name = "srt" },
|
308 |
-
{ name = "webvtt-py" },
|
309 |
-
]
|
310 |
-
opentelemetry = [
|
311 |
-
{ name = "opentelemetry-distro" },
|
312 |
-
{ name = "opentelemetry-exporter-otlp" },
|
313 |
-
{ name = "opentelemetry-instrumentation-asyncio" },
|
314 |
-
{ name = "opentelemetry-instrumentation-fastapi" },
|
315 |
-
{ name = "opentelemetry-instrumentation-grpc" },
|
316 |
-
{ name = "opentelemetry-instrumentation-httpx" },
|
317 |
-
{ name = "opentelemetry-instrumentation-logging" },
|
318 |
-
{ name = "opentelemetry-instrumentation-requests" },
|
319 |
-
{ name = "opentelemetry-instrumentation-threading" },
|
320 |
-
{ name = "opentelemetry-instrumentation-urllib" },
|
321 |
-
{ name = "opentelemetry-instrumentation-urllib3" },
|
322 |
-
]
|
323 |
-
ui = [
|
324 |
-
{ name = "gradio" },
|
325 |
-
{ name = "httpx" },
|
326 |
-
{ name = "httpx-sse" },
|
327 |
-
{ name = "openai" },
|
328 |
-
]
|
329 |
-
|
330 |
-
[package.metadata]
|
331 |
-
requires-dist = [
|
332 |
-
{ name = "anyio", marker = "extra == 'dev'", specifier = ">=4.4.0" },
|
333 |
-
{ name = "basedpyright", marker = "extra == 'dev'", specifier = ">=1.18.0" },
|
334 |
-
{ name = "ctranslate2", specifier = ">=4.5.0" },
|
335 |
-
{ name = "fastapi", specifier = ">=0.115.0" },
|
336 |
-
{ name = "faster-whisper", specifier = ">=1.1.0" },
|
337 |
-
{ name = "gradio", marker = "extra == 'ui'", specifier = ">=5.0.2" },
|
338 |
-
{ name = "httpx", marker = "extra == 'ui'", specifier = ">=0.27.2" },
|
339 |
-
{ name = "httpx-sse", marker = "extra == 'ui'", specifier = ">=0.4.0" },
|
340 |
-
{ name = "huggingface-hub", extras = ["hf-transfer"], specifier = ">=0.25.1" },
|
341 |
-
{ name = "keyboard", marker = "extra == 'client'", specifier = ">=0.13.5" },
|
342 |
-
{ name = "mdx-truly-sane-lists", marker = "extra == 'dev'", specifier = ">=1.3" },
|
343 |
-
{ name = "mkdocs-material", marker = "extra == 'dev'", specifier = ">=9.5.39" },
|
344 |
-
{ name = "mkdocs-render-swagger-plugin", marker = "extra == 'dev'", specifier = ">=0.1.2" },
|
345 |
-
{ name = "mkdocstrings", extras = ["python"], marker = "extra == 'dev'", specifier = ">=0.26.1" },
|
346 |
-
{ name = "numpy", specifier = ">=2.1.1" },
|
347 |
-
{ name = "openai", marker = "extra == 'ui'", specifier = ">=1.48.0" },
|
348 |
-
{ name = "opentelemetry-distro", marker = "extra == 'opentelemetry'", specifier = ">=0.48b0" },
|
349 |
-
{ name = "opentelemetry-exporter-otlp", marker = "extra == 'opentelemetry'", specifier = ">=1.27.0" },
|
350 |
-
{ name = "opentelemetry-instrumentation-asyncio", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
351 |
-
{ name = "opentelemetry-instrumentation-fastapi", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
352 |
-
{ name = "opentelemetry-instrumentation-grpc", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
353 |
-
{ name = "opentelemetry-instrumentation-httpx", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
354 |
-
{ name = "opentelemetry-instrumentation-logging", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
355 |
-
{ name = "opentelemetry-instrumentation-requests", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
356 |
-
{ name = "opentelemetry-instrumentation-threading", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
357 |
-
{ name = "opentelemetry-instrumentation-urllib", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
358 |
-
{ name = "opentelemetry-instrumentation-urllib3", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
359 |
-
{ name = "piper-phonemize", marker = "platform_machine == 'x86_64'", url = "https://github.com/fedirz/piper-phonemize/raw/refs/heads/master/dist/piper_phonemize-1.2.0-cp312-cp312-manylinux_2_28_x86_64.whl" },
|
360 |
-
{ name = "piper-tts", marker = "platform_machine == 'x86_64'", specifier = ">=1.2.0" },
|
361 |
-
{ name = "pre-commit", marker = "extra == 'dev'", specifier = ">=4.0.1" },
|
362 |
-
{ name = "pydantic", specifier = ">=2.9.0" },
|
363 |
-
{ name = "pydantic-settings", specifier = ">=2.5.2" },
|
364 |
-
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.3.3" },
|
365 |
-
{ name = "pytest-antilru", marker = "extra == 'dev'", specifier = ">=2.0.0" },
|
366 |
-
{ name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.24.0" },
|
367 |
-
{ name = "pytest-mock", marker = "extra == 'dev'", specifier = ">=3.14.0" },
|
368 |
-
{ name = "pytest-xdist", marker = "extra == 'dev'", specifier = ">=3.6.1" },
|
369 |
-
{ name = "python-multipart", specifier = ">=0.0.10" },
|
370 |
-
{ name = "ruff", marker = "extra == 'dev'", specifier = ">=0.7.1" },
|
371 |
-
{ name = "sounddevice", specifier = ">=0.5.1" },
|
372 |
-
{ name = "soundfile", specifier = ">=0.12.1" },
|
373 |
-
{ name = "srt", marker = "extra == 'dev'", specifier = ">=3.5.3" },
|
374 |
-
{ name = "uvicorn", specifier = ">=0.30.6" },
|
375 |
-
{ name = "webvtt-py", marker = "extra == 'dev'", specifier = ">=0.5.1" },
|
376 |
-
]
|
377 |
-
|
378 |
[[package]]
|
379 |
name = "ffmpy"
|
380 |
version = "0.4.0"
|
@@ -4241,6 +4132,115 @@ wheels = [
|
|
4241 |
{ url = "https://files.pythonhosted.org/packages/50/ff/26a4ee48d0b66625a4e4028a055b9f25bc9d7c7b2d17d21a45137621a50d/soundfile-0.12.1-py2.py3-none-win_amd64.whl", hash = "sha256:0d86924c00b62552b650ddd28af426e3ff2d4dc2e9047dae5b3d8452e0a49a77", size = 1009109 },
|
4242 |
]
|
4243 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
4244 |
[[package]]
|
4245 |
name = "srt"
|
4246 |
version = "3.5.3"
|
|
|
266 |
{ url = "https://files.pythonhosted.org/packages/7b/03/ab118cb743dcf671da01ad0cfd7564465dda115db32976fdc95e21ce8feb/faster_whisper-1.1.0-py3-none-any.whl", hash = "sha256:0f2d025676bbff1e46c4108b6f9a82578d6e33826c174af2990e45b33fab6182", size = 1118168 },
|
267 |
]
|
268 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
269 |
[[package]]
|
270 |
name = "ffmpy"
|
271 |
version = "0.4.0"
|
|
|
4132 |
{ url = "https://files.pythonhosted.org/packages/50/ff/26a4ee48d0b66625a4e4028a055b9f25bc9d7c7b2d17d21a45137621a50d/soundfile-0.12.1-py2.py3-none-win_amd64.whl", hash = "sha256:0d86924c00b62552b650ddd28af426e3ff2d4dc2e9047dae5b3d8452e0a49a77", size = 1009109 },
|
4133 |
]
|
4134 |
|
4135 |
+
[[package]]
|
4136 |
+
name = "speaches"
|
4137 |
+
version = "0.1.0"
|
4138 |
+
source = { editable = "." }
|
4139 |
+
dependencies = [
|
4140 |
+
{ name = "ctranslate2" },
|
4141 |
+
{ name = "fastapi" },
|
4142 |
+
{ name = "faster-whisper" },
|
4143 |
+
{ name = "huggingface-hub", extra = ["hf-transfer"] },
|
4144 |
+
{ name = "numpy" },
|
4145 |
+
{ name = "piper-phonemize", marker = "platform_machine == 'x86_64'" },
|
4146 |
+
{ name = "piper-tts", marker = "platform_machine == 'x86_64'" },
|
4147 |
+
{ name = "pydantic" },
|
4148 |
+
{ name = "pydantic-settings" },
|
4149 |
+
{ name = "python-multipart" },
|
4150 |
+
{ name = "sounddevice" },
|
4151 |
+
{ name = "soundfile" },
|
4152 |
+
{ name = "uvicorn" },
|
4153 |
+
]
|
4154 |
+
|
4155 |
+
[package.optional-dependencies]
|
4156 |
+
client = [
|
4157 |
+
{ name = "keyboard" },
|
4158 |
+
]
|
4159 |
+
dev = [
|
4160 |
+
{ name = "anyio" },
|
4161 |
+
{ name = "basedpyright" },
|
4162 |
+
{ name = "mdx-truly-sane-lists" },
|
4163 |
+
{ name = "mkdocs-material" },
|
4164 |
+
{ name = "mkdocs-render-swagger-plugin" },
|
4165 |
+
{ name = "mkdocstrings", extra = ["python"] },
|
4166 |
+
{ name = "pre-commit" },
|
4167 |
+
{ name = "pytest" },
|
4168 |
+
{ name = "pytest-antilru" },
|
4169 |
+
{ name = "pytest-asyncio" },
|
4170 |
+
{ name = "pytest-mock" },
|
4171 |
+
{ name = "pytest-xdist" },
|
4172 |
+
{ name = "ruff" },
|
4173 |
+
{ name = "srt" },
|
4174 |
+
{ name = "webvtt-py" },
|
4175 |
+
]
|
4176 |
+
opentelemetry = [
|
4177 |
+
{ name = "opentelemetry-distro" },
|
4178 |
+
{ name = "opentelemetry-exporter-otlp" },
|
4179 |
+
{ name = "opentelemetry-instrumentation-asyncio" },
|
4180 |
+
{ name = "opentelemetry-instrumentation-fastapi" },
|
4181 |
+
{ name = "opentelemetry-instrumentation-grpc" },
|
4182 |
+
{ name = "opentelemetry-instrumentation-httpx" },
|
4183 |
+
{ name = "opentelemetry-instrumentation-logging" },
|
4184 |
+
{ name = "opentelemetry-instrumentation-requests" },
|
4185 |
+
{ name = "opentelemetry-instrumentation-threading" },
|
4186 |
+
{ name = "opentelemetry-instrumentation-urllib" },
|
4187 |
+
{ name = "opentelemetry-instrumentation-urllib3" },
|
4188 |
+
]
|
4189 |
+
ui = [
|
4190 |
+
{ name = "gradio" },
|
4191 |
+
{ name = "httpx" },
|
4192 |
+
{ name = "httpx-sse" },
|
4193 |
+
{ name = "openai" },
|
4194 |
+
]
|
4195 |
+
|
4196 |
+
[package.metadata]
|
4197 |
+
requires-dist = [
|
4198 |
+
{ name = "anyio", marker = "extra == 'dev'", specifier = ">=4.4.0" },
|
4199 |
+
{ name = "basedpyright", marker = "extra == 'dev'", specifier = ">=1.18.0" },
|
4200 |
+
{ name = "ctranslate2", specifier = ">=4.5.0" },
|
4201 |
+
{ name = "fastapi", specifier = ">=0.115.0" },
|
4202 |
+
{ name = "faster-whisper", specifier = ">=1.1.0" },
|
4203 |
+
{ name = "gradio", marker = "extra == 'ui'", specifier = ">=5.0.2" },
|
4204 |
+
{ name = "httpx", marker = "extra == 'ui'", specifier = ">=0.27.2" },
|
4205 |
+
{ name = "httpx-sse", marker = "extra == 'ui'", specifier = ">=0.4.0" },
|
4206 |
+
{ name = "huggingface-hub", extras = ["hf-transfer"], specifier = ">=0.25.1" },
|
4207 |
+
{ name = "keyboard", marker = "extra == 'client'", specifier = ">=0.13.5" },
|
4208 |
+
{ name = "mdx-truly-sane-lists", marker = "extra == 'dev'", specifier = ">=1.3" },
|
4209 |
+
{ name = "mkdocs-material", marker = "extra == 'dev'", specifier = ">=9.5.39" },
|
4210 |
+
{ name = "mkdocs-render-swagger-plugin", marker = "extra == 'dev'", specifier = ">=0.1.2" },
|
4211 |
+
{ name = "mkdocstrings", extras = ["python"], marker = "extra == 'dev'", specifier = ">=0.26.1" },
|
4212 |
+
{ name = "numpy", specifier = ">=2.1.1" },
|
4213 |
+
{ name = "openai", marker = "extra == 'ui'", specifier = ">=1.48.0" },
|
4214 |
+
{ name = "opentelemetry-distro", marker = "extra == 'opentelemetry'", specifier = ">=0.48b0" },
|
4215 |
+
{ name = "opentelemetry-exporter-otlp", marker = "extra == 'opentelemetry'", specifier = ">=1.27.0" },
|
4216 |
+
{ name = "opentelemetry-instrumentation-asyncio", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4217 |
+
{ name = "opentelemetry-instrumentation-fastapi", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4218 |
+
{ name = "opentelemetry-instrumentation-grpc", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4219 |
+
{ name = "opentelemetry-instrumentation-httpx", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4220 |
+
{ name = "opentelemetry-instrumentation-logging", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4221 |
+
{ name = "opentelemetry-instrumentation-requests", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4222 |
+
{ name = "opentelemetry-instrumentation-threading", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4223 |
+
{ name = "opentelemetry-instrumentation-urllib", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4224 |
+
{ name = "opentelemetry-instrumentation-urllib3", marker = "extra == 'opentelemetry'", specifier = "==0.48b0" },
|
4225 |
+
{ name = "piper-phonemize", marker = "platform_machine == 'x86_64'", url = "https://github.com/fedirz/piper-phonemize/raw/refs/heads/master/dist/piper_phonemize-1.2.0-cp312-cp312-manylinux_2_28_x86_64.whl" },
|
4226 |
+
{ name = "piper-tts", marker = "platform_machine == 'x86_64'", specifier = ">=1.2.0" },
|
4227 |
+
{ name = "pre-commit", marker = "extra == 'dev'", specifier = ">=4.0.1" },
|
4228 |
+
{ name = "pydantic", specifier = ">=2.9.0" },
|
4229 |
+
{ name = "pydantic-settings", specifier = ">=2.5.2" },
|
4230 |
+
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.3.3" },
|
4231 |
+
{ name = "pytest-antilru", marker = "extra == 'dev'", specifier = ">=2.0.0" },
|
4232 |
+
{ name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.24.0" },
|
4233 |
+
{ name = "pytest-mock", marker = "extra == 'dev'", specifier = ">=3.14.0" },
|
4234 |
+
{ name = "pytest-xdist", marker = "extra == 'dev'", specifier = ">=3.6.1" },
|
4235 |
+
{ name = "python-multipart", specifier = ">=0.0.10" },
|
4236 |
+
{ name = "ruff", marker = "extra == 'dev'", specifier = ">=0.7.1" },
|
4237 |
+
{ name = "sounddevice", specifier = ">=0.5.1" },
|
4238 |
+
{ name = "soundfile", specifier = ">=0.12.1" },
|
4239 |
+
{ name = "srt", marker = "extra == 'dev'", specifier = ">=3.5.3" },
|
4240 |
+
{ name = "uvicorn", specifier = ">=0.30.6" },
|
4241 |
+
{ name = "webvtt-py", marker = "extra == 'dev'", specifier = ">=0.5.1" },
|
4242 |
+
]
|
4243 |
+
|
4244 |
[[package]]
|
4245 |
name = "srt"
|
4246 |
version = "3.5.3"
|