philschmid/pyannote-speaker-diarization-endpoint

Voice Activity Detection

pyannote-audio-pipeline

speaker-diarization

speaker-change-detection

overlapped-speech-detection

Inference Endpoints


philschmid committed on Oct 7, 2022

Commit 0c82ac9 · 1 Parent(s): 9063905

Update README.md

Files changed (1)

README.md +79 -0


@@ -1,3 +1,82 @@
 ---
 license: mit
+tags:
+- pyannote
+- pyannote-audio
+- pyannote-audio-pipeline
+- audio
+- voice
+- speech
+- speaker
+- speaker-diarization
+- speaker-change-detection
+- endpoints-template
+library_name: generic
 ---
+# 🎹 Speaker diarization with Pyannote and Inference Endpoints
+This repository implements a custom `handler` for `speaker-diarization` for 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in the [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).
+A [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) showing how to create the `handler.py` is also included.
+### Request
+The endpoint expects a binary audio file. Below are a cURL and a Python example using the `requests` library.
+**curl**
+```bash
+# download a sample audio file
+wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
+# send the file as the raw request body; Content-Type must match the audio format
+curl --request POST \
+  --url https://{ENDPOINT}/ \
+  --header 'Content-Type: audio/flac' \
+  --header 'Authorization: Bearer {HF_TOKEN}' \
+  --data-binary '@sample1.flac'
+```
+**Python**
+```python
+import mimetypes
+import requests
+ENDPOINT_URL = ""  # URL of your Inference Endpoint
+HF_TOKEN = ""  # your Hugging Face access token
+def predict(path_to_audio: str) -> dict:
+    # read the audio file as raw bytes
+    with open(path_to_audio, "rb") as f:
+        audio_bytes = f.read()
+    # guess the MIME type from the file extension
+    content_type = mimetypes.guess_type(path_to_audio)[0]
+    if content_type is None:
+        raise ValueError(f"Could not guess the MIME type of {path_to_audio}")
+    headers = {
+        "Authorization": f"Bearer {HF_TOKEN}",
+        "Content-Type": content_type,
+    }
+    # send the audio bytes as the request body
+    response = requests.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
+    response.raise_for_status()
+    return response.json()
+prediction = predict(path_to_audio="sample1.flac")
+prediction
+```
+**Expected output**
+```json
+{"diarization": [
+{"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
+{"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
+{"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
+{"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"}
+...
+```
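
The response can be post-processed on the client side. The following is a minimal sketch, not part of the repository: it assumes the response has the shape shown in the expected output above (a `diarization` list whose `start` and `stop` values are strings holding seconds) and sums the speaking time per speaker. The helper name `speaking_time` is hypothetical, and the `sample` data is just the four segments copied from the expected output.

```python
from collections import defaultdict
from typing import Dict, List


def speaking_time(segments: List[dict]) -> Dict[str, float]:
    """Sum the duration in seconds of all segments for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in segments:
        # "start" and "stop" arrive as strings, so convert them to floats first
        duration = float(segment["stop"]) - float(segment["start"])
        totals[segment["label"]] += duration
    return dict(totals)


# The four segments shown in the expected output (assumed response shape)
sample = {"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
    {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"},
]}

totals = speaking_time(sample["diarization"])
for label, seconds in sorted(totals.items()):
    print(f"{label}: {seconds:.2f}s")
```

With a live endpoint, the same helper could be applied to the dictionary returned by `predict`, for example `speaking_time(prediction["diarization"])`.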