---
license: mit
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- audio
- voice
- speech
- speaker
- speaker-diarization
- speaker-change-detection
- endpoints-template
library_name: generic
---

# WIP: Depends on [#1098](https://github.com/pyannote/pyannote-audio/pull/1098)

# 🎹 Speaker diarization with Pyannote and Inference Endpoints


This repository implements a custom `handler` for the `speaker-diarization` task on 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).

A [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) showing how to create the `handler.py` is also included.

### Request

The endpoint expects a binary audio file. Below are a cURL and a Python example using the `requests` library.

**curl**

```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# send the downloaded file to the endpoint
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/x-flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```

**Python**

```python
import mimetypes

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_audio: str):
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as f:
        audio = f.read()
    # guess the MIME type from the file extension
    content_type = mimetypes.guess_type(path_to_audio)[0]

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=audio)
    return response.json()


prediction = predict(path_to_audio="sample.wav")

prediction
```
Expected output:

```json
{"diarization": [
{"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
{"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
{"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
{"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"}
...
```