philschmid HF staff committed on
Commit
0c82ac9
1 Parent(s): 9063905

Update README.md

Files changed (1)
  1. README.md +79 -0
README.md CHANGED
@@ -1,3 +1,82 @@
  ---
  license: mit
+ tags:
+ - pyannote
+ - pyannote-audio
+ - pyannote-audio-pipeline
+ - audio
+ - voice
+ - speech
+ - speaker
+ - speaker-diarization
+ - speaker-change-detection
+ - endpoints-template
+ library_name: generic
  ---
+ # 🎹 Speaker diarization with Pyannote and Inference Endpoints
+
+
+ This repository implements a custom `handler` for the `speaker-diarization` task on 🤗 Inference Endpoints using Pyannote. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/handler.py).
+
+ A [notebook](https://huggingface.co/philschmid/pyannote-speaker-diarization-endpoint/blob/main/create_handler.ipynb) that shows how to create the `handler.py` is also included.
+
+ ### Request
+
+ The endpoint expects a binary audio file. Below are a cURL example and a Python example that uses the `requests` library.
+
+ **curl**
+
+ ```bash
+ # download a sample audio file
+ wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
+
+ # send the downloaded file to the endpoint
+ curl --request POST \
+   --url https://{ENDPOINT}/ \
+   --header 'Content-Type: audio/x-flac' \
+   --header 'Authorization: Bearer {HF_TOKEN}' \
+   --data-binary '@sample1.flac'
+ ```
+
+ **Python**
+
+ ```python
+ import mimetypes
+
+ import requests as r
+
+ ENDPOINT_URL = ""
+ HF_TOKEN = ""
+
+ def predict(path_to_audio: str):
+     # read the audio file as raw bytes
+     with open(path_to_audio, "rb") as i:
+         b = i.read()
+     # guess the MIME type from the file extension; fall back to FLAC,
+     # the format of the sample file, if the extension is unknown
+     content_type = mimetypes.guess_type(path_to_audio)[0] or "audio/x-flac"
+
+     headers = {
+         "Authorization": f"Bearer {HF_TOKEN}",
+         "Content-Type": content_type,
+     }
+     response = r.post(ENDPOINT_URL, headers=headers, data=b)
+     return response.json()
+
+ # sample1.flac is the file downloaded in the curl example
+ prediction = predict(path_to_audio="sample1.flac")
+
+ prediction
+ ```
+ Expected output:
+
+ ```json
+ {"diarization": [
+   {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
+   {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
+   {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
+   {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"}
+ ...
+ ```
+
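The response in the expected output is a list of labelled segments whose `start` and `stop` values are strings. As a minimal sketch of how a client might post-process it, the snippet below sums the speaking time per speaker. The `speaking_time` helper is hypothetical and not part of this repository, and it assumes the response always has the shape shown in the expected output.

```python
from collections import defaultdict


def speaking_time(prediction: dict) -> dict:
    """Sum the duration (in seconds) of all segments per speaker label.

    Hypothetical helper: assumes `prediction` has the shape shown in the
    expected output, i.e. a "diarization" list of label/start/stop entries.
    """
    totals = defaultdict(float)
    for segment in prediction["diarization"]:
        # start/stop are strings in the sample output, so convert them first
        start = float(segment["start"])
        stop = float(segment["stop"])
        totals[segment["label"]] += stop - start
    return dict(totals)


# the four segments listed in the expected output
example = {"diarization": [
    {"label": "SPEAKER_01", "start": "0.4978125", "stop": "1.3921875"},
    {"label": "SPEAKER_01", "start": "1.8984375", "stop": "2.7590624999999998"},
    {"label": "SPEAKER_02", "start": "2.9953125", "stop": "3.5015625000000004"},
    {"label": "SPEAKER_01", "start": "3.5690625000000002", "stop": "4.311562500000001"},
]}

for label, seconds in speaking_time(example).items():
    print(f"{label}: {seconds:.2f}s")
```

In practice the return value of `predict` from the Python example could be passed to the helper instead of the hard-coded `example`.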