---
license: mit
tags:
- audio
- automatic-speech-recognition
- endpoints-template
library_name: generic
inference: false
duplicated_from: philschmid/openai-whisper-endpoint
---
# OpenAI [Whisper](https://github.com/openai/whisper) Inference Endpoint example
> Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
For more information about the model, license and limitations check the original repository at [openai/whisper](https://github.com/openai/whisper).
---
This repository implements a custom `handler` task for `automatic-speech-recognition` for 🤗 Inference Endpoints using OpenAI's Whisper model. The code for the customized pipeline is in [handler.py](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/handler.py).
A [notebook](https://huggingface.co/philschmid/openai-whisper-endpoint/blob/main/create_handler.ipynb) showing how to create the `handler.py` is also included. A minimal sketch of such a handler is shown below.
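For reference, custom handlers for Inference Endpoints follow the `EndpointHandler` convention: a class whose `__init__` loads the model once at startup and whose `__call__` processes each request payload. The following is only a minimal sketch of what the `handler.py` in this repository could look like; the model size, temporary-file handling, and the `"inputs"` key for the binary payload are assumptions, so refer to the actual `handler.py` for the implementation used here.

```python
import tempfile
from typing import Any, Dict

import whisper


class EndpointHandler:
    def __init__(self, path: str = ""):
        # load the Whisper model once when the endpoint starts
        # (model size is an assumption; the real handler may use a different one)
        self.model = whisper.load_model("base")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, str]:
        # for binary requests the toolkit passes the raw audio bytes under "inputs"
        audio_bytes = data["inputs"]

        # write the bytes to a temporary file so whisper (via ffmpeg) can read it
        with tempfile.NamedTemporaryFile(suffix=".flac") as f:
            f.write(audio_bytes)
            f.flush()
            result = self.model.transcribe(f.name)

        return {"text": result["text"]}
```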
### Request
The endpoint expects a binary audio file. Below is a cURL example and a Python example using the `requests` library.
**curl**
```bash
# download a sample audio file
wget https://cdn-media.huggingface.co/speech_samples/sample1.flac

# send the request
curl --request POST \
  --url https://{ENDPOINT}/ \
  --header 'Content-Type: audio/x-flac' \
  --header 'Authorization: Bearer {HF_TOKEN}' \
  --data-binary '@sample1.flac'
```
**Python**
```python
import mimetypes

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_audio: str):
    # read the audio file as raw bytes
    with open(path_to_audio, "rb") as f:
        audio_bytes = f.read()

    # guess the mime type from the file extension, e.g. audio/x-flac
    content_type = mimetypes.guess_type(path_to_audio)[0]

    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": content_type,
    }
    response = r.post(ENDPOINT_URL, headers=headers, data=audio_bytes)
    return response.json()


prediction = predict(path_to_audio="sample1.flac")
print(prediction)
```
Expected output:
```json
{"text": " going along slushy country roads and speaking to damp audiences in draughty school rooms day after day for a fortnight. He'll have to put in an appearance at some place of worship on Sunday morning, and he can come to us immediately afterwards."}
```