source: ./README.md
extra_gated_fields:
  Website: text
---

Using this open-source model in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

# 🎹 "Powerset" speaker segmentation
This model ingests 10 seconds of mono audio sampled at 16kHz and outputs speaker diarization as a (num_frames, num_classes) matrix, where the 7 classes are non-speech, speaker #1, speaker #2, speaker #3, speakers #1 and #2, speakers #1 and #3, and speakers #2 and #3.

```python
import torch

# waveform (first row)
duration, sample_rate, num_channels = 10, 16000, 1
batch_size = 1
waveform = torch.randn(batch_size, num_channels, duration * sample_rate)

# powerset multi-class encoding (second row)
powerset_encoding = model(waveform)

# convert from powerset to multi-label encoding
from pyannote.audio.utils.powerset import Powerset
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
to_multilabel = Powerset(
    max_speakers_per_chunk,
    max_speakers_per_frame).to_multilabel
multilabel_encoding = to_multilabel(powerset_encoding)
```
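To make the powerset encoding concrete, the class inventory for `max_speakers_per_chunk=3` and `max_speakers_per_frame=2` can be enumerated in plain Python. This is a hedged sketch of the idea, not the `pyannote.audio` implementation (whose class ordering may differ):

```python
from itertools import combinations

# Enumerate every subset of at most max_speakers_per_frame speakers
# drawn from max_speakers_per_chunk speakers: these are the powerset classes.
max_speakers_per_chunk, max_speakers_per_frame = 3, 2
classes = [
    combo
    for size in range(max_speakers_per_frame + 1)
    for combo in combinations(range(max_speakers_per_chunk), size)
]
# -> [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]: 7 classes

def powerset_to_multilabel(class_index):
    """Map a powerset class index to a binary per-speaker activity vector."""
    active = classes[class_index]
    return [int(speaker in active) for speaker in range(max_speakers_per_chunk)]

print(len(classes))               # 7
print(powerset_to_multilabel(4))  # [1, 1, 0]: speakers #1 and #2 overlap
```

This mapping is why converting back to multi-label turns the 7-way per-frame classification into a 3-speaker binary activity matrix.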
```python
# instantiate the model
from pyannote.audio import Model
model = Model.from_pretrained(
    "pyannote/segmentation-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
```
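Rather than pasting the access token into the script, it can be read from the environment. A minimal sketch; the variable name `HF_TOKEN` is an assumption, not something this model card prescribes:

```python
import os

def get_hf_token(env_var="HF_TOKEN"):
    """Return the Hugging Face access token, failing loudly when it is unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set {env_var} to your Hugging Face access token")
    return token

# usage (requires pyannote.audio and an accepted user agreement):
# model = Model.from_pretrained("pyannote/segmentation-3.0",
#                               use_auth_token=get_hf_token())
```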

### Speaker diarization

This model cannot perform speaker diarization of full recordings on its own: it only processes 10-second chunks.

See the [pyannote/speaker-diarization-3.0](https://hf.co/pyannote/speaker-diarization-3.0) pipeline, which combines this model with an additional speaker embedding model to perform speaker diarization of full recordings.
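To see why a dedicated pipeline is needed, here is a minimal sketch of how a longer recording could be covered with 10-second windows. This is illustrative only; the actual pyannote pipeline also aggregates overlapping predictions and clusters speaker embeddings:

```python
def sliding_chunks(total_duration, window=10.0, step=5.0):
    """Yield (start, end) times of fixed-size windows covering the recording."""
    start = 0.0
    while start + window < total_duration:
        yield (start, start + window)
        start += step
    # final window flush with the end of the recording
    yield (max(0.0, total_duration - window), total_duration)

chunks = list(sliding_chunks(23.0))
# -> [(0.0, 10.0), (5.0, 15.0), (10.0, 20.0), (13.0, 23.0)]
```

Each chunk would then be passed through the segmentation model independently, and the per-chunk speaker labels reconciled afterwards, which is exactly the part the pipeline handles.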