---
language:
- "fr"
tags:
- "audio"
- "speech"
- "speaker-diarization"
- "medkit"
- "pyannote-audio"
datasets:
- "common_voice"
- "pxcorpus"
- "simsamu"
metrics:
- "der"
---

# Simsamu diarization pipeline

This repository contains a pretrained [pyannote-audio](https://github.com/pyannote/pyannote-audio) diarization pipeline that was fine-tuned on the [Simsamu](https://huggingface.co/datasets/medkit/simsamu) dataset.

The pipeline uses a fine-tuned segmentation model based on https://huggingface.co/pyannote/segmentation-3.0 and pretrained embeddings from https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM. The pipeline hyperparameters were optimized.

The pipeline can be used in [medkit](https://github.com/medkit-lib/medkit/) as follows:

```
from medkit.core.audio import AudioDocument
from medkit.audio.segmentation.pa_speaker_detector import PASpeakerDetector

# init speaker detector operation
speaker_detector = PASpeakerDetector(
    model="medkit/simsamu-diarization",
    device=0,
    segmentation_batch_size=10,
    embedding_batch_size=10,
)

# create audio document
audio_doc = AudioDocument.from_file("path/to/audio.wav")

# apply operation on audio document
speech_segments = speaker_detector.run([audio_doc.raw_segment])

# display each speech turn and corresponding speaker
for speech_seg in speech_segments:
    speaker_attr = speech_seg.attrs.get(label="speaker")[0]
    print(speech_seg.span.start, speech_seg.span.end, speaker_attr.value)
```

More info at https://medkit.readthedocs.io/

See also: [Simsamu transcription model](https://huggingface.co/medkit/simsamu-transcription)
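
The speech turns printed by the usage example (start, end, speaker) can be post-processed with plain Python. The following is a minimal sketch, not part of medkit or pyannote-audio: the helper name `total_speech_per_speaker`, the tuple layout, and the speaker labels are assumptions made for illustration, and times are assumed to be in seconds.

```python
from collections import defaultdict


def total_speech_per_speaker(turns):
    """Sum the duration of speech turns for each speaker.

    `turns` is an iterable of (start, end, speaker) tuples,
    with start and end assumed to be expressed in seconds.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Hypothetical turns; real ones would be collected from the
# speech segments returned by the speaker detector.
example_turns = [
    (0.0, 2.5, "SPEAKER_00"),
    (2.5, 4.0, "SPEAKER_01"),
    (4.0, 7.0, "SPEAKER_00"),
]
print(total_speech_per_speaker(example_turns))
```

With real output, the tuples could be built in the display loop from `speech_seg.span.start`, `speech_seg.span.end` and `speaker_attr.value` instead of being printed.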