
	---
	license: mit
	base_model: pyannote/segmentation-3.0
	tags:
	- speaker-diarization
	- speaker-segmentation
	- generated_from_trainer
	datasets:
	- diarizers-community/callhome
	model-index:
	- name: speaker-segmentation-fine-tuned-callhome-eng
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# speaker-segmentation-fine-tuned-callhome-eng

	This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on the `eng` (English) configuration of the diarizers-community/callhome dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4587
	- DER (diarization error rate): 0.1824
	- False Alarm: 0.0587
	- Missed Detection: 0.0707
	- Confusion: 0.0529
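
The diarization error rate is conventionally the sum of the false alarm, missed detection, and speaker confusion rates. As a quick sanity check, the sketch below recomputes it from the figures listed above. The tolerance is an assumption chosen to absorb the four-decimal rounding of each reported value.

```python
# Error rates reported on the evaluation set (copied from the list above).
reported_der = 0.1824
false_alarm = 0.0587
missed_detection = 0.0707
confusion = 0.0529

# DER is the sum of the three error components.
recomputed_der = false_alarm + missed_detection + confusion

# Assumption: each value is rounded to four decimals, so allow a small tolerance.
tolerance = 2.5e-4
print(f"recomputed DER: {recomputed_der:.4f} (reported: {reported_der:.4f})")
assert abs(recomputed_der - reported_der) <= tolerance
```

The two values differ only in the last digit, which is consistent with rounding of the individual components.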

	## Model description
	This segmentation model has been trained on English data (Callhome) using [diarizers](https://github.com/huggingface/diarizers/tree/main).
	It can be loaded with two lines of code:

	```python
	from diarizers import SegmentationModel

	segmentation_model = SegmentationModel.from_pretrained('evie-8/speaker-segmentation-fine-tuned-callhome-eng')
	```

	To use it within a pyannote speaker diarization pipeline, load the [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline and convert the model to a pyannote-compatible format:

	```python
	from pyannote.audio import Pipeline
	import torch

	device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

	# load the pre-trained pyannote pipeline
	pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
	pipeline.to(device)

	# replace the segmentation model with your fine-tuned one
	# (`segmentation_model` is the object loaded in the previous snippet)
	model = segmentation_model.to_pyannote_model()
	pipeline._segmentation.model = model.to(device)
	```

	You can now use the pipeline on audio examples:

	```python
	from datasets import load_dataset

	# load dataset example
	dataset = load_dataset("diarizers-community/callhome", "eng", split="data")
	sample = dataset[0]["audio"]

	# pre-process inputs: add a channel dimension and rename keys to what the pipeline expects
	sample["waveform"] = torch.from_numpy(sample.pop("array")[None, :]).to(device, dtype=model.dtype)
	sample["sample_rate"] = sample.pop("sampling_rate")

	# perform inference
	diarization = pipeline(sample)

	# dump the diarization output to disk using RTTM format
	with open("audio.rttm", "w") as rttm:
	    diarization.write_rttm(rttm)
	```
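
Once the RTTM file is written, it can be inspected without any extra dependencies. The sketch below assumes the standard RTTM layout, in which each `SPEAKER` record is a whitespace-separated line with the segment duration in the fifth field and the speaker label in the eighth. The function name and the example content are hypothetical and made up for illustration.

```python
from collections import defaultdict


def speaking_time_per_speaker(rttm_text):
    """Sum segment durations (in seconds) per speaker label from RTTM text."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: segment duration
        speaker = fields[7]          # eighth field: speaker label
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content, not produced by the model.
example_rttm = """\
SPEAKER audio 1 0.000 2.500 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER audio 1 2.500 1.250 <NA> <NA> SPEAKER_01 <NA> <NA>
SPEAKER audio 1 4.000 3.000 <NA> <NA> SPEAKER_00 <NA> <NA>
"""

print(speaking_time_per_speaker(example_rttm))
```

To apply it to the file written above, read `audio.rttm` and pass its contents to the function.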


	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 5.0
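
To give a feel for the `cosine` scheduler setting, the sketch below evaluates a half-cosine decay from the listed learning rate down to zero. It is a plain-Python approximation, not the Trainer's own code. The total of 1810 steps is taken from the training results, and zero warmup steps is an assumption, since the card lists none.

```python
import math

BASE_LR = 0.001      # learning_rate from the list above
TOTAL_STEPS = 1810   # final step in the training results (5 epochs x 362 steps)
WARMUP_STEPS = 0     # assumption: the card does not list any warmup


def cosine_lr(step):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, after epochs 1 and 4, at the midpoint, and at the end.
for step in (0, 362, 905, 1448, 1810):
    print(f"step {step:>4}: lr = {cosine_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.001, reaches half of that at the midpoint of training, and decays to zero at the final step.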

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | DER    | False Alarm | Missed Detection | Confusion |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:-----------:|:----------------:|:---------:|
	| 0.4181        | 1.0   | 362  | 0.4878          | 0.1940 | 0.0577      | 0.0756           | 0.0607    |
	| 0.3931        | 2.0   | 724  | 0.4616          | 0.1827 | 0.0590      | 0.0718           | 0.0520    |
	| 0.3766        | 3.0   | 1086 | 0.4643          | 0.1826 | 0.0576      | 0.0723           | 0.0527    |
	| 0.3661        | 4.0   | 1448 | 0.4603          | 0.1832 | 0.0620      | 0.0682           | 0.0530    |
	| 0.3568        | 5.0   | 1810 | 0.4587          | 0.1824 | 0.0587      | 0.0707           | 0.0529    |
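
The table can also be read programmatically, for example to pick the best epoch or to estimate how many training examples were seen per epoch. The sketch below copies the relevant columns by hand. The per-epoch estimate assumes a single device and no gradient accumulation, neither of which the card states, so treat it as a rough upper bound.

```python
# (epoch, step, validation_loss, der) copied from the table above.
rows = [
    (1, 362, 0.4878, 0.1940),
    (2, 724, 0.4616, 0.1827),
    (3, 1086, 0.4643, 0.1826),
    (4, 1448, 0.4603, 0.1832),
    (5, 1810, 0.4587, 0.1824),
]
TRAIN_BATCH_SIZE = 32  # train_batch_size from the hyperparameters

# Epoch with the lowest DER and the lowest validation loss.
best_by_der = min(rows, key=lambda row: row[3])
best_by_loss = min(rows, key=lambda row: row[2])
print(f"lowest DER: epoch {best_by_der[0]} ({best_by_der[3]:.4f})")
print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[2]:.4f})")

# Optimizer steps per epoch, and the implied number of training examples.
# Assumption: one device, no gradient accumulation; the last batch may be partial.
steps_per_epoch = rows[0][1]
max_examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE
print(f"steps per epoch: {steps_per_epoch}, at most {max_examples_per_epoch} examples")
```

Both criteria select the final epoch, which matches the evaluation results reported at the top of the card.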


	### Framework versions

	- Transformers 4.41.2
	- PyTorch 2.3.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1