	---
	license: mit
	base_model: pyannote/segmentation-3.0
	tags:
	- speaker-diarization
	- speaker-segmentation
	- generated_from_trainer
	datasets:
	- diarizers-community/callhome
	model-index:
	- name: speaker-segmentation-fine-tuned-callhome-deu
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# speaker-segmentation-fine-tuned-callhome-deu

	This model is a fine-tuned version of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) on the `deu` (German) subset of the diarizers-community/callhome dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.3780
	- DER (diarization error rate): 0.1415
	- False Alarm: 0.0724
	- Missed Detection: 0.0490
	- Confusion: 0.0201
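
As a quick sanity check, the diarization error rate is conventionally the sum of the false alarm, missed detection, and speaker confusion rates. The minimal sketch below uses only the numbers reported above and confirms that the three components add up to the reported 0.1415.

```python
# Component error rates reported above for the evaluation set.
false_alarm = 0.0724
missed_detection = 0.0490
confusion = 0.0201

# DER is conventionally the sum of the three component rates.
der = false_alarm + missed_detection + confusion
print(f"DER = {der:.4f}")
```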

	## Model description

	This segmentation model was fine-tuned on German conversational speech (Callhome) using [diarizers](https://github.com/huggingface/diarizers/tree/main).
	It can be loaded with two lines of code:

	```python
	from diarizers import SegmentationModel

	segmentation_model = SegmentationModel().from_pretrained('diarizers-community/speaker-segmentation-fine-tuned-callhome-deu')
	```

	To use it within a pyannote speaker diarization pipeline, load the [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline and convert the model to a pyannote-compatible format:

	```python
	from pyannote.audio import Pipeline
	import torch

	device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

	# load the pre-trained pyannote pipeline
	pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
	pipeline.to(device)

	# replace the segmentation model with your fine-tuned one
	model = segmentation_model.to_pyannote_model()
	pipeline._segmentation.model = model.to(device)
	```

	You can now use the pipeline on audio examples:

	```python
	from datasets import load_dataset

	# load dataset example
	dataset = load_dataset("diarizers-community/callhome", "deu", split="data")
	sample = dataset[0]["audio"]

	# pre-process inputs
	sample["waveform"] = torch.from_numpy(sample.pop("array")[None, :]).to(device, dtype=model.dtype)
	sample["sample_rate"] = sample.pop("sampling_rate")

	# perform inference
	diarization = pipeline(sample)

	# dump the diarization output to disk using RTTM format
	with open("audio.rttm", "w") as rttm:
	    diarization.write_rttm(rttm)
	```
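
The RTTM file written above is plain text with one `SPEAKER` record per segment, so it can be inspected without pyannote. The following is a minimal sketch, not part of diarizers or pyannote: the helper name `speaking_time_per_speaker` and the sample records are hypothetical, made up for illustration. It assumes the standard RTTM field order, where the fifth field is the segment duration in seconds and the eighth is the speaker label.

```python
from collections import defaultdict


def speaking_time_per_speaker(rttm_lines):
    """Sum segment durations (in seconds) per speaker label from RTTM lines."""
    totals = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: segment duration
        speaker = fields[7]          # eighth field: speaker label
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM records, invented for this example only.
example_rttm = [
    "SPEAKER audio 1 0.000 2.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER audio 1 2.500 1.250 <NA> <NA> SPEAKER_01 <NA> <NA>",
    "SPEAKER audio 1 4.000 1.500 <NA> <NA> SPEAKER_00 <NA> <NA>",
]
totals = speaking_time_per_speaker(example_rttm)
print(totals)
```

To run it on the file produced above, pass the open file object instead of the list, for example `speaking_time_per_speaker(open("audio.rttm"))`.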

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 5.0
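
To make the `cosine` scheduler setting concrete, the sketch below computes a half-cosine decay from the listed learning rate down to zero. It is an illustration under stated assumptions, not the training code itself: it assumes no warmup (none is listed in this card) and takes the total of 1650 steps from the final row of the training results.

```python
import math

BASE_LR = 0.001     # learning_rate listed above
TOTAL_STEPS = 1650  # final step in the training results (assumed total)


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Half-cosine decay from base_lr to 0; assumes no warmup phase."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, after one epoch, at the midpoint, and at the end.
for step in (0, 330, 825, 1650):
    print(step, f"{cosine_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.001, passes through half that value at the midpoint, and reaches zero at the last step.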

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | DER    | False Alarm | Missed Detection | Confusion |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:-----------:|:----------------:|:---------:|
	| 0.4622        | 1.0   | 330  | 0.3844          | 0.1439 | 0.0653      | 0.0562           | 0.0223    |
	| 0.4306        | 2.0   | 660  | 0.4004          | 0.1519 | 0.0763      | 0.0515           | 0.0241    |
	| 0.4069        | 3.0   | 990  | 0.3775          | 0.1407 | 0.0707      | 0.0496           | 0.0204    |
	| 0.3949        | 4.0   | 1320 | 0.3771          | 0.1408 | 0.0710      | 0.0498           | 0.0200    |
	| 0.3879        | 5.0   | 1650 | 0.3780          | 0.1415 | 0.0724      | 0.0490           | 0.0201    |
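
The reported evaluation metrics match the epoch 5 row. As a small illustration of how to read the table, the sketch below copies the validation loss and DER columns and picks the epoch that minimizes each. The differences between epochs 3 to 5 are small, so this should not be read as a claim that another checkpoint is meaningfully better.

```python
# (epoch, validation_loss, der) copied from the training results table.
results = [
    (1, 0.3844, 0.1439),
    (2, 0.4004, 0.1519),
    (3, 0.3775, 0.1407),
    (4, 0.3771, 0.1408),
    (5, 0.3780, 0.1415),
]

# Pick the row that minimizes each metric.
best_by_der = min(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

print("lowest DER at epoch", best_by_der[0])
print("lowest validation loss at epoch", best_by_loss[0])
```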


	### Framework versions

	- Transformers 4.40.0
	- Pytorch 2.2.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.19.1