End-to-end Neural Diarization with Encoder-Decoder Based Attractors trained on AMI-headset. This example can be found at egs2/ami/diar1.

Configurations:

  • Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms and a frame shift of 10 ms. The frontend extracts 23 log-scaled Mel-filterbank features.
  • Use a 4-layer stacked Transformer encoder; each layer outputs 256-dimensional frame-wise embeddings.
  • Use ESPnet's standard RNN attractor (LSTM) with a hidden size of 256.
  • Initial training uses data with 4 speakers for 500 epochs, following spk4/diar_train_diar_eda_raw_spk4/config.yaml.
  • Adaptation fine-tunes the model for 20 epochs each on data with 3 and 5 speakers, using spk3/diar_train_diar_eda_raw_spk3/config.yaml and spk5/diar_train_diar_eda_raw_spk5/config.yaml, respectively.
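
The frontend settings listed above translate into sample counts as follows. This is a minimal arithmetic sketch, not ESPnet code: the helper names `ms_to_samples` and `num_frames` are hypothetical, and the frame count assumes no padding at the signal edges, which the actual frontend may handle differently.

```python
# Frontend settings stated in the configuration list above.
SAMPLE_RATE_HZ = 8000
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10


def ms_to_samples(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * sample_rate_hz // 1000


def num_frames(num_samples: int, win: int, hop: int) -> int:
    """Number of full frames, assuming no edge padding (an assumption)."""
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


win_length = ms_to_samples(FRAME_LENGTH_MS)  # 200 samples per frame
hop_length = ms_to_samples(FRAME_SHIFT_MS)   # 80 samples between frame starts
frames_per_10s = num_frames(10 * SAMPLE_RATE_HZ, win_length, hop_length)

print(win_length, hop_length, frames_per_10s)
```

Under these assumptions, each frame covers 200 samples, consecutive frames start 80 samples apart, and 10 seconds of audio yields 998 frames.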

RESULTS

The following results were obtained with the checkpoint spk5/diar_train_diar_eda_raw_spk5/20epoch.pth, evaluated on the 4-speaker test and development sets.

Environments

  • date: Thu Dec 19 22:43:37 EST 2024
  • python version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]
  • espnet version: espnet 202409
  • pytorch version: pytorch 2.4.0
  • Git hash: c12b3d59ca4fd8847edf274e56a1716474d2a30e
    • Commit date: Thu Dec 19 21:58:26 2024 -0500

spk4

DER

diarized_test

threshold_median_collar DER
result_th0.3_med11_collar0.0 72.44
result_th0.3_med1_collar0.0 74.64
result_th0.4_med11_collar0.0 70.60
result_th0.4_med1_collar0.0 72.30
result_th0.5_med11_collar0.0 70.45
result_th0.5_med1_collar0.0 72.02
result_th0.6_med11_collar0.0 71.85
result_th0.6_med1_collar0.0 73.41
result_th0.7_med11_collar0.0 75.56
result_th0.7_med1_collar0.0 77.02
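
The result names above appear to encode a decision threshold (th), a median filter width in frames (med), and a scoring collar in seconds (collar). That reading is an assumption based on common EEND post-processing and is not stated in this card. Under that assumption, the post-processing for one speaker can be sketched in plain Python as below; the functions `binarize` and `median_filter` are hypothetical, the zero padding at the edges is an assumption, and the posterior values are made up for illustration.

```python
def binarize(posteriors, threshold):
    """Mark a frame as active when its posterior exceeds the threshold."""
    return [1 if p > threshold else 0 for p in posteriors]


def median_filter(labels, width):
    """Odd-width sliding median; edges are zero-padded (an assumption)."""
    if width <= 1:
        return list(labels)  # width 1 leaves the decisions unchanged
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = [0] * half + list(labels) + [0] * half
    # The median of an odd-length window is its middle element after sorting.
    return [sorted(padded[i:i + width])[half] for i in range(len(labels))]


# Illustrative per-frame activity posteriors for a single speaker.
posteriors = [0.1, 0.2, 0.9, 0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.1]

raw = binarize(posteriors, threshold=0.5)
smoothed = median_filter(raw, width=3)

print(raw)       # [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(smoothed)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

The width of 3 is used only to keep the example short; the tables report widths of 1 and 11. Smoothing removes isolated single-frame decisions and fills short gaps, which would be consistent with the med11 rows scoring lower DER than the med1 rows at every threshold.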

spk4

DER

diarized_dev

threshold_median_collar DER
result_th0.3_med11_collar0.0 74.37
result_th0.3_med1_collar0.0 75.96
result_th0.4_med11_collar0.0 71.69
result_th0.4_med1_collar0.0 72.94
result_th0.5_med11_collar0.0 70.83
result_th0.5_med1_collar0.0 72.12
result_th0.6_med11_collar0.0 71.96
result_th0.6_med1_collar0.0 73.34
result_th0.7_med11_collar0.0 75.81
result_th0.7_med1_collar0.0 76.99
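
One common way to read the two tables is to choose the post-processing setting on the development set and report the test DER for that same setting. The sketch below does this with the numbers listed above; the regular expression and the function `parse_name` are hypothetical helpers, and the interpretation of th, med, and collar as threshold, median filter width, and collar is an assumption.

```python
import re

# DER values copied from the diarized_dev and diarized_test tables above.
DEV = {
    "result_th0.3_med11_collar0.0": 74.37, "result_th0.3_med1_collar0.0": 75.96,
    "result_th0.4_med11_collar0.0": 71.69, "result_th0.4_med1_collar0.0": 72.94,
    "result_th0.5_med11_collar0.0": 70.83, "result_th0.5_med1_collar0.0": 72.12,
    "result_th0.6_med11_collar0.0": 71.96, "result_th0.6_med1_collar0.0": 73.34,
    "result_th0.7_med11_collar0.0": 75.81, "result_th0.7_med1_collar0.0": 76.99,
}
TEST = {
    "result_th0.3_med11_collar0.0": 72.44, "result_th0.3_med1_collar0.0": 74.64,
    "result_th0.4_med11_collar0.0": 70.60, "result_th0.4_med1_collar0.0": 72.30,
    "result_th0.5_med11_collar0.0": 70.45, "result_th0.5_med1_collar0.0": 72.02,
    "result_th0.6_med11_collar0.0": 71.85, "result_th0.6_med1_collar0.0": 73.41,
    "result_th0.7_med11_collar0.0": 75.56, "result_th0.7_med1_collar0.0": 77.02,
}

NAME_RE = re.compile(
    r"result_th(?P<th>[\d.]+)_med(?P<med>\d+)_collar(?P<collar>[\d.]+)"
)


def parse_name(name):
    """Split a result name into (threshold, median width, collar)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected result name: {name}")
    return float(match["th"]), int(match["med"]), float(match["collar"])


best_name = min(DEV, key=DEV.get)      # lowest DER on the development set
best_settings = parse_name(best_name)
test_der = TEST[best_name]             # test DER for the same setting

print(best_name, best_settings, test_der)
```

With these numbers, the setting selected on the development set is th0.5 with med11 (70.83 DER), which gives 70.45 DER on the test set and is also the lowest entry in the test table.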