Upload config
- README.md +199 -0
- config.json +0 -0
- configuration_ecapa_tdnn.py +196 -0
README.md
ADDED
@@ -0,0 +1,199 @@
---
library_name: transformers
tags: []
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

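Until the card is filled in, the sketch below is a minimal starting point. It assumes the repository exposes the custom configuration class (`EcapaTdnnConfig` in `configuration_ecapa_tdnn.py`) through its `auto_map`, so `trust_remote_code=True` is required; the repository id is a placeholder, not this model's actual id.

```python
from transformers import AutoConfig

# Placeholder repository id; replace with this model's actual Hub id.
repo_id = "<namespace>/<model-name>"

# trust_remote_code is assumed to be needed because the configuration is shipped
# as custom code in this repository rather than being built into transformers.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
print(type(config).__name__)  # EcapaTdnnConfig, if auto_map points to the custom class
```
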
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary


## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
config.json
ADDED
The diff for this file is too large to render.
configuration_ecapa_tdnn.py
ADDED
@@ -0,0 +1,196 @@
from typing import Any, Union

from transformers.configuration_utils import PretrainedConfig


class EcapaTdnnConfig(PretrainedConfig):
    """Configuration for an ECAPA-TDNN model.

    Groups the settings for mel-spectrogram feature extraction, spectrogram
    augmentation, the encoder, the decoder, and the training objective into
    per-module dictionaries (mel_spectrogram_config,
    spectrogram_augmentation_config, encoder_config, decoder_config,
    objective_config).
    """

    def __init__(
        self,
        sample_rate: int = 16000,
        window_size: float = 0.02,
        window_stride: float = 0.01,
        n_window_size: Any = None,
        n_window_stride: Any = None,
        window: str = "hann",
        normalize: str = "per_feature",
        n_fft: Any = None,
        preemph: float = 0.97,
        features: int = 64,
        lowfreq: int = 0,
        highfreq: Any = None,
        log: bool = True,
        log_zero_guard_type: str = "add",
        log_zero_guard_value: Any = 2 ** -24,
        dither: float = 0.00001,
        pad_to: int = 16,
        frame_splicing: int = 1,
        exact_pad: bool = False,
        pad_value: int = 0,
        mag_power: float = 2,
        rng: Any = None,
        nb_augmentation_prob: float = 0,
        nb_max_freq: int = 4000,
        use_torchaudio: bool = False,
        mel_norm: str = "slaney",
        freq_masks: int = 0,
        time_masks: int = 0,
        freq_width: int = 10,
        time_width: int = 10,
        rect_masks: int = 0,
        rect_time: int = 5,
        rect_freq: int = 20,
        mask_value: float = 0,
        use_vectorized_spec_augment: bool = True,
        filters: list = [512, 512, 512, 512, 1500],
        kernel_sizes: list = [5, 3, 3, 1, 1],
        dilations: list = [1, 2, 3, 1, 1],
        scale: int = 8,
        res2net: bool = False,
        res2net_scale: int = 8,
        init_mode: str = 'xavier_uniform',
        emb_sizes: Union[int, list] = 256,
        pool_mode: str = 'xvector',
        angular: bool = False,
        attention_channels: int = 128,
        objective: str = 'additive_angular_margin',  # additive_margin, additive_angular_margin, cross_entropy
        angular_scale: float = 30,
        angular_margin: float = 0.2,
        label_smoothing: float = 0.0,
        initializer_range=0.02,
        pad_token_id=0,
        bos_token_id=1,
        eos_token_id=2,
        **kwargs,
    ):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.initializer_range = initializer_range

        # Mel-spectrogram configuration
        self.sample_rate = sample_rate
        self.window_size = window_size
        self.window_stride = window_stride
        self.n_window_size = n_window_size
        self.n_window_stride = n_window_stride
        self.window = window
        self.normalize = normalize
        self.n_fft = n_fft
        self.preemph = preemph
        self.features = features
        self.lowfreq = lowfreq
        self.highfreq = highfreq
        self.log = log
        self.log_zero_guard_type = log_zero_guard_type
        self.log_zero_guard_value = log_zero_guard_value
        self.dither = dither
        self.pad_to = pad_to
        self.frame_splicing = frame_splicing
        self.exact_pad = exact_pad
        self.pad_value = pad_value
        self.mag_power = mag_power
        self.rng = rng
        self.nb_augmentation_prob = nb_augmentation_prob
        self.nb_max_freq = nb_max_freq
        self.use_torchaudio = use_torchaudio
        self.mel_norm = mel_norm
        self.mel_spectrogram_config = {
            "sample_rate": sample_rate,
            "window_size": window_size,
            "window_stride": window_stride,
            "n_window_size": n_window_size,
            "n_window_stride": n_window_stride,
            "window": window,
            "normalize": normalize,
            "n_fft": n_fft,
            "preemph": preemph,
            "features": features,
            "lowfreq": lowfreq,
            "highfreq": highfreq,
            "log": log,
            "log_zero_guard_type": log_zero_guard_type,
            "log_zero_guard_value": log_zero_guard_value,
            "dither": dither,
            "pad_to": pad_to,
            "frame_splicing": frame_splicing,
            "exact_pad": exact_pad,
            "pad_value": pad_value,
            "mag_power": mag_power,
            "rng": rng,
            "nb_augmentation_prob": nb_augmentation_prob,
            "nb_max_freq": nb_max_freq,
            "use_torchaudio": use_torchaudio,
            "mel_norm": mel_norm,
        }

        # Spectrogram augmentation (SpecAugment) configuration
        self.freq_masks = freq_masks
        self.time_masks = time_masks
        self.freq_width = freq_width
        self.time_width = time_width
        self.rect_masks = rect_masks
        self.rect_time = rect_time
        self.rect_freq = rect_freq
        self.mask_value = mask_value
        self.use_vectorized_spec_augment = use_vectorized_spec_augment
        self.spectrogram_augmentation_config = {
            "freq_masks": freq_masks,
            "time_masks": time_masks,
            "freq_width": freq_width,
            "time_width": time_width,
            "rect_masks": rect_masks,
            "rect_time": rect_time,
            "rect_freq": rect_freq,
            "mask_value": mask_value,
            "use_vectorized_spec_augment": use_vectorized_spec_augment,
        }

        # Encoder configuration
        self.feat_in = features
        self.filters = filters
        self.kernel_sizes = kernel_sizes
        self.dilations = dilations
        self.scale = scale
        self.res2net = res2net
        self.res2net_scale = res2net_scale
        self.init_mode = init_mode
        self.encoder_config = {
            "feat_in": self.features,
            "filters": self.filters,
            "kernel_sizes": self.kernel_sizes,
            "dilations": self.dilations,
            "scale": self.scale,
            "res2net": self.res2net,
            "res2net_scale": self.res2net_scale,
            "init_mode": self.init_mode,
        }

        # Decoder configuration
        self.emb_sizes = emb_sizes
        self.pool_mode = pool_mode
        # The angular flag is derived from the objective so that the decoder's output
        # layer matches the loss; this overrides the `angular` constructor argument.
        self.angular = objective in ['additive_angular_margin', 'additive_margin']
        self.attention_channels = attention_channels
        self.decoder_config = {
            "feat_in": filters[-1],
            "num_classes": self.num_labels,
            "emb_sizes": emb_sizes,
            "pool_mode": pool_mode,
            "angular": self.angular,
            "attention_channels": attention_channels,
            "init_mode": init_mode,
        }

        # Loss function configuration
        self.objective = objective
        self.angular_scale = angular_scale
        self.angular_margin = angular_margin
        self.label_smoothing = label_smoothing
        if objective in ['additive_angular_margin', 'additive_margin']:
            self.objective_config = {
                "scale": angular_scale,
                "margin": angular_margin,
            }
        elif objective == 'cross_entropy':
            self.objective_config = {
                "label_smoothing": label_smoothing,
            }
        else:
            raise ValueError(f"Unsupported objective: {objective!r}")
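A minimal usage sketch of this configuration class; the values below are illustrative, not the settings stored in config.json:

from configuration_ecapa_tdnn import EcapaTdnnConfig

# Illustrative settings only; the checkpoint's real values live in config.json.
config = EcapaTdnnConfig(features=80, objective="additive_angular_margin", num_labels=10)
print(config.encoder_config["feat_in"])      # 80, follows `features`
print(config.decoder_config["num_classes"])  # 10, taken from `num_labels`
print(config.objective_config)               # {'scale': 30, 'margin': 0.2}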