jhauret committed on
Commit
e03e8a3
1 Parent(s): 1ee7529

Upload EBENGenerator

Files changed (3)
  1. README.md +50 -0
  2. config.json +5 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language: fr
+ license: mit
+ library_name: transformers
+ tags:
+ - audio
+ - audio-to-audio
+ - speech
+ datasets:
+ - Cnam-LMSSC/vibravox
+ ---
+ # Model Card
+
+ - **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
+ - **Model type:** [EBEN](https://github.com/jhauret/vibravox/blob/main/vibravox/torch_modules/dnn/eben_generator.py) (see [publication](https://ieeexplore.ieee.org/document/10244161))
+ - **Language:** French
+ - **License:** MIT
+ - **Fine-tuning dataset:** `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
+ - **Sample rate for usage:** 16 kHz
+
+ ## Overview
+
+ This bandwidth extension model is trained on data from one specific body-conduction sensor in the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox).
+ The model is designed to enhance the audio quality of body-conducted speech by denoising it and regenerating mid and high frequencies from low-frequency content only.
+
+ ## Disclaimer
+ This model has been trained for **specific non-conventional speech sensors** and is intended to be used with **in-domain data**.
+ Using this model on data from sensors other than the one it was trained on may result in suboptimal performance.
+
+
+ ## Training procedure
+
+ Detailed instructions for reproducing the experiments are available in the [jhauret/vibravox](https://github.com/jhauret/vibravox) GitHub repository.
+
+ ## Inference script
+
+ ```python
+ import torchaudio
+ from vibravox import EBENGenerator
+
+ # Placeholder id; assumes EBENGenerator provides `from_pretrained` (not shown in the original snippet)
+ model = EBENGenerator.from_pretrained("path_or_repo_id_of_this_model")
+ audio_16kHz, _ = torchaudio.load("path_to_audio")  # (channels, time) waveform, expected at 16 kHz
+ cut_audio_16kHz = model.cut_to_valid_length(audio_16kHz)
+ enhanced_audio_16kHz = model(cut_audio_16kHz)
+ ```
+
+ ## Links to BWE models trained on other body-conducted sensors
+
+ An entry point to all **audio bandwidth extension** (BWE) models trained on different sensor data from the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_bwe_models](https://huggingface.co/Cnam-LMSSC/vibravox_EBEN_bwe_models).
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "m": 4,
+ "n": 32,
+ "p": 2
+ }
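The added `config.json` holds three integer fields. This commit does not say what `m`, `n`, and `p` mean, so the minimal sketch below treats them only as required positive integers. That constraint is an assumption, not something the source states. It shows how such a file could be sanity-checked before being passed to a model constructor. The helper name `load_generator_config` is hypothetical and not part of the vibravox package.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = '{"m": 4, "n": 32, "p": 2}'


def load_generator_config(text):
    """Parse the JSON text and check that m, n and p are positive integers.

    Hypothetical helper: the positivity check is an assumption, since the
    commit does not document the meaning of these fields.
    """
    config = json.loads(text)
    for key in ("m", "n", "p"):
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"config field {key!r} must be a positive integer, got {value!r}")
    return config


config = load_generator_config(CONFIG_TEXT)
```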
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e836249ddff11387fa9c10865390106c71130f007c48890bb070f7ff9cdaed9d
+ size 7797832
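The three lines added for `model.safetensors` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a minimal sketch, a downloaded copy could be checked against the pointer as follows. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any library.

```python
import hashlib
from pathlib import Path

# Pointer text exactly as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e836249ddff11387fa9c10865390106c71130f007c48890bb070f7ff9cdaed9d
size 7797832
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    if Path(path).stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not loaded at once
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then report whether the file is the one this commit refers to.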