trysem and enhuiz committed
Commit e29954a (0 parents)

Duplicate from ResembleAI/resemble-enhance

Co-authored-by: Zhe Niu <enhuiz@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Archived[[:space:]]Speech.mp4 filter=lfs diff=lfs merge=lfs -text
+ Background[[:space:]]Music.mp4 filter=lfs diff=lfs merge=lfs -text
+ Street[[:space:]]Noise.mp4 filter=lfs diff=lfs merge=lfs -text
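
The patterns above decide which paths Git LFS stores. As a rough illustration, the sketch below approximates that matching in Python for a handful of the patterns. It is not Git's actual matcher. The helper names `approx_match` and `tracked_by_lfs` are hypothetical, `[[:space:]]` is simplified to a single literal space, and only the rules needed for these patterns are handled.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# A few of the patterns from the .gitattributes hunk above.
LFS_PATTERNS = [
    "*.safetensors",
    "*.pt",
    "saved_model/**/*",
    "*tfevents*",
    "Archived[[:space:]]Speech.mp4",
]


def approx_match(pattern: str, path: str) -> bool:
    """Approximate gitattributes matching; not a full copy of Git's rules."""
    # Simplification (assumption): treat the POSIX class [[:space:]] as one space.
    pattern = pattern.replace("[[:space:]]", " ")
    if "/" not in pattern:
        # A pattern without a slash is compared against the file name alone.
        return fnmatchcase(basename(path), pattern)
    # "/**/" may span zero or more directories, so also try the zero-directory form.
    candidates = {pattern, pattern.replace("/**/", "/")}
    return any(fnmatchcase(path, candidate) for candidate in candidates)


def tracked_by_lfs(path: str) -> bool:
    """Return True if any of the sample patterns matches the given repo path."""
    return any(approx_match(pattern, path) for pattern in LFS_PATTERNS)
```

Under these simplifications, `tracked_by_lfs("enhancer_stage2/ds/G/default/mp_rank_00_model_states.pt")` is true because of `*.pt`, while `enhancer_stage2/hparams.yaml` matches none of the patterns and stays a regular Git blob.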
Archived Speech.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:695be2e390da5f10187a2b34789ec82ca55a0fe727614dddd50e5b419f6a1687
+ size 33300779
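
The three lines above are a Git LFS pointer: the repository stores this small text file, and the real media lives in LFS storage. The sketch below shows one way to read such a pointer and check downloaded bytes against it. The function names `parse_lfs_pointer` and `matches_pointer` are made up for this example, and it only covers the three keys that appear in this diff.

```python
import hashlib

# Pointer contents copied from the "Archived Speech.mp4" hunk above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:695be2e390da5f10187a2b34789ec82ca55a0fe727614dddd50e5b419f6a1687
size 33300779
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict of fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check bytes against the size and SHA-256 digest recorded in the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 object IDs")
    same_size = len(data) == pointer["size"]
    return same_size and hashlib.sha256(data).hexdigest() == pointer["digest"]
```

Checking the size before hashing is a cheap first filter. A file that fails either check is not the object the commit refers to.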
Background Music.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:528601427e0a95dfe3b7e08fec399c3e3142946cdb6cfbcc33de79aaefb08e1b
+ size 33738329
README.md ADDED
@@ -0,0 +1,19 @@
+ ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: audio-to-audio
+ tags:
+ - speech-enhancement
+ - speech-denoising
+ ---
+
+ Resemble Enhance is an AI-powered tool that aims to improve the overall quality of speech through denoising and enhancement. It consists of two modules: a denoiser, which separates speech from noisy audio, and an enhancer, which further boosts perceptual audio quality by repairing distortions and extending the audio bandwidth. Both models are trained on high-quality 44.1 kHz speech data so that the enhanced speech is also high quality.
+
+ The three videos below each show Resemble Enhance's denoiser module, and then its enhancer module, improving speech quality.
+
+ Background Music - https://youtu.be/gl--IMtQ0XQ
+
+ Street Noise - https://youtu.be/zC87BjtsZVA
+
+ Archived Speech - https://youtu.be/6dALaLMJhSQ
Street Noise.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39d0f09cbb894b65013e09cfe003a393b38242120afe4f163d79f78ec965e0c3
+ size 30024648
enhancer_stage2/ds/G/default/mp_rank_00_model_states.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9d035f318de3e6d919bc70cf7ad7d32b4fe92ec5cbe0b30029a27f5db07d9d6
+ size 713176232
enhancer_stage2/ds/G/latest ADDED
@@ -0,0 +1 @@
+ default
enhancer_stage2/hparams.yaml ADDED
@@ -0,0 +1,38 @@
+ fg_dir: !!python/object/apply:pathlib.PosixPath
+ - data
+ - fg
+ bg_dir: !!python/object/apply:pathlib.PosixPath
+ - data
+ - bg
+ rir_dir: !!python/object/apply:pathlib.PosixPath
+ - data
+ - rir
+ load_fg_only: false
+ wav_rate: 44100
+ n_fft: 2048
+ win_size: 2048
+ hop_size: 420
+ num_mels: 128
+ stft_magnitude_min: 0.0001
+ preemphasis: 0.97
+ mix_alpha_range:
+ - 0.2
+ - 0.8
+ nj: 64
+ training_seconds: 3.0
+ batch_size_per_gpu: 32
+ min_lr: 1.0e-05
+ max_lr: 0.0001
+ warmup_steps: 1000
+ max_steps: 1000000
+ gradient_clipping: 1.0
+ cfm_solver_method: midpoint
+ cfm_solver_nfe: 64
+ cfm_time_mapping_divisor: 4
+ univnet_nc: 96
+ lcfm_latent_dim: 64
+ lcfm_training_mode: cfm
+ lcfm_z_scale: 6
+ vocoder_extra_dim: 32
+ gan_training_start_step: null
+ praat_augment_prob: 0.2
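
Several of the hyperparameters above are easier to interpret as derived quantities. The sketch below works a few of them out from the values in this file. The helper `derived_quantities` is hypothetical. The frame count ignores any padding the training code may apply. The solver step count assumes that each midpoint step costs two function evaluations, which holds for the textbook midpoint method but is not confirmed here for this repository.

```python
# Values copied from enhancer_stage2/hparams.yaml above.
HPARAMS = {
    "wav_rate": 44100,
    "n_fft": 2048,
    "hop_size": 420,
    "training_seconds": 3.0,
    "cfm_solver_nfe": 64,
}


def derived_quantities(hp: dict) -> dict:
    """Compute a few quantities implied by the STFT and solver settings."""
    samples_per_clip = int(hp["wav_rate"] * hp["training_seconds"])
    return {
        # Spectrogram frames per second of audio.
        "frames_per_second": hp["wav_rate"] / hp["hop_size"],
        # Samples in one training clip.
        "samples_per_clip": samples_per_clip,
        # Whole hops that fit in one clip (assumption: no padding counted).
        "frames_per_clip": samples_per_clip // hp["hop_size"],
        # Bins of a one-sided STFT before the mel projection.
        "stft_bins": hp["n_fft"] // 2 + 1,
        # Assumption: two function evaluations per midpoint step.
        "midpoint_steps": hp["cfm_solver_nfe"] // 2,
    }


if __name__ == "__main__":
    for name, value in derived_quantities(HPARAMS).items():
        print(f"{name}: {value}")
```

A hop of 420 samples divides 44100 exactly, so each second of audio maps to a whole number of frames.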