Commit 8726036 (verified) · 0 parent(s)
Committed by pharaouk and ylacombe

Duplicate from parler-tts/dac_44khZ_8kbps

Co-authored-by: Yoach Lacombe <ylacombe@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +35 -0
  2. README.md +126 -0
  3. config.json +13 -0
  4. model.safetensors +3 -0
  5. preprocessor_config.json +10 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
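
These patterns decide which files Git routes through LFS instead of storing as regular blobs, which is why `model.safetensors` shows up in this commit as a three-line pointer while the JSON files appear in full. The following is a minimal sketch of that routing. It assumes that `fnmatch`-style globbing on a file's base name is a close enough approximation of Git's attribute matching for the simple `*.ext` patterns (real gitattributes matching has additional rules, for example for `saved_model/**/*`). `LFS_PATTERNS` and `is_lfs_tracked` are illustrative names, and only a subset of the 35 patterns is listed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns added in .gitattributes (illustrative, not exhaustive).
LFS_PATTERNS = ["*.bin", "*.h5", "*.onnx", "*.pt", "*.safetensors", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


# The five files touched by this commit.
for path in [".gitattributes", "README.md", "config.json",
             "model.safetensors", "preprocessor_config.json"]:
    kind = "LFS pointer" if is_lfs_tracked(path) else "regular Git blob"
    print(f"{path}: {kind}")
```

Under these assumptions, only `model.safetensors` is classified as an LFS pointer, which matches the file list in this commit.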
README.md ADDED
@@ -0,0 +1,126 @@
+ ---
+ library_name: transformers
+ tags:
+ - DAC
+ - audio
+ license: mit
+ ---
+
+ # Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
+
+ This repository is a wrapper around the original **Descript Audio Codec** model, a high-fidelity general neural audio codec introduced in the paper **High-Fidelity Audio Compression with Improved RVQGAN**.
+
+ It is designed as a drop-in replacement for the [transformers implementation](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/encodec#overview) of [Encodec](https://github.com/facebookresearch/encodec), so that architectures that use Encodec can also be trained with DAC instead.
+ The [Parler-TTS library](https://github.com/huggingface/parler-tts) is an example of how to use DAC to train high-quality TTS models. We released [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1), a first-iteration model trained on 10k hours of narrated audiobooks. It generates high-quality speech with features that can be controlled through a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).
+
+ To use this checkpoint, first install the [Parler-TTS library](https://github.com/huggingface/parler-tts) (this only needs to be done once):
+ ```sh
+ pip install git+https://github.com/huggingface/parler-tts.git
+ ```
+
+ Then load the model:
+ ```python
+ from parler_tts import DACModel
+ dac_model = DACModel.from_pretrained("parler-tts/dac_44khZ_8kbps")
+ ```
+
+
+ 🚨 If you want to use the original DAC codebase, refer to the [original repository](https://github.com/descriptinc/descript-audio-codec/tree/main) or to the [Original Usage](#original-usage) section.
+
+
+ ## Original Usage
+
+ [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
+ ](http://arxiv.org/abs/2306.06546) <br>
+ [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5)<br>
+ [GitHub repo](https://github.com/descriptinc/descript-audio-codec/tree/main)<br>
+
+ 👉 With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**. <br>
+ 🤌 That's approximately **90x compression** while maintaining exceptional fidelity and minimizing artifacts. <br>
+ 💪 The Descript universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. <br>
+ 👌 It can be used as a drop-in replacement for EnCodec in all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.). <br>
+
+
+ ### Installation
+ ```
+ pip install descript-audio-codec
+ ```
+ OR
+
+ ```
+ pip install git+https://github.com/descriptinc/descript-audio-codec
+ ```
+
+ ### Weights
+ Weights are released as part of this repo under the MIT license.
+ We release weights for models that natively support 16 kHz, 24 kHz, and 44.1 kHz sampling rates.
+ Weights are downloaded automatically the first time you run the `encode` or `decode` command. You can cache them ahead of time with one of the following commands:
+ ```bash
+ python3 -m dac download # downloads the default 44kHz variant
+ python3 -m dac download --model_type 44khz # downloads the 44kHz variant
+ python3 -m dac download --model_type 24khz # downloads the 24kHz variant
+ python3 -m dac download --model_type 16khz # downloads the 16kHz variant
+ ```
+ We provide a Dockerfile that installs all required dependencies for encoding and decoding. The build process caches the default model weights inside the image, so the image can be used without an internet connection. [Please refer to instructions below.](#docker-image)
+
+
+ ### Compress audio
+ ```
+ python3 -m dac encode /path/to/input --output /path/to/output/codes
+ ```
+
+ This command creates `.dac` files with the same names as the input files.
+ It also preserves the directory structure relative to the input root and
+ re-creates it in the output directory. Run `python3 -m dac encode --help`
+ for more options.
+
+ ### Reconstruct audio from compressed codes
+ ```
+ python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input
+ ```
+
+ This command creates `.wav` files with the same names as the input files.
+ It also preserves the directory structure relative to the input root and
+ re-creates it in the output directory. Run `python3 -m dac decode --help`
+ for more options.
+
+ ### Programmatic Usage
+ ```py
+ import dac
+ from audiotools import AudioSignal
+
+ # Download a model
+ model_path = dac.utils.download(model_type="44khz")
+ model = dac.DAC.load(model_path)
+
+ model.to('cuda')
+
+ # Load audio signal file
+ signal = AudioSignal('input.wav')
+
+ # Encode audio signal as one long file
+ # (may run out of GPU memory on long files)
+ signal.to(model.device)
+
+ x = model.preprocess(signal.audio_data, signal.sample_rate)
+ z, codes, latents, _, _ = model.encode(x)
+
+ # Decode audio signal
+ y = model.decode(z)
+
+ # Alternatively, use the `compress` and `decompress` functions
+ # to compress long files.
+
+ signal = signal.cpu()
+ x = model.compress(signal)
+
+ # Save and load to and from disk
+ x.save("compressed.dac")
+ x = dac.DACFile.load("compressed.dac")
+
+ # Decompress it back to an AudioSignal
+ y = model.decompress(x)
+
+ # Write to file
+ y.write('output.wav')
+ ```
config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "architectures": [
+     "DACModel"
+   ],
+   "codebook_size": 1024,
+   "frame_rate": 86,
+   "latent_dim": 1024,
+   "model_bitrate": 8,
+   "model_type": "dac",
+   "num_codebooks": 9,
+   "torch_dtype": "float32",
+   "transformers_version": "4.38.0.dev0"
+ }
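
The values in this config can be cross-checked against the "8 kbps" and "approximately 90x" figures quoted in the README. The sketch below recomputes both. It assumes that each code frame carries one index per codebook and that an index costs log2(`codebook_size`) bits. The uncompressed reference (16-bit mono PCM at 44.1 kHz) is also an assumption, since the source does not state which baseline the 90x figure uses.

```python
import math

# Values copied from config.json.
frame_rate = 86        # code frames per second
num_codebooks = 9      # codebook indices emitted per frame
codebook_size = 1024   # entries per codebook

# Assumption (not stated in the source): the uncompressed reference is
# 16-bit mono PCM at the 44.1 kHz sampling rate.
sample_rate_hz = 44_100
pcm_bits_per_sample = 16

bits_per_code = int(math.log2(codebook_size))            # bits needed per index
codec_bps = frame_rate * num_codebooks * bits_per_code   # codec bits per second
pcm_bps = sample_rate_hz * pcm_bits_per_sample           # PCM bits per second
ratio = pcm_bps / codec_bps

print(f"codec bitrate: {codec_bps / 1000:.2f} kbps")
print(f"compression ratio: {ratio:.1f}x")
```

Under these assumptions the codec bitrate comes out at 7.74 kbps, consistent with the rounded `"model_bitrate": 8`, and the ratio is about 91x, in line with the README's "approximately 90x".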
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f65197de6142f9e0d186f78fb3aa12d47fde62f4c650e7ee5a254157618230f7
+ size 306642416
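
The three lines above are a Git LFS pointer, not the weights themselves: they record the pointer spec version, the SHA-256 digest of the real file, and its size in bytes. As a rough illustration, the pointer can be parsed as space-separated key/value lines. `parse_lfs_pointer` is a hypothetical helper written for this note and is not part of Git LFS or any library.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f65197de6142f9e0d186f78fb3aa12d47fde62f4c650e7ee5a254157618230f7
size 306642416
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
print(f"{pointer['size_bytes'] / 2**20:.1f} MiB")
```

The recorded size corresponds to roughly 292 MiB. After downloading the actual `model.safetensors`, its SHA-256 digest and byte size can be compared with these fields to confirm the download is intact.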
preprocessor_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "chunk_length_s": null,
+   "feature_extractor_type": "EncodecFeatureExtractor",
+   "feature_size": 1,
+   "overlap": null,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": true,
+   "sampling_rate": 44100
+ }
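
To make the padding fields concrete, here is a small pure-Python sketch of what `"padding_side": "right"`, `"padding_value": 0.0`, and `"return_attention_mask": true` imply for a batch of mono waveforms (`"feature_size": 1`) of different lengths. It illustrates the behaviour these settings describe and is not the `EncodecFeatureExtractor` implementation; `pad_batch` is a hypothetical helper.

```python
PADDING_VALUE = 0.0      # "padding_value" in preprocessor_config.json
PADDING_SIDE = "right"   # "padding_side" in preprocessor_config.json


def pad_batch(waveforms):
    """Pad mono waveforms (lists of floats) to a common length and build masks.

    Mask entries are 1 for real samples and 0 for padding.
    """
    max_len = max(len(w) for w in waveforms)
    padded, masks = [], []
    for w in waveforms:
        pad = [PADDING_VALUE] * (max_len - len(w))
        mask_pad = [0] * len(pad)
        if PADDING_SIDE == "right":
            padded.append(list(w) + pad)
            masks.append([1] * len(w) + mask_pad)
        else:
            padded.append(pad + list(w))
            masks.append(mask_pad + [1] * len(w))
    return padded, masks


batch, mask = pad_batch([[0.1, -0.2, 0.3], [0.5]])
print(batch)
print(mask)
```

The shorter waveform is extended with zeros on the right, and the mask lets downstream code ignore those padded samples.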