music
audio
speech
autoencoder
diffusion
marcop committed
Commit dfe05b5 · verified · 1 Parent(s): 9b57ee5

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -11,15 +11,17 @@ tags:
 ---
 
 # Music2Latent
-Encode and decode audio samples to compressed representations! Useful for efficient generative modelling applications and for other downstream tasks.
+Encode and decode audio samples to/from compressed representations! Useful for efficient generative modelling applications and for other downstream tasks.
 
 ![music2latent](music2latent.png)
 
 Read the ISMIR 2024 paper [here](https://arxiv.org/).
 
 Under the hood, __Music2Latent__ uses a __Consistency Autoencoder__ model to efficiently encode and decode audio samples.
+
 44.1 kHz audio is encoded into a sequence of __~10 Hz__, and each of the latents has 64 channels.
-You can then train a generative model on these embeddings, or use them for other downstream tasks.
+48 kHz audio can also be encoded, which results in a sequence of ~12 Hz.
+A generative model can then be trained on these embeddings, or they can be used for other downstream tasks.
 
 Music2Latent was trained on __music__ and on __speech__. Refer to the [paper](https://arxiv.org/) for more details.
 
@@ -29,7 +31,7 @@ Music2Latent was trained on __music__ and on __speech__. Refer to the [paper](ht
 ```bash
 pip install music2latent
 ```
-The model weights will be downloaded automatically the first time you run the code.
+The model weights will be downloaded automatically the first time the code is run.
 
 
 ## How to use
@@ -38,20 +40,21 @@ To encode and decode audio samples to/from latent embeddings:
 audio_path = librosa.example('trumpet')
 wv, sr = librosa.load(audio_path, sr=44100)
 
-from music2latent2 import EncoderDecoder
+from music2latent import EncoderDecoder
 encdec = EncoderDecoder()
 
 latent = encdec.encode(wv)
+# latent has shape (batch_size/audio_channels, dim (64), sequence_length)
 
 wv_rec = encdec.decode(latent)
 ```
-If you need to extract encoder features to use in downstream tasks, and you don't need to reconstruct the audio:
+To extract encoder features to use in downstream tasks:
 ```bash
 features = encoder.encode(wv, extract_features=True)
 ```
-These features are extracted before the encoder bottleneck, and thus have more channels (contain more information) than the latents used for reconstruction.
+These features are extracted before the encoder bottleneck, and thus have more channels (contain more information) than the latents used for reconstruction. It will not be possible to directly decode these features back to audio.
 
-music2latent2 supports more advanced usage, inclusing GPU memory management controls. Please refer to __tutorial.ipynb__.
+music2latent supports more advanced usage, including GPU memory management controls. Please refer to __tutorial.ipynb__.
 
 
 ## License
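
For reference, the snippets in the updated README combine into one runnable script. This is a sketch based solely on the calls the README shows (`EncoderDecoder`, `encode`, `decode`); the shape comment restates the README's description, and the `print` call is an illustrative addition:

```python
# Sketch of the end-to-end usage described in the updated README.
# Assumes `pip install music2latent librosa`; weights download on first run.
import librosa
from music2latent import EncoderDecoder

# Load a 44.1 kHz example clip (per the README, 48 kHz input also works,
# yielding ~12 Hz latents instead of ~10 Hz).
audio_path = librosa.example('trumpet')
wv, sr = librosa.load(audio_path, sr=44100)

encdec = EncoderDecoder()

# Per the README, latent has shape
# (batch_size/audio_channels, dim (64), sequence_length).
latent = encdec.encode(wv)
print(latent.shape)

# Decode the latents back to a waveform.
wv_rec = encdec.decode(latent)
```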
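The feature-extraction snippet in the diff references an `encoder` object that is not defined in the surrounding example; it presumably refers to the same `EncoderDecoder` instance, so a consistent sketch would be:

```python
# Assumes `encdec` is the EncoderDecoder instance from the example above;
# the README snippet's `encoder` name is taken to refer to the same object.
features = encdec.encode(wv, extract_features=True)

# Per the README: these pre-bottleneck features have more channels than the
# 64-channel latents and cannot be decoded directly back to audio.
print(features.shape)
```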