GitMylo committed 7ae6eef (1 parent: 5257e3a)

Add readme from riffusion/riffusion-model-v1

Files changed (1): README.md (+92 −0)
---
license: creativeml-openrail-m
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-audio
inference: true
extra_gated_prompt: >-
  This model is open access and available to all, with a CreativeML OpenRAIL-M
  license further specifying rights and usage.

  The CreativeML OpenRAIL License specifies:

  1. You can't use the model to deliberately produce or share illegal or
  harmful outputs or content.

  2. Riffusion claims no rights on the outputs you generate; you are free to
  use them, and you are accountable for their use, which must not go against
  the provisions set in the license.

  3. You may redistribute the weights and use the model commercially and/or as
  a service. If you do, please be aware that you have to include the same use
  restrictions as the ones in the license and share a copy of the CreativeML
  OpenRAIL-M license with all your users (please read the license entirely and
  carefully).

  Please read the full license carefully here:
  https://huggingface.co/spaces/CompVis/stable-diffusion-license

extra_gated_heading: Please read the LICENSE to access this model
library_name: diffusers
---

# Riffusion

Riffusion is an app for real-time music generation with Stable Diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

* Code: https://github.com/riffusion/riffusion
* Web app: https://github.com/hmartiro/riffusion-app
* Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1
* Discord: https://discord.gg/yu6SRwvX4v

This repository contains the model files, including:

* a diffusers-formatted model
* a compiled checkpoint file
* a traced UNet for improved inference speed
* a seed image library for use with riffusion-app

## Riffusion v1 Model

Riffusion is a latent text-to-image diffusion model capable of generating spectrogram images given any text input. These spectrograms can be converted into audio clips.

The model was created by [Seth Forsgren](https://sethforsgren.com/) and [Hayk Martiros](https://haykmartiros.com/) as a hobby project.

You can use the Riffusion model directly, or try the [Riffusion web app](https://www.riffusion.com/).

The Riffusion model was created by fine-tuning the **Stable-Diffusion-v1-5** checkpoint. Read about Stable Diffusion on [🤗's Stable Diffusion blog](https://huggingface.co/blog/stable_diffusion).

### Model Details
- **Developed by:** Seth Forsgren, Hayk Martiros
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487).

### Direct Use
The model is intended for research purposes only. Possible research areas and tasks include:

- Generation of artworks, audio, and use in creative processes.
- Applications in educational or creative tools.
- Research on generative models.
78
+ The original Stable Diffusion v1.5 was trained on the [LAION-5B](https://arxiv.org/abs/2210.08402) dataset using the [CLIP text encoder](https://openai.com/blog/clip/), which provided an amazing starting point with an in-depth understanding of language, including musical concepts. The team at LAION also compiled a fantastic audio dataset from many general, speech, and music sources that we recommend at [LAION-AI/audio-dataset](https://github.com/LAION-AI/audio-dataset/blob/main/data_collection/README.md).
79
+
80
+ ### Fine Tuning
81
+
82
+ Check out the [diffusers training examples](https://huggingface.co/docs/diffusers/training/overview) from Hugging Face. Fine tuning requires a dataset of spectrogram images of short audio clips, with associated text describing them. Note that the CLIP encoder is able to understand and connect many words even if they never appear in the dataset. It is also possible to use a [dreambooth](https://huggingface.co/blog/dreambooth) method to get custom styles.
83
+
84
+ ## Citation
85
+
86
+ If you build on this work, please cite it as follows:
87
+
88
+ ```
89
+ @article{Forsgren_Martiros_2022,
90
+ author = {Forsgren, Seth* and Martiros, Hayk*},
91
+ title = {{Riffusion - Stable diffusion for real-time music generation}},
92
+ url = {https://riffusion.com/about},
93
+ year = {2022}
94
+ }
95
+ ```