Update README.md
README.md
CHANGED
@@ -61,7 +61,7 @@ Upcoming Modalities:
 
 Architecture:
 
-![Main architecture](https://
+![Main architecture](https://cdn-lfs-us-1.hf.co/repos/1c/93/1c93eaf591b503a88961f61de2407cb6c3bb3453cb5ad121c01c928d0e3706f7/23666d2de95a4b3ae0e1c764a61a36e96a4341eeda2fd75815bf32a577838272?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27Drawing%25202024-10-28%252019.59.08.excalidraw.png%3B+filename%3D%22Drawing+2024-10-28+19.59.08.excalidraw.png%22%3B&response-content-type=image%2Fpng&Expires=1730493421&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczMDQ5MzQyMX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzFjLzkzLzFjOTNlYWY1OTFiNTAzYTg4OTYxZjYxZGUyNDA3Y2I2YzNiYjM0NTNjYjVhZDEyMWMwMWM5MjhkMGUzNzA2ZjcvMjM2NjZkMmRlOTVhNGIzYWUwZTFjNzY0YTYxYTM2ZTk2YTQzNDFlZWRhMmZkNzU4MTViZjMyYTU3NzgzODI3Mj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=RziqPEZhFwb91rf8zha6-BIj0drXjx3gdjhWig37QkVroYnnWU%7E-ovFy2Kn2J%7EuTncxSZ10Yz3-qCJrvTkCJjRzbf1k7LoCekGumzfkdgKHQkrbXhbdaOhU31jWoRWLdaQL-z74a9UyhXrEmTmDq-iUdflDnF1wHEQLBWnfyE9GqJv8nHukisqEokIut60UuuYroaMLwJFQLipVir7MRtrO2UoO8pvMcA0OepYK88nU%7EpYZcFPXK99zOK3-CGgsIi4q5aYdZt0u-W1VI3lhjMtlHQbyHyDV5KJ32PFrE7VXxu0Xxe7lIN2F-hOe2l0l3GVrXSVnVG-EO%7ENZU8941OQ__&Key-Pair-Id=K24J24Z295AEI9)
 
 • Core Framework: A central general-purpose LLM (LLaMA/Qwen) processes discrete tokens generated from various sensory inputs.
 • Audio Processing: Employs RQ-Transformers with temporal and depth transformers, encoding raw audio into discrete tokens that the LLM processes. The tokens are then decoded back into audio responses, with the RQ-Transformer converting output tokens into Mel spectrograms that a vocoder renders into audio.
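Note on the Audio Processing bullet: the README does not say how audio frames become discrete tokens, so the following is only a minimal sketch of residual quantization (RQ), the kind of multi-depth code that a depth transformer would plausibly model. The function names (`rq_encode`, `rq_decode`), the toy `CODEBOOKS`, and the 2-dimensional frame are all hypothetical and do not come from this repository.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Return the index of the codebook entry closest to v (squared L2)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rq_encode(frame: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize one frame into one discrete token per depth level."""
    residual = list(frame)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = _nearest(codebook, residual)
        tokens.append(idx)
        # The next depth level only has to explain what is still left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rq_decode(tokens: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Rebuild an approximate frame by summing the selected code vectors."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse level followed by a finer one.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)],
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)],
]

tokens = rq_encode((1.2, 1.0), CODEBOOKS)
approx = rq_decode(tokens, CODEBOOKS)
print(tokens, approx)  # [1, 1] [1.25, 1.0]
```

In a pipeline like the one the README describes, a stack of tokens such as `[1, 1]` would be what the LLM consumes per audio frame. The Mel-spectrogram and vocoder stages are out of scope for this sketch.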