Goekdeniz-Guelmez / J.O.S.I.E.v4o

Mixture of Experts

Goekdeniz-Guelmez committed on Oct 29, 2024

Commit 412bbad (verified) · 1 parent: 2715030

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -61,7 +61,7 @@ Upcoming Modalities:
 Architecture:
-![Main architecture](https://huggingface.co/Goekdeniz-Guelmez/J.O.S.I.E.v4o/blob/main/Drawing%202024-10-28%2019.59.08.excalidraw.png)
+![Main architecture](https://cdn-lfs-us-1.hf.co/repos/1c/93/1c93eaf591b503a88961f61de2407cb6c3bb3453cb5ad121c01c928d0e3706f7/23666d2de95a4b3ae0e1c764a61a36e96a4341eeda2fd75815bf32a577838272?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27Drawing%25202024-10-28%252019.59.08.excalidraw.png%3B+filename%3D%22Drawing+2024-10-28+19.59.08.excalidraw.png%22%3B&response-content-type=image%2Fpng&Expires=1730493421&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczMDQ5MzQyMX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzFjLzkzLzFjOTNlYWY1OTFiNTAzYTg4OTYxZjYxZGUyNDA3Y2I2YzNiYjM0NTNjYjVhZDEyMWMwMWM5MjhkMGUzNzA2ZjcvMjM2NjZkMmRlOTVhNGIzYWUwZTFjNzY0YTYxYTM2ZTk2YTQzNDFlZWRhMmZkNzU4MTViZjMyYTU3NzgzODI3Mj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=RziqPEZhFwb91rf8zha6-BIj0drXjx3gdjhWig37QkVroYnnWU%7E-ovFy2Kn2J%7EuTncxSZ10Yz3-qCJrvTkCJjRzbf1k7LoCekGumzfkdgKHQkrbXhbdaOhU31jWoRWLdaQL-z74a9UyhXrEmTmDq-iUdflDnF1wHEQLBWnfyE9GqJv8nHukisqEokIut60UuuYroaMLwJFQLipVir7MRtrO2UoO8pvMcA0OepYK88nU%7EpYZcFPXK99zOK3-CGgsIi4q5aYdZt0u-W1VI3lhjMtlHQbyHyDV5KJ32PFrE7VXxu0Xxe7lIN2F-hOe2l0l3GVrXSVnVG-EO%7ENZU8941OQ__&Key-Pair-Id=K24J24Z295AEI9)
 	•	Core Framework: A central general-purpose LLM (LLaMA/Qwen) processes discrete tokens generated from various sensory inputs.
 	•	Audio Processing: Employs RQ-Transformers with temporal and depth transformers, encoding raw audio into discrete tokens that the LLM processes. The tokens are then decoded back into audio responses, with the RQ-Transformer converting output tokens into Mel spectrograms that a vocoder renders into audio.
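The audio-processing bullet in this diff mentions residual quantization (the "RQ" in RQ-Transformer) only in passing. The block below is a minimal, hypothetical sketch of that general technique. It is not code from J.O.S.I.E.v4o. A frame vector is quantized greedily against a stack of codebooks, producing one code index per depth level, and it is decoded by summing the chosen codewords. The codebooks here are random stand-ins (an assumption; real ones are learned), and `make_codebooks`, `rq_encode`, and `rq_decode` are illustrative names. In an RQ-Transformer, roughly speaking, a depth transformer predicts this per-frame stack of codes, while a temporal transformer models the sequence of frames.

```python
import random
from typing import List

Vector = List[float]
Codebook = List[Vector]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rq_encode(frame: Vector, codebooks: List[Codebook]) -> List[int]:
    """Residual quantization: one code index per depth level.

    At each depth, pick the codeword nearest to the current residual,
    then subtract it so the next level quantizes what is left over.
    """
    residual = list(frame)
    codes: List[int] = []
    for codebook in codebooks:
        best = min(range(len(codebook)),
                   key=lambda i: squared_distance(residual, codebook[i]))
        codes.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes


def rq_decode(codes: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct a frame by summing the selected codeword from each level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebooks(depth: int, size: int, dim: int, seed: int = 0) -> List[Codebook]:
    """Hypothetical random codebooks (real systems learn these during training).

    Codeword 0 of every level is the zero vector, which guarantees that
    adding a level never increases the reconstruction error.
    """
    rng = random.Random(seed)
    books: List[Codebook] = []
    for level in range(depth):
        scale = 0.5 ** level  # later levels refine progressively smaller residuals
        book: Codebook = [[0.0] * dim]
        book += [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(size - 1)]
        books.append(book)
    return books


if __name__ == "__main__":
    rng = random.Random(42)
    frame = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for one audio frame embedding
    books = make_codebooks(depth=4, size=16, dim=8)
    codes = rq_encode(frame, books)
    print("codes per depth:", codes)
    for d in range(1, len(books) + 1):
        err = squared_distance(frame, rq_decode(codes[:d], books[:d]))
        print(f"depth {d}: squared error {err:.4f}")
```

Running the script prints one code index per depth and a squared error that does not grow as levels are added. The zero codeword in each hypothetical codebook guarantees this, because the encoder can always choose "no correction" at a level.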