---
license: cc-by-nc-4.0
language:
- en
tags:
- audio2face
- Transformers
- seq2seq
- UnrealEngine
- LiveLink
- FeatureMapping
- PyTorch
- AudioToFace
---

# NeuroSync Audio-to-Face Blendshape Transformer Model

## 25/11/24 Update

Added the correct timecode format and an option to remove the emotion dimensions from CSV creation in [NeuroSync Player (Unreal Engine LiveLink)](https://github.com/AnimaVR/NeuroSync_Player) - you can now drag and drop generated audio and face data files directly into Unreal and apply them to a MetaHuman (if you don't need to use LiveLink and just want the animations generated). If you want to use it this way, make sure to set adding emotion dimensions to false in the CSV creator (utils > csv) so that only the first 61 dimensions are written.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64ad37822a530cbdee7ce10b/r5DBmiQ4XwlMd8JW4GEz-.png)

## Model Overview

The **NeuroSync audio-to-face blendshape transformer seq2seq model** transforms sequences of audio features into corresponding facial blendshape coefficients. This enables facial animation from audio input, making it useful for real-time character animation, including integration with Unreal Engine via LiveLink.

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64ad37822a530cbdee7ce10b/35IP0CtVzzllXxOwf2f51.jpeg)

The model maps sequences of 128 frames of audio features to the facial blendshapes used for character animation. By leveraging a transformer-based encoder-decoder architecture, it generates highly accurate blendshape coefficients that can be streamed to Unreal Engine 5 using LiveLink, ensuring real-time synchronization between audio and facial movements.

## Features

- **Audio-to-Face Transformation**: Converts raw audio features into facial blendshape coefficients for driving facial animations.
- **Transformer Seq2Seq Architecture**: Uses transformer encoder-decoder layers to capture complex dependencies between audio features and facial expressions.
- **Integration with Unreal Engine (LiveLink)**: Supports real-time streaming of generated facial blendshapes into Unreal Engine 5 through the [NeuroSync Player](https://github.com/AnimaVR/NeuroSync_Player) using LiveLink.
- **Non-Commercial License**: This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

## YouTube Channel

For more updates on the progress of training, development of the tools, and tutorials on how to use the NeuroSync model, check out our [YouTube channel](https://www.youtube.com/@animaai_mai). Stay tuned for insights into the ongoing development and enhancements related to the model and its integration with tools like Unreal Engine and LiveLink.

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64ad37822a530cbdee7ce10b/f2EBvDJEmtsCwPJvyDcxl.jpeg)

## Usage

You can set up the local API for this model using the [NeuroSync Local API repository](https://github.com/AnimaVR/NeuroSync_Local_API). This API allows you to process audio files and stream the generated blendshapes to Unreal Engine using the [NeuroSync Player (Unreal Engine LiveLink)](https://github.com/AnimaVR/NeuroSync_Player).

### **Non-Local API Option (Alpha Access)**

If you prefer not to host the model locally, you can apply for access to the **NeuroSync Alpha API**, which enables non-local usage. This allows you to connect directly with the [NeuroSync Player (Unreal Engine LiveLink)](https://github.com/AnimaVR/NeuroSync_Player) and stream facial blendshapes without running the local model. To apply for access to the alpha API, visit [neurosync.info](https://neurosync.info).

## Model Architecture

The model consists of:

- **Encoder**: A transformer encoder that processes audio features and applies positional encodings to capture temporal relationships.
- **Decoder**: A transformer decoder with cross-attention, which attends to the encoder outputs and generates the corresponding blendshape coefficients.
- **Blendshape Output**: The output consists of 52 blendshape coefficients used for facial animations (some coefficients, such as head movements and tongue movements, are excluded from being sent to LiveLink).

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64ad37822a530cbdee7ce10b/rptYcl8W7i3XnCCDPUVVL.jpeg)
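For orientation, below is a minimal, illustrative PyTorch sketch of this kind of encoder-decoder mapping from a 128-frame window of audio features to per-frame blendshape coefficients. All names and hyperparameters here (`AudioToBlendshapeTransformer`, `feature_dim`, layer counts, the decoding scheme, and the 61-value output) are assumptions chosen for the example; they do not reflect the released model's exact configuration or weights.

```python
import torch
import torch.nn as nn

class AudioToBlendshapeTransformer(nn.Module):
    """Illustrative seq2seq sketch: audio feature frames -> blendshape coefficients.

    All hyperparameters (feature_dim, model_dim, layer counts, output_dim) are
    placeholder assumptions, not the released NeuroSync configuration.
    """
    def __init__(self, feature_dim=128, model_dim=256, num_heads=4,
                 num_layers=4, output_dim=61, max_len=128):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, model_dim)
        # Learned positional encodings over the 128-frame window.
        self.pos_embedding = nn.Parameter(torch.zeros(1, max_len, model_dim))
        self.transformer = nn.Transformer(
            d_model=model_dim, nhead=num_heads,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.output_proj = nn.Linear(model_dim, output_dim)

    def forward(self, audio_feats):
        # audio_feats: (batch, 128, feature_dim)
        x = self.input_proj(audio_feats) + self.pos_embedding[:, :audio_feats.size(1)]
        # For illustration, the decoder attends to the encoded audio using the same
        # positionally encoded sequence as its input (one output frame per input
        # frame); the real model may use a different decoding scheme.
        h = self.transformer(src=x, tgt=x)
        return self.output_proj(h)  # (batch, 128, output_dim)

# Example: two 128-frame windows of audio features -> per-frame coefficients.
model = AudioToBlendshapeTransformer()
frames = torch.randn(2, 128, 128)   # (batch, frames, audio feature dim)
blendshapes = model(frames)         # (2, 128, 61)
print(blendshapes.shape)
```

In the actual pipeline, per-frame coefficients like these are what get streamed to Unreal Engine via LiveLink or written to CSV, as described in the following section.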
### Blendshape Coefficients

The model outputs 61 blendshape coefficients, including:

- **Eye movements** (e.g., EyeBlinkLeft, EyeSquintRight)
- **Jaw movements** (e.g., JawOpen, JawRight)
- **Mouth movements** (e.g., MouthSmileLeft, MouthPucker)
- **Brow movements** (e.g., BrowInnerUp, BrowDownLeft)
- **Cheek and nose movements** (e.g., CheekPuff, NoseSneerRight)

Currently, coefficients 52 to 68 should be ignored (or used to drive additive sliders), as they pertain to head movements and emotional states (e.g., Angry, Happy, Sad); they are not streamed into LiveLink.

## License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You may use, adapt, and share this model for non-commercial purposes, but you must give appropriate credit. For more details, see the [Creative Commons BY-NC 4.0 License](https://creativecommons.org/licenses/by-nc/4.0/).

## References

- [NeuroSync Local API](https://github.com/AnimaVR/NeuroSync_Local_API)
- [NeuroSync Player (Unreal Engine LiveLink)](https://github.com/AnimaVR/NeuroSync_Player)
- [Apply for Alpha API Access](https://neurosync.info)

For any questions or further support, please feel free to contribute to the repository or raise an issue.