A Gradio custom component
Identify key entities in text
Generate a detailed image caption with highlighted entities
Generate and convert speech using text and audio inputs