FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16 • 74
LayoutLM Collection The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 5 items • Updated May 22 • 9
SpeechT5 Collection The SpeechT5 framework consists of a shared seq2seq encoder-decoder network and six modality-specific (speech/text) pre/post-nets, and can address several audio-related tasks such as speech recognition, text-to-speech, and voice conversion. • 8 items • Updated May 22 • 14