---
title: README
emoji: 😻
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---
1. Continuously gather Malaysian-context pretraining data, up to 200B tokens, https://huggingface.co/collections/mesolitica/malaysian-pretraining-dataset-66d6968e9e7dbd3be34b9630
2. Pretrain from scratch with multi-node training on bare metal or Kubernetes; we have done up to 10x 8xA100 DGX nodes, https://huggingface.co/collections/mesolitica/mallam-6577b59d1e0b436ae75f930f
3. Generate massive synthetic instruction-finetuning datasets, from RAG and function calling up to 128k-context-length QA, https://huggingface.co/collections/mesolitica/malaysian-synthetic-dataset-656c2673fe7fe0b1e9e25fe2
4. Build multimodal datasets; we have Visual QA, Audio QA, Visual-Visual QA, and Visual-Audio QA, https://huggingface.co/collections/mesolitica/multimodal-malaysian-dataset-653a16214037a1bc4417eb3a
5. Build multimodal models, https://huggingface.co/collections/mesolitica/multimodal-malaysian-llm-65c6f893e03f78fa9e5c8859
6. Experience building continuous batching; we also support vLLM development, https://github.com/mesolitica/transformers-openai-api https://github.com/mesolitica/vllm-whisper
7. Support static cache for encoder-decoder models in HuggingFace Transformers for 2x inference speed, https://github.com/mesolitica?q=static&type=all&language=&sort=
8. Context parallelism, which we are currently developing for vLLM, https://github.com/mesolitica/context-parallelism-xformers
9. Build massive pseudolabeled speech-recognition datasets with timestamps and postprocessing, https://huggingface.co/collections/mesolitica/speech-to-text-dataset-65425beb992ac570f0446a5c
10. Build a noisy neural translation model using T5 SDPA packing, https://huggingface.co/collections/mesolitica/malaysian-noisy-translation-657e5f88e6759943575a91ac
11. Want to serve real-time speech-to-speech with interruption, like GPT-4o? WebSocket with a gRPC backend serves better streaming, and we open-sourced the JS widget, https://github.com/mesolitica/nous-chat-widget
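The continuous batching mentioned in item 6 can be sketched in a few lines. This is a toy single-process illustration, not code from the repos above: `decode_step` stands in for one model forward pass, and the scheduler admits waiting requests into free batch slots between steps instead of waiting for the whole batch to finish, which is the core idea behind continuous batching.

```python
# Toy sketch of continuous batching. decode_step() is a hypothetical
# stand-in for one forward pass of the model; the scheduling loop is
# the part being illustrated.
from collections import deque

EOS = "<eos>"

def decode_step(request):
    """Emit the next token for a request, or EOS when it is done."""
    if request["generated"] < request["target_len"]:
        request["generated"] += 1
        return f"tok{request['generated']}"
    return EOS

def continuous_batching(requests, max_batch=4):
    """Admit waiting requests into the running batch as slots free up,
    instead of blocking until the whole batch finishes (static batching)."""
    waiting = deque(requests)
    running, finished = [], []
    while waiting or running:
        # Fill any free slots with new requests between decode steps.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step for every active request.
        for req in list(running):
            token = decode_step(req)
            if token == EOS:
                running.remove(req)
                finished.append(req)
            else:
                req["tokens"].append(token)
    return finished

reqs = [{"id": i, "target_len": n, "generated": 0, "tokens": []}
        for i, n in enumerate([2, 5, 3, 1, 4])]
done = continuous_batching(reqs)
print(sorted(len(r["tokens"]) for r in done))  # [1, 2, 3, 4, 5]
```

A real serving stack layers paged KV-cache management and per-step batching of the forward pass on top of this scheduling loop.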
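Item 7 relies on static caches. A minimal sketch of the idea, using NumPy and illustrative names rather than the actual Transformers implementation: the KV cache is preallocated at a fixed maximum length and each decode step writes into a slot, so tensor shapes never change between steps, which is what allows a compiler to trace one fixed-shape kernel instead of recompiling as a concatenated cache grows.

```python
# Illustrative static KV cache (hypothetical class, not the repos' code).
import numpy as np

MAX_LEN, HEAD_DIM = 8, 4

class StaticKVCache:
    """Fixed-shape cache: each step writes key/value into the next slot."""
    def __init__(self, max_len=MAX_LEN, head_dim=HEAD_DIM):
        self.keys = np.zeros((max_len, head_dim))
        self.values = np.zeros((max_len, head_dim))
        self.pos = 0

    def update(self, k, v):
        self.keys[self.pos] = k
        self.values[self.pos] = v
        self.pos += 1
        # Backing arrays stay (MAX_LEN, HEAD_DIM); only `pos` advances.
        return self.keys[:self.pos], self.values[:self.pos]

cache = StaticKVCache()
for step in range(3):
    k = v = np.full(HEAD_DIM, float(step))
    keys, values = cache.update(k, v)

print(cache.keys.shape, cache.pos)  # (8, 4) 3
```

In Transformers this shows up as the static cache implementations that make `torch.compile`-friendly decoding possible; the linked repos extend that to encoder-decoder models such as Whisper and T5.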
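The context parallelism in item 8 can be simulated in a single process. In this sketch (illustrative only, with assumed names) the query rows of a long sequence are sharded across ranks, each rank attends its shard against the gathered keys and values, and the concatenated shard outputs equal full attention; production implementations such as ring attention overlap the communication instead of gathering everything up front.

```python
# Single-process simulation of sharding attention queries across ranks.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, dim, world_size = 8, 4, 2
q = rng.normal(size=(seq_len, dim))
k = rng.normal(size=(seq_len, dim))
v = rng.normal(size=(seq_len, dim))

# Shard queries along the sequence dimension, one shard per "rank".
q_shards = np.array_split(q, world_size)
# Each rank attends its query shard against the full (gathered) K/V.
out_parallel = np.concatenate([attention(qs, k, v) for qs in q_shards])

# Concatenated shard outputs match full attention exactly.
assert np.allclose(out_parallel, attention(q, k, v))
```

Each rank only materialises a `seq_len / world_size` slice of the attention matrix, which is what makes 128k-token contexts fit on a single node.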