
MaLLaM 🌙 5B (Malaysia Large Language Model), pretrained with a 4096 context length on Malaysian text.

Pretrained from scratch with 5B parameters using the Mistral architecture on 90B tokens of Malaysian text.
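As a quick sanity check, you can inspect the architecture and context length without downloading the weights. This is a minimal sketch assuming the repository's config follows the standard Mistral layout; the printed values depend on the uploaded config.

from transformers import AutoConfig

# Download only the config, not the multi-GB weights.
config = AutoConfig.from_pretrained('mesolitica/malaysian-mistral-5B-4096')

print(config.model_type)               # expected: 'mistral'
print(config.max_position_embeddings)  # expected: 4096 context length
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)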

README at https://github.com/mesolitica/malaya/tree/5.1/pretrained-model/mistral

WandB at https://wandb.ai/mesolitica/pretrain-mistral-5b?workspace=user-husein-mesolitica

WandB report at https://wandb.ai/mesolitica/pretrain-mistral-3b/reports/Pretrain-Larger-Malaysian-Mistral--Vmlldzo2MDkyOTgz

Technical report at https://github.com/mesolitica/malaya/wiki/MaLLaM-%F0%9F%8C%99-Malaysia-Large-Language-Model

how-to

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with bfloat16 compute, so the 5B model fits on a single consumer GPU.
TORCH_DTYPE = 'bfloat16'
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE)
)

tokenizer = AutoTokenizer.from_pretrained('mesolitica/malaysian-mistral-5B-4096')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/malaysian-mistral-5B-4096',
    use_flash_attention_2=True,  # requires flash-attn; drop this argument if it is not installed
    quantization_config=nf4_config
)
# Base pretrained model, so it only continues text; the BOS token '<s>' is prepended manually
# because add_special_tokens=False below ('nama saya' is Malay for 'my name').
prompt = '<s>nama saya'
inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to('cuda')

# Sampling settings; `inputs` supplies input_ids and attention_mask to generate().
generate_kwargs = dict(
    inputs,
    max_new_tokens=512,
    top_p=0.95,
    top_k=50,
    temperature=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.05,
)
r = model.generate(**generate_kwargs)
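
The generated sequence includes the prompt tokens. A minimal follow-up sketch, reusing the `r` and `tokenizer` variables from the snippet above, decodes it back to text:

# Decode the single generated sequence, dropping special tokens such as <s>.
print(tokenizer.decode(r[0], skip_special_tokens=True))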