Description

This is a 10.2-billion-parameter model built by combining two sets of 24 layers each, taken from CALM2-7B, using SLERP (spherical linear interpolation) merging.
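For readers unfamiliar with SLERP, the following is a minimal sketch of the standard interpolation formula applied to two flattened weight vectors. It is not the merge script used for this model: the tool, the interpolation factor, and the layer pairing are not stated here, and the function name `slerp` and its parameters `t` and `eps` are illustrative assumptions.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t=0 returns v0, t=1 returns v1. Illustrative helper only; the name and
    signature are assumptions, not part of any merge tool's API.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # A (near-)zero vector has no direction, so fall back to linear interpolation.
    if norm0 < eps or norm1 < eps:
        return linear
    # Angle between the two vectors, with the cosine clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    # Nearly parallel or antiparallel vectors: the SLERP weights are ill-defined.
    if abs(sin_omega) < eps:
        return linear
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

In a weight merge, a function like this would typically be applied tensor by tensor to corresponding parameters of the two layer sets, with `t` controlling how far the result leans toward the second set.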

Note

This model is experimental and may not achieve expected performance without additional tuning.

Tutorial

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model in bfloat16.
# device_map="auto" places the weights on the available device(s) and requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained("sudy-super/baku-10b")
model = AutoModelForCausalLM.from_pretrained("sudy-super/baku-10b", device_map="auto", torch_dtype=torch.bfloat16)

# Japanese prompt meaning "A large language model is,"
prompt = "大規模言語モデルとは、"
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
# Sample up to 100 new tokens without tracking gradients.
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
# Decode the full sequence (prompt plus generated continuation).
result = tokenizer.decode(output_ids.tolist()[0])

print(result)
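Because `generate` returns the prompt tokens followed by the newly generated ones for a causal language model, `result` in the tutorial begins with the prompt itself. If only the continuation is wanted, the prompt tokens can be sliced off before decoding. The sketch below works on plain lists of token IDs; the helper name `continuation_ids` is hypothetical and not part of transformers.

```python
def continuation_ids(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt.

    output_ids: list of token IDs for one sequence (prompt + continuation).
    prompt_length: number of prompt tokens at the start of output_ids.
    Hypothetical helper for illustration; not a transformers function.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]

# With the tutorial's variables, usage would look like:
#   new_ids = continuation_ids(output_ids.tolist()[0], token_ids.shape[-1])
#   print(tokenizer.decode(new_ids, skip_special_tokens=True))
```

Passing `skip_special_tokens=True` to `decode` also drops markers such as the end-of-sequence token from the printed text.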