This language model was fine-tuned on a dataset of 48k machine-translated Chinese instructions. For the dataset description, inference examples, and other details, see: https://github.com/magichub-opensource/CLAM-Conversational-Language-AI-from-MagicData.

Model inference

  • Loading one model on a single GPU requires 15 GB of GPU memory.
  • Local test environment: py310-torch1.13.1-cuda11.6-cudnn8
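
As a rough sanity check on the 15 GB figure, the weight footprint alone can be estimated from the parameter count and the dtype. The sketch below assumes about 7 billion parameters (inferred from the "7b" in the model name) stored as float16; the helper name is hypothetical. The gap between this estimate and 15 GB is presumably taken up by activations, the KV cache, and the CUDA context.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: ~7e9 parameters, float16 (2 bytes per parameter).
weights_gib = estimate_weight_memory_gib(7e9)
print(f"weights only: ~{weights_gib:.1f} GiB")
```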

Web Demo

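The comparison examples in the documentation were produced with a demo built on the open-source text-generation-webui project. The demo supports switching models and adjusting a range of common parameters from the web page.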

Experiment environment: py310-torch1.13.1-cuda11.6-cudnn8

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt

# We recommend symlinking the model's absolute path into `./models`. Copying the model directory there also works.
ln -s ${model_dir_absolute_path} models/${model_name}

# Start the server
python server.py --model ${model_name} --listen --listen-host 0.0.0.0 --listen-port ${port}

If the server starts successfully, it can be reached at ${server_ip}:${port}.
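
To confirm that the server is actually listening before opening it in a browser, a plain TCP connection test is usually enough. The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and the host and port in the usage line are placeholders to replace with your own ${server_ip} and ${port}.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values, not taken from the original instructions.
    print(is_port_open("127.0.0.1", 7860))
```

This only checks that something accepts connections on the port; it does not verify that the web UI itself loaded the model correctly.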

Inference script

See https://github.com/magichub-opensource/CLAM-Conversational-Language-AI-from-MagicData/blob/master/inference.py

import os,sys,argparse
# os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import torch
import re
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# modelpath = 'models/Chinese-llama2-alpaca-7b' # local path
modelpath = 'MagicHub/Chinese-llama2-alpaca-7b' # huggingface repo

print(f'model path: {modelpath}')
model = AutoModelForCausalLM.from_pretrained(modelpath, device_map="cuda:0", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(modelpath, use_fast=False)

# Prompt: "What is the difference between opera and Peking opera?"
prompt = "歌剧和京剧的区别是什么?\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
generate_ids = model.generate(
        inputs.input_ids, do_sample=True, max_new_tokens=1024, top_k=10, top_p=0.1, temperature=0.5, repetition_penalty=1.18,
        eos_token_id=2, bos_token_id=1, pad_token_id=0, typical_p=1.0,encoder_repetition_penalty=1,
        )
response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
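# Escape the prompt so characters such as "?" are matched literally, not as regex operators.
cleaned_response = re.sub('^' + re.escape(prompt), '', response)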
print(f'Input:\n{prompt}\n')
print(f"Output:\n{cleaned_response}\n")
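
The script above removes the prompt from the decoded text by string matching. An alternative that avoids string matching is to drop the prompt tokens before decoding, since `generate` on a decoder-only model returns the prompt tokens followed by the new ones. The sketch below shows the slicing step on plain lists; `new_token_ids` is a hypothetical helper, not part of the original script.

```python
from typing import List, Sequence


def new_token_ids(generated: Sequence[int], prompt_len: int) -> List[int]:
    """Return only the token IDs produced after the first `prompt_len` prompt tokens."""
    return list(generated[prompt_len:])


# Toy example: the first three IDs stand in for the prompt.
print(new_token_ids([1, 5, 6, 7, 8], 3))
```

In the script, this would correspond to something like `tokenizer.decode(new_token_ids(generate_ids[0].tolist(), inputs.input_ids.shape[1]), skip_special_tokens=True)`.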