---
license: apache-2.0
language:
- ja
- en
pipeline_tag: text-generation
datasets:
- NTQAI/sharegpt-clean-ja
---

# chatntq-7b-jpntuned Card

## Model Details

chatntq-7b-jpntuned is a chat assistant trained by fine-tuning [BlinkDL/rwkv-4-world](https://huggingface.co/BlinkDL/rwkv-4-world) on user-shared conversations collected from ShareGPT.

- **Developed by:** [NTQAI](https://huggingface.co/NTQAI)
- **Model type:** An auto-regressive language model based on the RWKV architecture.
- **License:** Commercial license
- **Finetuned from model:** [BlinkDL/rwkv-4-world/JPNtuned-7B-v1](https://huggingface.co/BlinkDL/rwkv-4-world/blob/main/RWKV-4-World-JPNtuned-7B-v1-OnlyForTest_76%25_trained-20230714-ctx4096.pth).

## Uses

```python
import gc
import os

import torch

# These flags must be set before rwkv.model is imported.
os.environ["RWKV_JIT_ON"] = '1'
os.environ["RWKV_CUDA_ON"] = '1'

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model_path = "chatntq-7b-jpntuned/ChatNTQ-7B-RWKV-world-JPNtuned-ctx2048.pth"
# Copy rwkv_vocab_v20230424.txt from ChatNTQ-7B-Japanese into the same folder as this script.
WORD_NAME = "rwkv_vocab_v20230424"
ctx_limit = 1024  # maximum number of prompt tokens fed to the model

model = RWKV(model=model_path, strategy='cuda fp16i8 *24 -> cuda fp16')
pipeline = PIPELINE(model, WORD_NAME)


def generate_prompt(instruction):
    return f"\x00Human: {instruction}\x00Assistant: "


def evaluate(
    prompt,
    token_count=1024,
    temperature=1.2,
    top_p=0.5,
    presencePenalty=0.4,
    countPenalty=0.4,
):
    args = PIPELINE_ARGS(temperature=max(0.2, float(temperature)),
                         top_p=float(top_p),
                         alpha_frequency=countPenalty,
                         alpha_presence=presencePenalty,
                         token_ban=[],       # ban the generation of some tokens
                         token_stop=[0, 1])  # stop generation whenever one of these tokens is sampled

    all_tokens = []
    out_last = 0
    out_str = ''
    occurrence = {}  # token id -> number of times it has been generated
    state = None
    prompt = generate_prompt(prompt)
    print(prompt)

    for i in range(int(token_count)):
        # First step: feed the (truncated) prompt. Later steps: feed only the last sampled token.
        out, state = model.forward(pipeline.encode(prompt)[-ctx_limit:] if i == 0 else [token], state)

        # Lower the logits of tokens that were already generated.
        for n in occurrence:
            out[n] -= (args.alpha_presence + occurrence[n] * args.alpha_frequency)

        token = pipeline.sample_logits(out, temperature=args.temperature, top_p=args.top_p)
        if token in args.token_stop:
            break
        all_tokens += [token]
        if token not in occurrence:
            occurrence[token] = 1
        else:
            occurrence[token] += 1

        # Append decoded text only once it forms complete characters (no U+FFFD).
        tmp = pipeline.decode(all_tokens[out_last:])
        if '\ufffd' not in tmp:
            out_str += tmp
            out_last = i + 1

    gc.collect()
    torch.cuda.empty_cache()
    return out_str


if __name__ == "__main__":
    question = "東京の人口はどれくらいですか?"  # "What is the population of Tokyo?"
    response = evaluate(question)
    print(response)
```

### Contact information

For personal communication related to this project, please contact Nha Nguyen Van (nha282@gmail.com).
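
### Repetition penalties in `evaluate`

The `evaluate` function in the Uses section lowers the logit of every token that has already been generated by `alpha_presence + occurrence[n] * alpha_frequency` before sampling. The snippet below is a minimal sketch of that update on plain Python lists, so it can be run without the model or a GPU. The helper name `apply_repetition_penalties` and the toy logits are illustrative assumptions and are not part of the `rwkv` package; the default values of 0.4 match the defaults of `presencePenalty` and `countPenalty` above.

```python
from collections import Counter


def apply_repetition_penalties(logits, occurrence, alpha_presence=0.4, alpha_frequency=0.4):
    """Return a copy of `logits` with presence and frequency penalties applied.

    Hypothetical helper mirroring the in-loop update from `evaluate`:
        out[n] -= alpha_presence + occurrence[n] * alpha_frequency
    """
    penalized = list(logits)  # copy so the caller's logits are left unchanged
    for token_id, count in occurrence.items():
        penalized[token_id] -= alpha_presence + count * alpha_frequency
    return penalized


# Toy example: a 5-token vocabulary with uniform logits (illustrative values only).
logits = [1.0, 1.0, 1.0, 1.0, 1.0]
generated = [2, 2, 4]            # token 2 was sampled twice, token 4 once
occurrence = Counter(generated)  # maps token id -> count

penalized = apply_repetition_penalties(logits, occurrence)
print(penalized)
```

In this toy run, token 2 is penalized more than token 4 because it has appeared more often, and tokens that were never generated keep their original logits. The presence term applies a flat cost to any repeated token, while the frequency term grows with each repetition.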