---
license: mit
language:
- zh
- en
---

We fine-tuned ChemGPT2-QA-72B from the Qwen2-72B-Instruct model. Our training data, ChemGPT-2.0-Data, has been open-sourced and is available at https://huggingface.co/datasets/ALmonster/ChemGPT-2.0-Data.

We evaluated our model on the three chemistry subjects of C-Eval and compared it with GPT-3.5 and GPT-4. The results are as follows:

## C-Eval

| Models | college_chemistry | high_school_chemistry | middle_school_chemistry | AVG |
|--------|-------------------|-----------------------|-------------------------|-----|
| GPT-3.5 | 0.397 | 0.529 | 0.714 | 0.547 |
| GPT-4 | 0.594 | 0.558 | 0.811 | 0.654 |
| ChemGPT2-QA-72B | 0.710 | 0.936 | 0.995 | 0.880 |

## Quickstart

Here is a code snippet showing how to load the tokenizer and model and how to generate content with `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "ALmonster/ChemGPT2-QA-72B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ALmonster/ChemGPT2-QA-72B")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## vLLM

We recommend deploying the model on 4 A100 GPUs. You can start the vLLM OpenAI-compatible server from the terminal with:

```bash
python -m vllm.entrypoints.openai.api_server \
    --served-model-name chemgpt \
    --model path/to/chemgpt \
    --gpu-memory-utilization 0.98 \
    --tensor-parallel-size 4 \
    --port 6000
```

Then, you can query the server from the client side with the following code:

```python
import requests
import json

def general_chemgpt_stream(inputs, history):
    url = 'http://localhost:6000/v1/chat/completions'
    history += [{"role": "user", "content": inputs}]

    headers = {"User-Agent": "vLLM Client", "Content-Type": "application/json"}
    pload = {
        "model": "chemgpt",
        "stream": True,
        "messages": history
    }
    response = requests.post(url, headers=headers, json=pload, stream=True)

    assistant_reply = ""
    for chunk in response.iter_lines(chunk_size=1, decode_unicode=False, delimiter=b"\n"):
        if chunk:
            string_data = chunk.decode("utf-8")
            try:
                # Each SSE line looks like "data: {...}"; strip the "data: " prefix
                json_data = json.loads(string_data[6:])
                delta_content = json_data["choices"][0]["delta"]["content"]
                assistant_reply += delta_content
                yield delta_content
            except KeyError:
                # The first chunk only carries the role, not content
                delta_content = json_data["choices"][0]["delta"]["role"]
            except json.JSONDecodeError:
                # The final "data: [DONE]" line is not valid JSON; record the full reply
                history += [{
                    "role": "assistant",
                    "content": assistant_reply,
                    "tool_calls": []
                }]
                assert '[DONE]' == string_data[6:]


inputs = '介绍一下NaOH'  # "Give an introduction to NaOH"
history_chem = []
for response_text in general_chemgpt_stream(inputs, history_chem):
    print(response_text, end='')
```
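Because the vLLM server exposes an OpenAI-compatible API, you can also query it with the official `openai` Python client instead of hand-rolling the streaming loop above. This is only a minimal sketch, assuming the server from the previous section is running on `localhost:6000` and that `openai>=1.0` is installed:

```python
from openai import OpenAI

# Assumption: the vLLM server above is listening on localhost:6000.
# vLLM does not check the API key, so any placeholder string works.
client = OpenAI(base_url="http://localhost:6000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="chemgpt",  # must match --served-model-name
    messages=[{"role": "user", "content": "介绍一下NaOH"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```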