|
--- |
|
license: cc |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
tags: |
|
- medical |
|
inference: false |
|
--- |
|
<!-- header start --> |
|
<div style="width: 100%;"> |
|
<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;"> |
|
</div> |
|
<div style="display: flex; justify-content: space-between; width: 100%;"> |
|
<div style="display: flex; flex-direction: column; align-items: flex-start;"> |
|
<p><a href="https://discord.gg/Jq4vkcDakD">Chat & support: my new Discord server</a></p> |
|
</div> |
|
<div style="display: flex; flex-direction: column; align-items: flex-end;"> |
|
<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p> |
|
</div> |
|
</div> |
|
<!-- header end --> |
|
|
|
# medalpaca-13B GPTQ 4bit |
|
|
|
This is a [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) 4bit quantisation of [medalpaca-13b](https://huggingface.co/medalpaca/medalpaca-13b). |
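
For a rough sense of what 4bit quantisation means for file size, here is a back-of-the-envelope sketch. It assumes 13 billion quantised weights, the group size of 128 used for the files in this repo, and (an assumption, not taken from this card) one 16-bit scale plus one 4-bit zero point stored per group. Embeddings and any unquantised layers are ignored, so treat the result as an estimate only. The function name `gptq_size_gb` is made up for illustration.

```python
def gptq_size_gb(n_params: float, wbits: int = 4, groupsize: int = 128,
                 scale_bits: int = 16) -> float:
    """Rough size in GB (1e9 bytes) of the quantised weights only."""
    # Each weight takes `wbits` bits. Each group of `groupsize` weights is
    # ASSUMED to share one scale (`scale_bits`) and one zero point (`wbits`).
    bits_per_weight = wbits + (scale_bits + wbits) / groupsize
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"approx. {gptq_size_gb(13e9):.2f} GB for 13B parameters at 4 bits")
```

Under these assumptions the per-group metadata adds about 0.16 bits per weight on top of the 4 bits for the weight itself.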
|
|
|
## GIBBERISH OUTPUT IN `text-generation-webui`? |
|
|
|
Please read the Provided Files section below. You should use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` unless you are able to use the latest Triton branch of GPTQ-for-LLaMa. |
|
|
|
## How to easily download and use this model in text-generation-webui |
|
|
|
Open the text-generation-webui UI as normal. |
|
|
|
1. Click the **Model tab**. |
|
2. Under **Download custom model or LoRA**, enter `TheBloke/medalpaca-13B-GPTQ-4bit`. |
|
3. Click **Download**. |
|
4. Wait until it says it's finished downloading. |
|
5. Click the **Refresh** icon next to **Model** in the top left. |
|
6. In the **Model drop-down**: choose the model you just downloaded, `medalpaca-13B-GPTQ-4bit`.
|
7. If you see an error in the bottom right, ignore it - it's temporary. |
|
8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama` |
|
9. Click **Save settings for this model** in the top right. |
|
10. Click **Reload the Model** in the top right. |
|
11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt! |
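
If you prefer to script the download rather than use the UI, the following is a minimal sketch. It assumes the `huggingface_hub` package is installed and that your version of `snapshot_download` accepts `local_dir`. The default `models_dir` path and the helper names are illustrative and not part of text-generation-webui.

```python
from pathlib import Path

REPO_ID = "TheBloke/medalpaca-13B-GPTQ-4bit"


def local_model_dir(models_dir: str, repo_id: str = REPO_ID) -> Path:
    """Folder under `models_dir` named after the repo, without the user part."""
    return Path(models_dir) / repo_id.split("/")[-1]


def download_model(models_dir: str = "text-generation-webui/models") -> Path:
    """Download every file in the repo into the models folder (needs network)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(models_dir)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_model()` fetches both `safetensors` files. The GPTQ parameters from step 8 still need to be set in the UI afterwards.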
|
|
|
## Provided files |
|
|
|
Two files are provided. **The second file will not work unless you use a recent version of the Triton branch of GPTQ-for-LLaMa.**

Specifically, the second file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. At this time it therefore also does not work with the CUDA branch of GPTQ-for-LLaMa or with the `text-generation-webui` one-click installers.

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors`.
|
|
|
* `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` |
|
* Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches |
|
* Works with text-generation-webui one-click-installers |
|
* Works on Windows |
|
* Parameters: Groupsize = 128g. No act-order. |
|
* Command used to create the GPTQ: |
|
``` |
|
CUDA_VISIBLE_DEVICES=0 python3 llama.py medalpaca-13b c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors |
|
``` |
|
* `medalpaca-13B-GPTQ-4bit-128g.safetensors` |
|
* Only works with the latest Triton branch of GPTQ-for-LLaMa |
|
* **Does not** work with text-generation-webui one-click-installers |
|
* **Does not** work on Windows |
|
* Parameters: Groupsize = 128g. act-order. |
|
* Offers highest quality quantisation, but requires recent GPTQ-for-LLaMa code |
|
* Command used to create the GPTQ: |
|
``` |
|
CUDA_VISIBLE_DEVICES=0 python3 llama.py medalpaca-13b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors medalpaca-13B-GPTQ-4bit-128g.safetensors |
|
``` |
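
The compatibility notes above boil down to a simple rule, sketched here in Python. The function `pick_file` and its parameters are hypothetical helpers for illustration. Only the two file names come from this repo.

```python
NO_ACT_ORDER = "medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors"
ACT_ORDER = "medalpaca-13B-GPTQ-4bit-128g.safetensors"


def pick_file(latest_triton: bool, on_windows: bool = False,
              one_click_installer: bool = False) -> str:
    """Return the file name that matches the compatibility notes above."""
    # The act-order file needs the latest Triton branch of GPTQ-for-LLaMa and
    # is listed as not working on Windows or with the one-click installers.
    if latest_triton and not on_windows and not one_click_installer:
        return ACT_ORDER
    # The no-act-order file is listed as working everywhere.
    return NO_ACT_ORDER


if __name__ == "__main__":
    print(pick_file(latest_triton=False))
```

When in doubt, the no-act-order file is the safe default, which is why it is the fallback branch.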
|
|
|
## How to run in `text-generation-webui` |
|
|
|
File `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
|
|
|
[Instructions on using GPTQ 4bit files in text-generation-webui are here](https://github.com/oobabooga/text-generation-webui/wiki/GPTQ-models-\(4-bit-mode\)). |
|
|
|
The other `safetensors` model file was created with the latest GPTQ code and uses `--act-order` to give the maximum possible quantisation quality. As a result, it requires the latest GPTQ-for-LLaMa code to be used inside the UI.
|
|
|
If you want to use the act-order `safetensors` file and need to update the Triton branch of GPTQ-for-LLaMa, here are the commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI: |
|
``` |
|
# Clone text-generation-webui, if you don't already have it |
|
git clone https://github.com/oobabooga/text-generation-webui |
|
# Make a repositories directory |
|
mkdir -p text-generation-webui/repositories
|
cd text-generation-webui/repositories |
|
# Clone the latest GPTQ-for-LLaMa code inside text-generation-webui |
|
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa |
|
``` |
|
|
|
Then install this model into `text-generation-webui/models` and launch the UI as follows: |
|
``` |
|
cd text-generation-webui |
|
python server.py --model medalpaca-13B-GPTQ-4bit --wbits 4 --groupsize 128 --model_type Llama # add any other command line args you want |
|
``` |
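
If you launch the UI from a script, the same command line can be assembled programmatically. This is a minimal sketch: the helper `server_command` is hypothetical, and only the flags shown in the command above are used.

```python
import shlex


def server_command(model: str = "medalpaca-13B-GPTQ-4bit", wbits: int = 4,
                   groupsize: int = 128, model_type: str = "Llama",
                   extra_args: tuple = ()) -> list:
    """Build the argument list for launching server.py with GPTQ settings."""
    return ["python", "server.py",
            "--model", model,
            "--wbits", str(wbits),
            "--groupsize", str(groupsize),
            "--model_type", model_type,
            *extra_args]


# Show the command as a single shell-quoted string.
print(shlex.join(server_command()))
```

The returned list can be passed to `subprocess.run(..., cwd="text-generation-webui")` if you want to start the server from Python.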
|
|
|
The above commands assume you have installed all dependencies for GPTQ-for-LLaMa and text-generation-webui. Please see their respective repositories for further information. |
|
|
|
If you can't update GPTQ-for-LLaMa to the latest Triton branch, or don't want to, you can use `medalpaca-13B-GPTQ-4bit-128g.no-act-order.safetensors` as mentioned above, which should work without any upgrades to text-generation-webui. |
|
|
|
<!-- footer start --> |
|
## Discord |
|
|
|
For further support, and discussions on these models and AI in general, join us at: |
|
|
|
[TheBloke AI's Discord server](https://discord.gg/Jq4vkcDakD) |
|
|
|
## Thanks, and how to contribute. |
|
|
|
Thanks to the [chirper.ai](https://chirper.ai) team! |
|
|
|
I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training. |
|
|
|
If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects. |
|
|
|
Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
|
|
|
* Patreon: https://patreon.com/TheBlokeAI |
|
* Ko-Fi: https://ko-fi.com/TheBlokeAI |
|
|
|
**Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman. |
|
|
|
Thank you to all my generous patrons and donors!
|
<!-- footer end --> |
|
# Original model card: MedAlpaca 13b |
|
|
|
|
|
## Table of Contents |
|
|
|
[Model Description](#model-description) |
|
- [Architecture](#architecture) |
|
- [Training Data](#training-data)
|
[Model Usage](#model-usage) |
|
[Limitations](#limitations) |
|
|
|
## Model Description |
|
### Architecture |
|
`medalpaca-13b` is a large language model specifically fine-tuned for medical domain tasks. |
|
It is based on LLaMA (Large Language Model Meta AI) and contains 13 billion parameters. |
|
The primary goal of this model is to improve question-answering and medical dialogue tasks. |
|
|
|
### Training Data |
|
The training data for this project was sourced from various resources. Firstly, we used Anki flashcards to automatically generate questions from the front of the cards and answers from the back of the cards. Secondly, we generated medical question-answer pairs from [Wikidoc](https://www.wikidoc.org/index.php/Main_Page). We extracted paragraphs with relevant headings and used ChatGPT 3.5 to generate questions from the headings, using the corresponding paragraphs as answers. This dataset is still under development, and we believe that approximately 70% of these question-answer pairs are factually correct. Thirdly, we used StackExchange to extract question-answer pairs, taking the top-rated questions from five categories: Academia, Bioinformatics, Biology, Fitness, and Health. Additionally, we used a dataset from [ChatDoctor](https://arxiv.org/abs/2303.14070) consisting of 200,000 question-answer pairs, available at https://github.com/Kent0n-Li/ChatDoctor.
|
|
|
| Source                       | n items |
|------------------------------|---------|
| ChatDoc large                | 200000  |
| Wikidoc                      | 67704   |
| StackExchange academia       | 40865   |
| Anki flashcards              | 33955   |
| StackExchange biology        | 27887   |
| StackExchange fitness        | 9833    |
| StackExchange health         | 7721    |
| Wikidoc patient information  | 5942    |
| StackExchange bioinformatics | 5407    |
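
To put the table in perspective, this short sketch totals the items and computes each source's share. The counts are copied from the table above. The variable and function names are illustrative.

```python
# Item counts per source, copied from the table above.
SOURCES = {
    "ChatDoc large": 200000,
    "Wikidoc": 67704,
    "StackExchange academia": 40865,
    "Anki flashcards": 33955,
    "StackExchange biology": 27887,
    "StackExchange fitness": 9833,
    "StackExchange health": 7721,
    "Wikidoc patient information": 5942,
    "StackExchange bioinformatics": 5407,
}

total = sum(SOURCES.values())


def share(name: str) -> float:
    """Fraction of all training items that come from the given source."""
    return SOURCES[name] / total


if __name__ == "__main__":
    print(f"total items: {total}")
    for name, count in SOURCES.items():
        print(f"{name:30s} {count:7d} {share(name):6.1%}")
```

The ChatDoctor data alone accounts for roughly half of all items, so the mix is weighted heavily towards patient-doctor dialogue.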
|
|
|
## Model Usage |
|
To evaluate the performance of the model on a specific dataset, you can use the Hugging Face Transformers library's built-in evaluation scripts. Please refer to the evaluation guide for more information. |
|
### Inference
|
|
|
You can use the model for inference tasks like question-answering and medical dialogues using the Hugging Face Transformers library. Here's an example of how to use the model for a question-answering task: |
|
|
|
```python
from transformers import pipeline

# LLaMA-based models are causal language models, so use the text-generation pipeline.
pl = pipeline("text-generation", model="medalpaca/medalpaca-13b", tokenizer="medalpaca/medalpaca-13b")

question = "What are the symptoms of diabetes?"
context = "Diabetes is a metabolic disease that causes high blood sugar. The symptoms include increased thirst, frequent urination, and unexplained weight loss."

# Put the context and question into one prompt and let the model complete the answer.
answer = pl(f"Context: {context}\n\nQuestion: {question}\n\nAnswer: ")
print(answer)
```
|
|
|
## Limitations |
|
The model may not perform effectively outside the scope of the medical domain. |
|
The training data primarily targets the knowledge level of medical students, |
|
which may result in limitations when addressing the needs of board-certified physicians. |
|
The model has not been tested in real-world applications, so its efficacy and accuracy are currently unknown. |
|
It should never be used as a substitute for a doctor's opinion and must be treated as a research tool only. |
|
|