	---
	base_model: https://huggingface.co/truehealth/LLama-2-MedText-13b
	inference: false
	license: cc-by-nc-4.0
	model_creator: truehealth
	model_name: LLama 2 MedText 13b
	model_type: llama
	prompt_template: '[INST]{prompt}[\INST]'
	quantized_by: iandennismiller
	pipeline_tag: text-generation
	tags:
	- medical
	---
	# LLama-2-MedText-13b-GGUF

	Quantized GGUF builds of [truehealth/LLama-2-MedText-13b](https://huggingface.co/truehealth/LLama-2-MedText-13b) for use with llama.cpp.

	## Usage

	Interactive [llama.cpp](https://github.com/ggerganov/llama.cpp/) session:

	```bash
	llama-cpp \
	--instruct \
	--color \
	--in-prefix "[INST] " \
	--in-suffix "[\INST] " \
	--model LLama-2-MedText-13b-q8_0.gguf

	== Running in interactive mode. ==
	- Press Ctrl+C to interject at any time.
	- Press Return to return control to LLaMa.
	- To return control without starting a new line, end your input with '/'.
	- If you want to submit another line, end your input with '\'.


	> [INST] How confident are you in your knowledge and abilities?
	[\INST] [RSP] As an AI language model, I can provide information to the best of my ability based on the resources I was trained on, which were primarily before <DATE>. While I strive to provide useful and accurate responses, my knowledge is not infinite, and I might not be able to provide professional medical advice or predictions in all cases. Additionally, healthcare decisions should always be evaluated in the context of an individual's unique circumstances and should be evaluated by a healthcare professional.
	```
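
For scripted, non-interactive use, a prompt can be wrapped in the template from this card's front matter (`[INST]{prompt}[\INST]`) before it is handed to llama.cpp. The snippet below is a minimal sketch; `format_prompt` is a hypothetical helper introduced here for illustration and is not part of llama.cpp.

```bash
# Hypothetical helper: wrap a user prompt in this card's prompt template,
# '[INST]{prompt}[\INST]'. The doubled backslash in the printf format
# string emits a single literal backslash.
format_prompt() {
  printf '[INST]%s[\\INST]\n' "$1"
}
```

Calling `format_prompt "How confident are you in your knowledge and abilities?"` prints the question wrapped in the template, ready to be passed to the model as a single prompt string.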

	## Model card from truehealth/Llama-2-MedText-Delta-Preview

	Trained on https://huggingface.co/datasets/BI55/MedText.

	These are PEFT delta weights and need to be merged into LLama-2-13b to be used for inference.

	library_name: peft

	### Training procedure

	The following bitsandbytes quantization config was used during training:

	- load_in_8bit: False
	- load_in_4bit: True
	- llm_int8_threshold: 6.0
	- llm_int8_skip_modules: None
	- llm_int8_enable_fp32_cpu_offload: False
	- llm_int8_has_fp16_weight: False
	- bnb_4bit_quant_type: nf4
	- bnb_4bit_use_double_quant: True
	- bnb_4bit_compute_dtype: float16

	### Framework versions

	- PEFT 0.5.0.dev0

	## Setup Notes

	### Download torch model

	This example uses `hfdownloader` to download the PyTorch model from Hugging Face into a local `Storage` directory (the conversion commands below refer to it as `./models/Storage`):

	```bash
	./hfdownloader -m truehealth/LLama-2-MedText-13b
	```

	If necessary, install `hfdownloader` from https://github.com/bodaay/HuggingFaceModelDownloader:

	```bash
	bash <(curl -sSL https://raw.githubusercontent.com/bodaay/HuggingFaceModelDownloader/master/scripts/gist_gethfd.sh) -h
	```
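
The checkpoint is sharded (the conversion commands in the next section point at `pytorch_model-00001-of-00003.bin`), so it can be worth confirming that every shard arrived before converting. The following is a sketch only: `check_shards` is a hypothetical helper, and the default shard count of 3 is inferred from that file name rather than stated anywhere else in this card.

```bash
# Hypothetical helper: verify that all shards of a sharded PyTorch
# checkpoint exist in a directory. Written in POSIX sh style.
#   $1 = directory holding the shards
#   $2 = total shard count (assumed default: 3, from "-of-00003")
check_shards() {
  dir=$1
  total=${2:-3}
  missing=0
  i=1
  while [ "$i" -le "$total" ]; do
    shard=$(printf 'pytorch_model-%05d-of-%05d.bin' "$i" "$total")
    if [ ! -f "$dir/$shard" ]; then
      echo "missing: $dir/$shard" >&2
      missing=1
    fi
    i=$((i + 1))
  done
  # Non-zero exit status if any shard was not found.
  return "$missing"
}
```

For example, `check_shards ./models/Storage/truehealth_LLama-2-MedText-13b` exits with status 0 only when all three shard files are present, and names any missing shard on stderr.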

	### Quantize torch model with llama.cpp

	To quantize directly to q8_0 in a single step:

	```bash
	llama.cpp/convert.py --outtype q8_0 --outfile LLama-2-MedText-13b-q8_0.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
	```

	Alternatively, to produce other quantization formats, first convert to an f32 GGUF:

	```bash
	llama.cpp/convert.py --outtype f32 --outfile LLama-2-MedText-13b-f32.gguf ./models/Storage/truehealth_LLama-2-MedText-13b/pytorch_model-00001-of-00003.bin
	```

	Then quantize the f32 GGUF to lower-bit formats:

	```bash
	llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q3_K_L.gguf Q3_K_L
	llama.cpp/build/bin/quantize LLama-2-MedText-13b-f32.gguf LLama-2-MedText-13b-Q6_K.gguf Q6_K
	```
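
When several formats are needed, the two commands above can be generated from a list of quantization types instead of being typed by hand. This is a minimal sketch under the assumption that the file naming pattern above is kept; `quantize_cmd` and `quantize_all` are hypothetical helpers, and they only print the commands so they can be reviewed before anything runs.

```bash
# Hypothetical helper: print the quantize command for one target format,
# following the naming pattern used in this card.
quantize_cmd() {
  qtype=$1
  printf 'llama.cpp/build/bin/quantize %s %s %s\n' \
    'LLama-2-MedText-13b-f32.gguf' \
    "LLama-2-MedText-13b-${qtype}.gguf" \
    "$qtype"
}

# Print one command per requested format (dry run; nothing is executed).
quantize_all() {
  for qtype in "$@"; do
    quantize_cmd "$qtype"
  done
}
```

`quantize_all Q3_K_L Q6_K` prints the same two commands shown above; piping the output to `sh` would execute them.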

	### Distributing the model through Hugging Face

	```bash
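	# Create a virtualenv (via virtualenvwrapper) named after the current directory
	mkvirtualenv -p "$(which python3.11)" -a . "${PWD##*/}"
	python -m pip install huggingface_hub
	# Authenticate, then enable large-file uploads for the repo in this directory
	huggingface-cli login
	huggingface-cli lfs-enable-largefiles .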
	```
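
Before uploading, a quick sanity check can catch a truncated or mislabeled file: GGUF files begin with the four ASCII bytes `GGUF`. The helper below is a sketch and `is_gguf` is a hypothetical name; it inspects only the magic bytes and does not validate the rest of the file.

```bash
# Hypothetical helper: succeed only if the file starts with the 4-byte
# magic "GGUF". NUL bytes are stripped so the shell comparison is safe
# on arbitrary binary input.
is_gguf() {
  magic=$(head -c 4 "$1" 2>/dev/null | tr -d '\000')
  [ "$magic" = 'GGUF' ]
}
```

For example, `is_gguf LLama-2-MedText-13b-q8_0.gguf && echo ok` prints `ok` only when the file carries the GGUF magic.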