README.md · alokabhishek/Llama-2-7b-chat-hf-GGUF at main

Llama-2-7b-chat-hf-GGUF / README.md

alokabhishek

Updated Readme

5be5fa0 verified 8 months ago

preview code

raw

history blame contribute delete

3.58 kB

	---
	library_name: transformers
	license: llama2
	pipeline_tag: text-generation
	tags:
	- GGUF
	- llama-2
	- llama
	- meta
	- facebook
	- quantized
	- 7b
	---

	# Model Card for alokabhishek/Llama-2-7b-chat-hf-GGUF

	<!-- Provide a quick summary of what the model is/does. -->
	This repo GGUF quantized version of Meta's meta-llama/Llama-2-7b-chat-hf model using llama.cpp.


	## Model Details

	- Model creator: [Meta](https://huggingface.co/meta-llama)
	- Original model: [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)


	### About GGUF quantization using llama.cpp

	- llama.cpp github repo: [llama.cpp github repo](https://github.com/ggerganov/llama.cpp)



	# How to Get Started with the Model

	Use the code below to get started with the model.


	## How to run from Python code

	#### First install the package
	```shell
	# Base ctransformers with CUDA GPU acceleration
	! pip install ctransformers[cuda]>=0.2.24
	# Or with no GPU acceleration
	# ! pip install ctransformers>=0.2.24
	! pip install -U sentence-transformers
	! pip install transformers huggingface_hub torch

	```

	# Import

	```python
	from ctransformers import AutoModelForCausalLM
	from transformers import pipeline, AutoModel, AutoTokenizer
	from sentence_transformers import SentenceTransformer
	import os
	```

	# Use a pipeline as a high-level helper

	```python

	# Load LLM and Tokenizer


	model_llama = AutoModelForCausalLM.from_pretrained(
	"alokabhishek/Llama-2-7b-chat-hf-GGUF",
	model_file="llama-2-7b-chat-hf.Q4_K_M.gguf", # replace Q4_K_M.gguf with Q5_K_M.gguf as needed
	model_type="llama",
	gpu_layers=50, # Use `gpu_layers` to specify how many layers will be offloaded to the GPU.
	hf=True
	)
	tokenizer_llama = AutoTokenizer.from_pretrained(
	"alokabhishek/Llama-2-7b-chat-hf-GGUF",
	use_fast=True
	)



	# Create a pipeline
	pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')

	prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

	output_llama = pipe_llama(prompt_llama, max_new_tokens=512)

	print(output_llama[0]["generated_text"])

	```

	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

	### Direct Use

	<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

	[More Information Needed]

	### Downstream Use [optional]

	<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

	[More Information Needed]

	### Out-of-Scope Use

	<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

	[More Information Needed]

	## Bias, Risks, and Limitations

	<!-- This section is meant to convey both technical and sociotechnical limitations. -->

	[More Information Needed]


	## Evaluation

	<!-- This section describes the evaluation protocols and provides the results. -->

	### Testing Data, Factors & Metrics

	#### Testing Data

	<!-- This should link to a Dataset Card if possible. -->

	[More Information Needed]

	#### Factors

	<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

	[More Information Needed]

	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	[More Information Needed]

	### Results

	[More Information Needed]


	## Model Card Authors [optional]

	[More Information Needed]

	## Model Card Contact

	[More Information Needed]