---
library_name: transformers
license: llama2
pipeline_tag: text-generation
tags:
- GGUF
- llama-2
- llama
- meta
- facebook
- quantized
- 7b
---
# Model Card for alokabhishek/Llama-2-7b-chat-hf-GGUF
<!-- Provide a quick summary of what the model is/does. -->
This repo contains a GGUF quantized version of Meta's meta-llama/Llama-2-7b-chat-hf model, created with llama.cpp.
## Model Details
- Model creator: [Meta](https://huggingface.co/meta-llama)
- Original model: [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
### About GGUF quantization using llama.cpp
- llama.cpp GitHub repo: [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
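GGUF is the single-file format used by llama.cpp for quantized model weights, and this repo hosts several quantization variants of the same model (for example Q4_K_M and Q5_K_M). As a quick way to see which GGUF files are available before downloading, the sketch below lists the repo contents with `huggingface_hub`; the `.gguf` filter is just an assumption about how the files here are named.
```python
# Minimal sketch: list the GGUF files available in this repo (requires huggingface_hub)
from huggingface_hub import list_repo_files

files = list_repo_files("alokabhishek/Llama-2-7b-chat-hf-GGUF")
print([f for f in files if f.endswith(".gguf")])
```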
# How to Get Started with the Model
Use the code below to get started with the model.
## How to run from Python code
#### First install the required packages
```shell
# Base ctransformers with CUDA GPU acceleration
! pip install "ctransformers[cuda]>=0.2.24"
# Or, with no GPU acceleration:
# ! pip install "ctransformers>=0.2.24"
! pip install -U sentence-transformers
! pip install transformers huggingface_hub torch
```
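If you installed the CUDA build, you can optionally confirm that a GPU is visible to torch before deciding how many layers to offload (see `gpu_layers` below). A minimal check:
```python
# Optional sanity check: confirm a CUDA-capable GPU is visible to torch
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```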
# Import
```python
from ctransformers import AutoModelForCausalLM
from transformers import pipeline, AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import os
```
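ctransformers can fetch the GGUF file from the Hub on demand, but if you prefer to download it explicitly (for example to pin a local path), `hf_hub_download` from `huggingface_hub` works as well. A minimal sketch, assuming the Q4_K_M file name used below:
```python
# Optional: download a specific GGUF file into the local Hugging Face cache and get its path
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="alokabhishek/Llama-2-7b-chat-hf-GGUF",
    filename="llama-2-7b-chat-hf.Q4_K_M.gguf",  # pick the quantization variant you need
)
print(gguf_path)
```
The returned path points into the local Hugging Face cache and can be used with any GGUF-compatible runtime.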
# Use a pipeline as a high-level helper
```python
# Load LLM and Tokenizer
model_llama = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",  # replace Q4_K_M.gguf with Q5_K_M.gguf as needed
    model_type="llama",
    gpu_layers=50,  # use `gpu_layers` to specify how many layers will be offloaded to the GPU
    hf=True,
)
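# Note: if tokenizer files are not included in this GGUF repo, load the tokenizer
# from the original meta-llama/Llama-2-7b-chat-hf repo instead.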
tokenizer_llama = AutoTokenizer.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    use_fast=True,
)
# Create a pipeline
pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')
prompt_llama = "Tell me a funny joke about Large Language Models meeting a Black Hole in an intergalactic Bar."
output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
```
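The transformers pipeline above is convenient, but ctransformers can also generate text directly, without wrapping the model in a pipeline. The sketch below is a minimal, assumption-laden example: it loads the same Q4_K_M file without `hf=True` and wraps the prompt in Llama-2's `[INST] ... [/INST]` chat format, which the chat-tuned model expects.
```python
# Minimal sketch: direct generation with ctransformers (no transformers pipeline)
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,  # set to 0 for CPU-only inference
)

# Llama-2 chat models expect the [INST] ... [/INST] prompt format
prompt = "[INST] Tell me a funny joke about Large Language Models meeting a Black Hole in an intergalactic Bar. [/INST]"
print(llm(prompt, max_new_tokens=512, temperature=0.7))
```
Skipping `hf=True` returns ctransformers' own model object, which is callable on a prompt string, so no transformers tokenizer or pipeline is needed for simple completions.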
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]