Model Card for alokabhishek/Llama-2-7b-chat-hf-GGUF

This repo GGUF quantized version of Meta's meta-llama/Llama-2-7b-chat-hf model using llama.cpp.

Model Details

About GGUF quantization using llama.cpp

How to Get Started with the Model

Use the code below to get started with the model.

How to run from Python code

First install the package

# Base ctransformers with CUDA GPU acceleration
! pip install ctransformers[cuda]>=0.2.24
# Or with no GPU acceleration
# ! pip install ctransformers>=0.2.24
! pip install -U sentence-transformers
! pip install transformers huggingface_hub torch

Import

from ctransformers import AutoModelForCausalLM
from transformers import pipeline, AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import os

Use a pipeline as a high-level helper


# Load LLM and Tokenizer


model_llama = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf", # replace Q4_K_M.gguf with Q5_K_M.gguf as needed
    model_type="llama", 
    gpu_layers=50, # Use `gpu_layers` to specify how many layers will be offloaded to the GPU.
    hf=True
)
tokenizer_llama = AutoTokenizer.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF", 
    use_fast=True
)



# Create a pipeline
pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')

prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

output_llama = pipe_llama(prompt_llama, max_new_tokens=512)

print(output_llama[0]["generated_text"])

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Downloads last month
36
GGUF
Model size
6.74B params
Architecture
llama

4-bit

5-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Collection including alokabhishek/Llama-2-7b-chat-hf-GGUF