RichardErkhov
/

unsloth_-_gemma-7b-bnb-4bit-8bits

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

Edit model card

YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Quantization made by Richard Erkhov.

Request more models

gemma-7b-bnb-4bit - bnb 8bits

Model creator: https://huggingface.co/unsloth/
Original model: https://huggingface.co/unsloth/gemma-7b-bnb-4bit/

Original model description:

language: - en license: apache-2.0 library_name: transformers tags: - unsloth - transformers - gemma - gemma-7b - bnb

Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!

We have a Google Colab Tesla T4 notebook for Gemma 7b here: https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

Unsloth supports	Free Notebooks	Performance	Memory use
Gemma 7b	▶️ Start on Colab	2.4x faster	58% less
Mistral 7b	▶️ Start on Colab	2.2x faster	62% less
Llama-2 7b	▶️ Start on Colab	2.2x faster	43% less
TinyLlama	▶️ Start on Colab	3.9x faster	74% less
CodeLlama 34b A100	▶️ Start on Colab	1.9x faster	27% less
Mistral 7b 1xT4	▶️ Start on Kaggle	5x faster*	62% less
DPO - Zephyr	▶️ Start on Colab	1.9x faster	19% less

This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

Downloads last month: 6

Safetensors

Model size

4.78B params

Tensor type

F32

·

FP16

·

U8

·

Inference Examples

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.