	---
	base_model: BEE-spoke-data/smol_llama-101M-GQA
	datasets:
	- JeanKaddour/minipile
	- pszemraj/simple_wikipedia_LM
	- BEE-spoke-data/wikipedia-20230901.en-deduped
	- mattymchen/refinedweb-3m
	inference: false
	language:
	- en
	license: apache-2.0
	model_creator: BEE-spoke-data
	model_name: smol_llama-101M-GQA
	pipeline_tag: text-generation
	quantized_by: afrideva
	tags:
	- smol_llama
	- llama2
	- gguf
	- ggml
	- quantized
	- q2_k
	- q3_k_m
	- q4_k_m
	- q5_k_m
	- q6_k
	- q8_0
	thumbnail: https://i.ibb.co/TvyMrRc/rsz-smol-llama-banner.png
	widget:
	- example_title: El Microondas
	  text: My name is El Microondas the Wise and
	- example_title: Kennesaw State University
	  text: Kennesaw State University is a public
	- example_title: Bungie
	  text: Bungie Studios is an American video game developer. They are most famous for
	    developing the award winning Halo series of video games. They also made Destiny.
	    The studio was founded
	- example_title: Mona Lisa
	  text: The Mona Lisa is a world-renowned painting created by
	- example_title: Harry Potter Series
	  text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
	- example_title: Riddle
	  text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
	    have water, but no fish. What am I?

	    Answer:'
	- example_title: Photosynthesis
	  text: The process of photosynthesis involves the conversion of
	- example_title: Story Continuation
	  text: Jane went to the store to buy some groceries. She picked up apples, oranges,
	    and a loaf of bread. When she got home, she realized she forgot
	- example_title: Math Problem
	  text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
	    and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
	    they meet if the distance between the stations is 300 miles?

	    To determine'
	- example_title: Algorithm Definition
	  text: In the context of computer programming, an algorithm is
	---
	# BEE-spoke-data/smol_llama-101M-GQA-GGUF

	Quantized GGUF model files for [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data)


	| Name | Quant method | Size |
	| ---- | ---- | ---- |
	| [smol_llama-101m-gqa.fp16.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.fp16.gguf) | fp16 | 203.28 MB |
	| [smol_llama-101m-gqa.q2_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q2_k.gguf) | q2_k | 50.93 MB |
	| [smol_llama-101m-gqa.q3_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q3_k_m.gguf) | q3_k_m | 57.06 MB |
	| [smol_llama-101m-gqa.q4_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q4_k_m.gguf) | q4_k_m | 65.40 MB |
	| [smol_llama-101m-gqa.q5_k_m.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q5_k_m.gguf) | q5_k_m | 74.34 MB |
	| [smol_llama-101m-gqa.q6_k.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q6_k.gguf) | q6_k | 83.83 MB |
	| [smol_llama-101m-gqa.q8_0.gguf](https://huggingface.co/afrideva/smol_llama-101M-GQA-GGUF/resolve/main/smol_llama-101m-gqa.q8_0.gguf) | q8_0 | 108.35 MB |
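
To compare the quantization levels in the table, the following minimal sketch estimates the effective bits per weight of each file and its size relative to fp16. It rests on assumptions not stated in this card: "MB" is taken as 10^6 bytes, the parameter count is taken as exactly 101e6, and GGUF metadata overhead is ignored. The names `bits_per_weight` and `summarize` are illustrative only.

```python
# Rough bits-per-weight estimate for the GGUF files listed in the table.
# Assumptions (not from the model card): 1 MB = 10**6 bytes, exactly 101e6
# parameters, and no allowance for file metadata.

N_PARAMS = 101e6  # "101M" from the model name, treated as exact

SIZES_MB = {
    "fp16": 203.28,
    "q2_k": 50.93,
    "q3_k_m": 57.06,
    "q4_k_m": 65.40,
    "q5_k_m": 74.34,
    "q6_k": 83.83,
    "q8_0": 108.35,
}


def bits_per_weight(size_mb: float, n_params: float = N_PARAMS) -> float:
    """Convert a file size in MB to average bits stored per parameter."""
    return size_mb * 1e6 * 8 / n_params


def summarize(sizes: dict) -> dict:
    """Map each quant name to (bits per weight, size as a fraction of fp16)."""
    fp16_mb = sizes["fp16"]
    return {
        name: (round(bits_per_weight(mb), 2), round(mb / fp16_mb, 3))
        for name, mb in sizes.items()
    }


if __name__ == "__main__":
    for name, (bpw, ratio) in summarize(SIZES_MB).items():
        print(f"{name:8s} {bpw:6.2f} bits/weight  {ratio:.1%} of fp16")
```

Under these assumptions the fp16 file works out to roughly 16 bits per weight, which is a useful sanity check on the parameter count. The lower quants come out above what their names suggest, plausibly because some tensors are stored at higher precision.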



	## Original Model Card:
	# smol_llama-101M-GQA

	<img src="smol-llama-banner.png" alt="banner" style="max-width:95%; height:auto;">

	A small decoder-only language model with 101M total parameters. This is the first version of the model.

	- 768 hidden size, 6 layers
	- GQA (24 heads, 8 key-value), context length 1024
	- trained from scratch
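
The architecture figures above imply a few derived quantities that can be worked out directly. This sketch assumes the usual Llama convention that the per-head dimension is `hidden_size / n_heads`, and a 2-byte (fp16) KV cache; neither is stated in this card. `GQAConfig` is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GQAConfig:
    """Hypothetical container for the figures listed in the model card."""
    hidden_size: int = 768
    n_layers: int = 6
    n_heads: int = 24
    n_kv_heads: int = 8
    context_length: int = 1024

    @property
    def head_dim(self) -> int:
        # Assumed convention: the hidden size is split evenly across query heads.
        return self.hidden_size // self.n_heads

    @property
    def queries_per_kv(self) -> int:
        # Number of query heads sharing each key-value head.
        return self.n_heads // self.n_kv_heads

    def kv_cache_bytes(self, bytes_per_value: int = 2,
                       n_kv_heads: Optional[int] = None) -> int:
        """KV cache size for one full-context sequence (keys plus values)."""
        kv = self.n_kv_heads if n_kv_heads is None else n_kv_heads
        return (2 * self.n_layers * kv * self.head_dim
                * self.context_length * bytes_per_value)


if __name__ == "__main__":
    cfg = GQAConfig()
    gqa = cfg.kv_cache_bytes()
    mha = cfg.kv_cache_bytes(n_kv_heads=cfg.n_heads)  # full multi-head baseline
    print(f"head_dim={cfg.head_dim}, queries per KV head={cfg.queries_per_kv}")
    print(f"KV cache: {gqa / 2**20:.0f} MiB (GQA) vs {mha / 2**20:.0f} MiB (MHA)")
```

With 24 query heads sharing 8 key-value heads, the cache is one third the size it would be if every query head had its own key-value head.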

	## Notes

	This checkpoint is the 'raw' pre-trained model and has not been tuned to a more specific task. It should be fine-tuned before use in most cases.

	### Checkpoints & Links

	- _smol_-er 81M parameter checkpoint with in/out embeddings tied: [here](https://huggingface.co/BEE-spoke-data/smol_llama-81M-tied)
	- Fine-tuned on `pypi` to generate Python code - [link](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA-python)
	- For the chat version of this model, please [see here](https://youtu.be/dQw4w9WgXcQ?si=3ePIqrY1dw94KMu4)

	---


	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-101M-GQA)

	| Metric | Value |
	| ---- | ---- |
	| Avg. | 25.32 |
	| ARC (25-shot) | 23.55 |
	| HellaSwag (10-shot) | 28.77 |
	| MMLU (5-shot) | 24.24 |
	| TruthfulQA (0-shot) | 45.76 |
	| Winogrande (5-shot) | 50.67 |
	| GSM8K (5-shot) | 0.83 |
	| DROP (3-shot) | 3.39 |
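
As a quick consistency check, the "Avg." row can be reproduced from the seven benchmark scores. This sketch assumes the average is a plain unweighted mean rounded to two decimals, which the card does not state explicitly; `leaderboard_average` is an illustrative name.

```python
# Recompute the "Avg." row from the individual benchmark scores in the table.
# Assumption: the average is an unweighted mean rounded to two decimals.

SCORES = {
    "ARC (25-shot)": 23.55,
    "HellaSwag (10-shot)": 28.77,
    "MMLU (5-shot)": 24.24,
    "TruthfulQA (0-shot)": 45.76,
    "Winogrande (5-shot)": 50.67,
    "GSM8K (5-shot)": 0.83,
    "DROP (3-shot)": 3.39,
}


def leaderboard_average(scores: dict) -> float:
    """Unweighted mean of all benchmark scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print(f"Avg. = {leaderboard_average(SCORES):.2f}")
```

The result agrees with the reported 25.32, so the table is internally consistent under that assumption.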