---
license: apache-2.0
---
<div style="width: auto; margin-left: auto; margin-right: auto">
  <img src="Poster.jpg" alt="Sasvata" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>

## Model description

- **Model type:** Llama-2 7B-parameter model fine-tuned on the [MOM-Summary](https://huggingface.co/datasets/sasvata/MOM-Summary) dataset.
- **Language(s):** English
- **License:** Llama 2 Community License

### Important note regarding GGML files

The GGML format has been superseded by GGUF. As of August 21st, 2023, [llama.cpp](https://github.com/ggerganov/llama.cpp) no longer supports GGML models. Third-party clients and libraries are expected to keep supporting it for a time, but many may also drop support.

Please use the GGUF models instead.

### About GGML

GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as:
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Supports NVIDIA CUDA GPU acceleration.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for storytelling.
* [LM Studio](https://lmstudio.ai/), a fully featured local GUI with GPU acceleration on both Windows (NVIDIA and AMD) and macOS.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with CUDA GPU acceleration via the c_transformers backend.
* [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.

## Prompting Format

**Prompt Template Without Input**

```
{system_prompt}

### Instruction:
{instruction or query}

### Response:
{response}
```
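
As a minimal sketch, the template above can be filled in with plain Python string formatting. The function name `build_prompt` and the example system prompt are illustrative assumptions, not part of any tooling shipped with this model. The response slot is left empty so that the model generates it.

```python
def build_prompt(instruction: str, system_prompt: str = "") -> str:
    """Fill the instruction template; the model continues after '### Response:'."""
    system_prompt = system_prompt.strip()
    # The system prompt line is optional; omit it (and its blank line) when empty.
    header = f"{system_prompt}\n\n" if system_prompt else ""
    return f"{header}### Instruction:\n{instruction.strip()}\n\n### Response:\n"


if __name__ == "__main__":
    prompt = build_prompt(
        "Summarize the following meeting transcript: <transcript text>",
        # Hypothetical system prompt, chosen only for illustration.
        system_prompt="You are an assistant that summarizes meetings.",
    )
    print(prompt)
```

The resulting string ends right after `### Response:`, so whatever the model generates next is the response.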

## Provided files

| Name | Quant method | Bits | Size | Use case |
|------|--------------|------|------|----------|
| Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| Llama-2-7b-MOM_Summar.Q4_K_S.gguf | q4_K_S | 4 | 2.95 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
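
One way to run one of these files locally is through llama-cpp-python. The sketch below shows typical usage of that library and is not a tested recipe for this model: the helper name `generate`, the context size, the temperature, and the stop string are all illustrative assumptions, and `model_path` must point to a GGUF file you have already downloaded.

```python
from pathlib import Path


def generate(model_path: str, prompt: str, max_tokens: int = 256) -> str:
    """Run one completion against a local GGUF file with llama-cpp-python."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"GGUF file not found: {model_path}")

    # Imported lazily so the path check above works without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=2048,       # assumed context window; adjust as needed
        n_gpu_layers=0,   # CPU only; increase to offload layers to a GPU
        verbose=False,
    )
    # Stop if the model starts writing a new instruction block (assumed stop string).
    result = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.2,
        stop=["### Instruction:"],
    )
    return result["choices"][0]["text"].strip()
```

For example, `generate("Llama-2-7b-MOM_Summar.Q4_K_S.gguf", prompt)` would return the generated text for a `prompt` written in the template from the Prompting Format section. The smaller Q2_K file trades output quality for a lower memory footprint.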