---
library_name: peft
base_model: TinyPixel/Llama-2-7B-bf16-sharded
license: apache-2.0
pipeline_tag: question-answering
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
I fine-tuned a Llama 2 7B model with Hugging Face tooling for customer support FAQ chat applications. The model is designed to give accurate and helpful answers to frequently asked questions, with the goal of improving the user experience in customer support interactions. Its specialized training is intended to help it understand and address a wide range of customer queries, which makes it a candidate for automating routine customer support tasks.


## Model Details
I used the sharded checkpoint TinyPixel/Llama-2-7B-bf16-sharded as the base model. Sharding divides the weights of a large neural network into multiple smaller files (more than 14 in this case). This strategy has proven highly beneficial when combined with the `accelerate` library.

When a model is sharded, each shard holds a portion of the overall model's parameters. Accelerate can load these shards one at a time and distribute the weights across the available memory, including GPU memory and CPU memory. This allocation makes it possible to work with very large models without requiring an excessive amount of memory on any single device.
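
As a rough illustration of the loading pattern described above, the following sketch loads the sharded base checkpoint through `transformers` and lets Accelerate choose the placement via `device_map="auto"`. The helper names `placement_kwargs` and `load_base_model` are hypothetical, and the bfloat16 dtype is an assumption based on the checkpoint name; the exact call used for this model is not documented in this card.

```python
BASE_MODEL_ID = "TinyPixel/Llama-2-7B-bf16-sharded"


def placement_kwargs() -> dict:
    """Keyword arguments that hand weight placement over to Accelerate."""
    # "auto" asks Accelerate to spread the weights over the available
    # devices (GPU memory first, then CPU memory).
    return {"device_map": "auto"}


def load_base_model(model_id: str = BASE_MODEL_ID):
    """Load the sharded base checkpoint (hypothetical helper, not the author's code)."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: matches the bf16 checkpoint
        **placement_kwargs(),
    )
```

Calling `load_base_model()` downloads the checkpoint shards and requires `torch`, `transformers`, and `accelerate` to be installed.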

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** Tony Esposito
- **Model type:** Llama 2 family
- **License:** Apache 2.0
- **Finetuned from model:** TinyPixel/Llama-2-7B-bf16-sharded




## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]




## Training procedure


The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
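
The settings listed above map onto the `BitsAndBytesConfig` class in `transformers`. The following is a minimal sketch of how such a config could be rebuilt; the names `QUANT_SETTINGS` and `build_bnb_config` are hypothetical, and whether the training script constructed the config in exactly this way is an assumption.

```python
# Values copied from the list above; `quant_method` is omitted because it
# is not passed to the constructor.
QUANT_SETTINGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config():
    """Build a BitsAndBytesConfig from QUANT_SETTINGS (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without torch installed.
    import torch
    from transformers import BitsAndBytesConfig

    settings = dict(QUANT_SETTINGS)
    # Convert the dtype name into the corresponding torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(torch, settings["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**settings)
```

The resulting object is typically passed as `quantization_config=build_bnb_config()` to `from_pretrained` when loading the base model.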

### Framework versions


- PEFT 0.7.0.dev0