---
license: apache-2.0
language:
- de
- fr
- en
- ro
base_model:
- google/flan-t5-xxl
library_name: llama.cpp
tags:
- llama.cpp
---
# flan-t5-xxl-gguf

This is a quantized (GGUF) version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl/), packaged for use with [llama.cpp](https://github.com/ggerganov/llama.cpp).

![FLAN-T5 model architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg)
## Usage/Examples

```sh
./llama-cli -m /path/to/file.gguf --prompt "your prompt" --n-gpu-layers nn
```

`nn` is the number of model layers to offload to the GPU.
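For example, assuming you have downloaded the Q4_K_M quant into the current directory (the file name below is illustrative; substitute the one you actually downloaded), a run that offloads 20 layers might look like this:

```sh
# Illustrative file name and layer count; adjust to your download and available VRAM.
./llama-cli -m ./flan-t5-xxl-Q4_K_M.gguf \
  --prompt "Translate to German: How old are you?" \
  --n-gpu-layers 20
```

`--n-gpu-layers` also has the short form `-ngl`; set it to 0 for CPU-only inference.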
## Quants

Bits | Types
--------|-------------
Q2 | Q2_K
Q3 | Q3_K, Q3_K_L, Q3_K_M, Q3_K_S
Q4 | Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S
Q5 | Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S
Q6 | Q6_K
Q8 | Q8_0

#### Additional

Bits | Type
--------|-------------
16 | f16
32 | f32
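To fetch a single quant rather than the whole repo, you can use `huggingface-cli`. The repo id and file name below are placeholders; check this repo's file list for the actual names:

```sh
# Placeholders: substitute the actual repo id and .gguf file name from the file list.
huggingface-cli download <user>/flan-t5-xxl-gguf flan-t5-xxl-Q4_K_M.gguf --local-dir .
```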
## Disclaimer

I claim no rights to this model. All rights belong to Google.

## Acknowledgements

- [Original model](https://huggingface.co/google/flan-t5-xxl/)
- [Original README](https://huggingface.co/google/flan-t5-xxl/blob/main/README.md)
- [Original license](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)