ekshat committed on
Commit
1d5b7bf
1 Parent(s): 78294ad

Update README.md

Files changed (1)
  1. README.md +11 -7
README.md CHANGED
@@ -1,20 +1,24 @@
  ---
  license: apache-2.0
  ---
  This repository hosts both the standard and quantized versions of the Zephyr 7B model, allowing users to choose the version that best fits their resource constraints and performance needs.

  # Model Details
- ## Model Name: Zephyr 7B
- ## Model Size: 7 billion parameters
- ## Architecture: Transformer-based
- ## Languages: Primarily English, with support for multilingual text
- ## Quantized Version: Available for reduced memory footprint and faster inference

  # Performance and Efficiency
  The quantized version of Zephyr 7B is optimized for environments with limited computational resources. It offers:

- ## Reduced Memory Usage: The model size is significantly smaller, making it suitable for deployment on devices with limited RAM.
- ## Faster Inference: Quantized models can perform faster inference, providing quicker responses in real-time applications.

  # Fine-Tuning
  You can fine-tune the Zephyr 7B model on your own dataset to better suit specific tasks or domains. Refer to the Hugging Face documentation for guidance on how to fine-tune transformer models.
 
  ---
  license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
  This repository hosts both the standard and quantized versions of the Zephyr 7B model, allowing users to choose the version that best fits their resource constraints and performance needs.

  # Model Details
+ ### Model Name: Zephyr 7B
+ ### Model Size: 7 billion parameters
+ ### Architecture: Transformer-based
+ ### Languages: Primarily English, with support for multilingual text
+ ### Quantized Version: Available for reduced memory footprint and faster inference

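For reference, the standard version can be loaded for text generation with the transformers library. This is a minimal sketch, not an official snippet from this repository: `your-repo/zephyr-7b` is a placeholder model id (substitute the actual id of this repository on the Hub), and `device_map="auto"` assumes the accelerate package is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: "your-repo/zephyr-7b" is a placeholder -- replace it with the
# actual model id of this repository on the Hugging Face Hub.
def load_zephyr(model_id: str = "your-repo/zephyr-7b"):
    """Load the tokenizer and causal-LM weights for text generation."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" spreads the weights across available devices
    # (requires the accelerate package).
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model

# Usage (downloads the weights, so it is not executed here):
# tokenizer, model = load_zephyr()
# inputs = tokenizer("What is quantization?", return_tensors="pt").to(model.device)
# output = model.generate(**inputs, max_new_tokens=64)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```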
 
  # Performance and Efficiency
  The quantized version of Zephyr 7B is optimized for environments with limited computational resources. It offers:

+ ### Reduced Memory Usage: The model size is significantly smaller, making it suitable for deployment on devices with limited RAM.
+ ### Faster Inference: Quantized models can perform faster inference, providing quicker responses in real-time applications.
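One common way to obtain these memory savings with transformers is 4-bit loading via bitsandbytes. The sketch below is an assumption, not a statement about this repository's quantization format: check the repo's files first (if it ships GGUF weights instead, use a GGUF runtime such as llama.cpp rather than this snippet).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hedged sketch: load the model in 4-bit precision with bitsandbytes.
# "your-repo/zephyr-7b" is a placeholder model id.
def load_zephyr_quantized(model_id: str = "your-repo/zephyr-7b"):
    """Load the model with a reduced memory footprint (4-bit weights)."""
    quant_config = BitsAndBytesConfig(load_in_4bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    return tokenizer, model
```

4-bit weights cut the resident memory of a 7B model to a few gigabytes, which is what makes deployment on limited-RAM devices feasible.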
 
  # Fine-Tuning
  You can fine-tune the Zephyr 7B model on your own dataset to better suit specific tasks or domains. Refer to the Hugging Face documentation for guidance on how to fine-tune transformer models.
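As a starting point, a standard transformers Trainer loop might look like the sketch below. The dataset, collator, and hyperparameters are illustrative assumptions, not values recommended by this repository; for 7B-scale models, parameter-efficient methods such as LoRA (via the peft library) are common in practice.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hedged sketch: "your-repo/zephyr-7b" is a placeholder model id, and
# train_dataset is a tokenized dataset you prepare yourself.
def fine_tune(train_dataset, model_id: str = "your-repo/zephyr-7b"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    args = TrainingArguments(
        output_dir="zephyr-7b-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # emulate a larger batch on small GPUs
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=10,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        # mlm=False selects the plain causal-LM objective, matching
        # the model's text-generation pipeline tag.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(args.output_dir)
```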