---
library_name: transformers
license: apache-2.0
language:
  - ru
base_model:
  - t-tech/T-lite-it-1.0
pipeline_tag: text-generation
---

# T-lite-it-1.0_Q4_0

T-lite-it-1.0_Q4_0 is a quantized version of the T-lite-it-1.0 model, which is based on the Qwen 2.5 7B architecture and fine-tuned for Russian-language tasks. The quantization was performed with BitsAndBytes, reducing the weights to 4-bit precision; as a result, the model fits memory-constrained environments and supports fine-tuning and inference on GPUs with as little as 8 GB of VRAM.

## Model Description

- **Language:** Russian
- **Base Model:** T-lite-it-1.0 (derived from Qwen 2.5 7B)
- **Quantization:** 4-bit precision via BitsAndBytes
- **Tasks:** text generation, conversation, question answering, and chain-of-thought reasoning
- **Fine-Tuning:** suitable for further fine-tuning in low-resource environments
- **VRAM Requirements:** fine-tuning and inference are possible with 8 GB of VRAM or more

## Usage

To load the model, ensure you have the required dependencies installed:

```bash
pip install transformers bitsandbytes accelerate
```

(`accelerate` is required for `device_map="auto"` and the BitsAndBytes integration in transformers.)

Then, load the model with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "MilyaShams/T-lite-it-1.0_Q4_0"

# Explicit 4-bit quantization config (the bare `load_in_4bit=True` argument
# is deprecated in recent transformers releases in favor of a config object).
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",  # place layers on the available GPU(s) automatically
)
```
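Once loaded, the model can be prompted through the tokenizer's chat template, which Qwen 2.5 derivatives normally ship with. The snippet below is a minimal inference sketch; the Russian prompt and generation settings are illustrative, not recommendations from the model authors:

```python
# Build a chat-formatted prompt (assumes the tokenizer provides a chat
# template, which is standard for Qwen 2.5-based models).
messages = [
    {"role": "user", "content": "Расскажи коротко о Казани."}  # "Tell me briefly about Kazan."
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```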

## Fine-Tuning

The model is designed for fine-tuning under resource constraints. Use Hugging Face's Trainer or peft (Parameter-Efficient Fine-Tuning) to adapt the model to specific tasks; a minimal sketch follows the configuration notes below.

Example configuration for fine-tuning:

- **Batch Size:** adjust to fit within 8 GB of VRAM (e.g., `batch_size=2`).
- **Gradient Accumulation:** use to simulate larger effective batch sizes.
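As a concrete starting point, here is a minimal LoRA sketch built on peft's `prepare_model_for_kbit_training` and `get_peft_model`, reusing the `model` loaded above. The rank, alpha, target modules, and training hyperparameters are assumptions to tune per task, and `train_dataset` is a placeholder for your own tokenized dataset:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import Trainer, TrainingArguments

# Prepare the 4-bit model for training (casts norm layers to fp32,
# enables input gradients for gradient checkpointing, etc.).
model = prepare_model_for_kbit_training(model)

# LoRA settings are illustrative assumptions, not validated defaults.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen 2.5 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="t-lite-it-1.0-lora",
    per_device_train_batch_size=2,   # small batch to fit in 8 GB of VRAM
    gradient_accumulation_steps=8,   # simulates an effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    fp16=True,
    logging_steps=10,
)

# `train_dataset` is a placeholder: supply your own tokenized dataset here.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```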

## Model Card Authors

Milyausha Shamsutdinova