# Gemini-Distill-Qwen2.5-0.5B-ead-ONNX

## Model Description

This repository contains ONNX-optimized versions of the Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead model, which was distilled from Gemini-2.0-Flash-Thinking-Exp and fine-tuned specifically for structured Encoded Archival Description (EAD/XML) reasoning and generation.

ONNX conversion enables faster inference on a variety of hardware, including CPUs, GPUs, and specialized inference accelerators.


## Available ONNX Model Versions

The following ONNX versions are provided to cover different inference needs:

| File Name | Description |
|---|---|
| `model.onnx` | Full precision (FP32) version |
| `model_fp16.onnx` | Half precision (FP16) for optimized GPU inference |
| `model_bnb4.onnx` | Bitsandbytes 4-bit quantization |
| `model_int8.onnx` | 8-bit integer quantization for efficient CPU inference |
| `model_q4.onnx` | 4-bit quantization (for low-memory scenarios) |
| `model_q4f16.onnx` | 4-bit quantization with FP16 fallback |
| `model_uint8.onnx` | Unsigned 8-bit quantization |
| `model_quantized.onnx` | General quantized model for mixed precision |
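
If you only need one of these files, you can fetch it directly with `huggingface_hub` instead of cloning the whole repository. This is a minimal sketch; the exact filename/path inside the repository (for example an `onnx/` subfolder) is an assumption to verify against the repository's file listing.

```python
# Minimal sketch: download a single ONNX file from the Hub.
# The filename/path is an assumption -- check the repository's file listing.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead-ONNX",
    filename="model_int8.onnx",  # pick the variant that fits your hardware
)
print(model_path)  # local cache path to pass to onnxruntime
```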

## How to Use the ONNX Model

### 1. Install Dependencies

Ensure you have the required dependencies for ONNX inference:

```bash
pip install onnxruntime
```

For GPU acceleration, install:

```bash
pip install onnxruntime-gpu
```
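
To confirm that the GPU build is picked up, you can list the execution providers ONNX Runtime sees on your machine:

```python
import onnxruntime as ort

# "CUDAExecutionProvider" should appear here if onnxruntime-gpu is installed correctly
print(ort.get_available_providers())
```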

### 2. Load and Run Inference

You can use onnxruntime to load and run inference with the model:

```python
import onnxruntime as ort
import numpy as np

# Load the ONNX model (falls back to CPU if CUDA is unavailable)
session = ort.InferenceSession(
    "model_fp16.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Inspect the inputs the exported graph actually expects
print([inp.name for inp in session.get_inputs()])

# Prepare input data (example) -- token IDs must be int64
input_data = {"input_ids": np.array([[...]])}  # Replace with tokenized input IDs

# Run inference
outputs = session.run(None, input_data)

# Print output (logits)
print(outputs)
```
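
For end-to-end text generation, it is usually easier to let a higher-level wrapper handle tokenization and the decoder's cached key/value inputs. The sketch below uses Hugging Face Optimum's `ORTModelForCausalLM`; the chosen `file_name` (and, depending on how the repository is laid out, a `subfolder="onnx"` argument) are assumptions you may need to adjust, and the prompt is purely illustrative.

```python
# Minimal sketch with Hugging Face Optimum: pip install optimum[onnxruntime]
# file_name (and possibly subfolder="onnx") are assumptions -- match them to the repo layout.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

repo_id = "Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead-ONNX"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForCausalLM.from_pretrained(repo_id, file_name="model_quantized.onnx")

# Illustrative prompt: ask the model for a fragment of EAD/XML
prompt = "Generate an EAD <did> element describing a collection of 19th-century letters."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Optimum builds the attention mask and past key/value tensors for you, which makes this path less error-prone than hand-constructing the ONNX Runtime feed dictionary.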

## Why ONNX?

- **Faster Inference**: Optimized execution across different hardware.
- **Cross-Platform Compatibility**: Runs on CPUs, GPUs, and specialized accelerators.
- **Reduced Memory Usage**: Quantized versions provide significant efficiency gains.

## Citation & Acknowledgments

If you use this model in research or production, please cite:

```bibtex
@misc{your-citation,
  author = {Géraldine Geoffroy},
  title = {Gemini-Distill-Qwen2.5-0.5B-ead-ONNX},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead-ONNX}
}
```