# Cassandre-RAG
Cassandre-RAG is a fine-tuned Llama 3.1 8B model built for retrieval-augmented generation (RAG) on French administrative documents, with a focus on sources from school administration.
## Training Data
The model was fine-tuned on a specialized corpus consisting of:
- Synthetic queries: Generated from chunks of text extracted from French administrative documents.
- Retrieved documents: For each synthetic query, relevant documents were retrieved using the BM25 ranking algorithm (see the retrieval sketch after this list).
- Generated answers: Responses to the synthetic queries were created based on the retrieved documents.
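For illustration, here is a minimal sketch of what the BM25 retrieval step could look like, using the `rank_bm25` library. The corpus, chunking, and tokenization below are assumptions for the example, not the actual training pipeline:

```python
from rank_bm25 import BM25Okapi

# Hypothetical pre-chunked administrative corpus (illustrative only)
chunks = [
    "L'inscription à l'école primaire se fait généralement à la mairie...",
    "Les documents nécessaires pour l'inscription scolaire incluent...",
    "Les horaires d'ouverture du service scolarité sont...",
]

# BM25 works on tokenized text; plain whitespace tokenization for the sketch
bm25 = BM25Okapi([chunk.lower().split() for chunk in chunks])

# Retrieve the top-k chunks most relevant to a synthetic query
synthetic_query = "Comment inscrire un enfant à l'école primaire ?"
top_chunks = bm25.get_top_n(synthetic_query.lower().split(), chunks, n=2)
```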
## Training Hyperparameters
- Max Steps: 3000
- Learning Rate: 3e-4
- Batch Size: 2 per device
- Gradient Accumulation Steps: 4
- Max Sequence Length: 8192
- Weight Decay: 0.001
- Warmup Ratio: 0.03
- LR Scheduler: Linear
- Optimizer: paged_adamw_32bit
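These values map directly onto the Hugging Face `TrainingArguments` API. A hedged sketch (the actual training script is not reproduced here, and `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="./cassandre-rag",  # placeholder
    max_steps=3000,
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.001,
    warmup_ratio=0.03,
    lr_scheduler_type="linear",
    optim="paged_adamw_32bit",
)
# The max sequence length (8192) is typically passed to the trainer itself
# (e.g. trl's SFTTrainer) rather than to TrainingArguments.
```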
## LoRA Configuration
- LoRA Alpha: 16
- LoRA Dropout: 0.1
- LoRA R: 64
- Target Modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
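In `peft` terms this corresponds to a `LoraConfig` along these lines; a sketch, where `bias` and `task_type` are assumed defaults for causal-LM LoRA fine-tuning rather than documented choices:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    bias="none",            # assumption: adapter biases not trained
    task_type="CAUSAL_LM",  # assumption: standard causal-LM setup
)
```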
## Quantization
- Quantization: 4-bit
- Quantization Type: nf4
- Compute Dtype: float16
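These settings match the `bitsandbytes` 4-bit loading path in `transformers`; a minimal sketch of the corresponding `BitsAndBytesConfig`:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
```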
## Usage
Cassandre-RAG uses a custom syntax for parsing sources and generating sourced output.
Each source should be preceded by an ID wrapped in double asterisks (e.g., `**SOURCE_ID**`).
### Example Usage
```python
from vllm import LLM, SamplingParams

# Load the model
model_name = "PleIAs/Cassandre-RAG"
llm = LLM(model_name, max_model_len=8128)

# Set sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=3000,
    presence_penalty=1.2,
    stop=["#END#"],
)

# Format the query and sources with the model's prompt syntax:
# each source is preceded by its ID wrapped in double asterisks
def prepare_prompt(query, sources):
    sources_text = "\n\n".join(f"**{src_id}**\n{content}" for src_id, content in sources)
    return f"### Query ###\n{query}\n\n### Source ###\n{sources_text}\n\n### Analysis ###\n"

# Example query and sources
query = "Quelles sont les procédures pour inscrire un enfant à l'école primaire?"
sources = [
    ("SOURCE_001", "L'inscription à l'école primaire se fait généralement à la mairie..."),
    ("SOURCE_002", "Les documents nécessaires pour l'inscription scolaire incluent..."),
]

# Build the prompt and generate the response
prompt = prepare_prompt(query, sources)
outputs = llm.generate([prompt], sampling_params)
generated_text = outputs[0].outputs[0].text

print("Query:", query)
print("\nGenerated Response:")
print(generated_text)
```
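Because the model is trained on the same double-asterisk source markers, cited sources can be pulled back out of the answer. A sketch, assuming the output cites sources as `**SOURCE_ID**` (the exact citation format in generated text may vary):

```python
import re

# Collect the source IDs the model cited in its answer
cited_ids = set(re.findall(r"\*\*(SOURCE_\d+)\*\*", generated_text))
print("Cited sources:", sorted(cited_ids))
```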