Sentence Transformer for Audit Retrieval Question-Answering (STAR-QA)

Sentence Transformer for Audit Retrieval Question-Answering (STAR-QA) is a fine-tuned sentence-transformers model based on ALL-MPNET-BASE-V2. It was developed to produce high-performance embeddings for audit, risk-management, compliance and associated regulatory documents. The model maps each input text (a sentence or short passage) to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search, for example as the retrieval step of a retrieval-augmented generation pipeline.
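A minimal usage sketch follows. It assumes the sentence-transformers package is installed and that the model is published under the repository name given in the citation URL of this card (dptrsa/STAR-QA). The helper names embed, cosine, top_k and search are illustrative and not part of any library.

```python
import math

MODEL_ID = "dptrsa/STAR-QA"  # repository name taken from the citation URL in this card


def embed(texts):
    """Encode a list of texts; returns one 768-dimensional vector per text."""
    # Imported lazily so the ranking helpers below work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(texts).tolist()


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k documents most similar to the query, best first."""
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]


def search(query, docs, k=3):
    """Embed the query and documents, then return the k best-matching documents."""
    vectors = embed([query] + docs)
    return [docs[i] for i in top_k(vectors[0], vectors[1:], k)]
```

Calling search("Who approves the internal audit charter?", docs) with a list of passages of your own would download the model on first use and return the three closest passages. The example query is made up for illustration.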

Evaluation Results

The model was evaluated on a held-out sample from the STAR-QA dataset (see Training Data below) using sentence_transformers.evaluation.InformationRetrievalEvaluator. Candidate documents were ranked by cosine similarity to each query and compared against the ground truth. Reported metrics are precision and recall @ 3, MRR @ 10, NDCG @ 10 and MAP @ 100. The base model was benchmarked with the same methodology for comparison.

| Metric | STAR-QA Score | ALL-MPNET-BASE-V2 Score |
| --- | --- | --- |
| Precision @ 3 | 0.315 | 0.215 |
| Recall @ 3 | 0.324 | 0.223 |
| MRR @ 10 | 0.887 | 0.578 |
| NDCG @ 10 | 0.44 | 0.303 |
| MAP @ 100 | 0.316 | 0.209 |
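To make the cutoff-based metrics concrete, here is a small sketch of precision, recall and reciprocal rank at k for a single query. The document IDs and the ranking are invented for illustration. The evaluator averages such per-query values over all queries; this is not its actual implementation.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one query, best match first.
ranked = ["d7", "d2", "d9", "d4"]
relevant = {"d2", "d4"}

p3 = precision_at_k(ranked, relevant, 3)         # one hit in the top 3
r3 = recall_at_k(ranked, relevant, 3)            # one of two relevant docs found
rr10 = reciprocal_rank_at_k(ranked, relevant, 10)  # first hit at rank 2
```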

Training Data

The model was fine-tuned on a corpus of audit, risk-management, compliance and associated regulatory documents sourced from the public internet. Documents were cleaned and chunked into 2-sentence blocks. Each block was then sent to a state-of-the-art LLM with the following prompt: "Write a question about {document_topic} for which this is the answer: {block}"

The resulting question and its associated ground-truth answer (collectively a "pair") constitute a single training example for the fine-tuning step. The final model was fine-tuned on ~18K such pairs.
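The chunking and prompting step described above could look roughly like the sketch below. The card does not specify the sentence splitter or whether blocks overlap, so the regex-based splitter and the non-overlapping windows are assumptions, and the function names are hypothetical. The prompt template is the one quoted above.

```python
import re

PROMPT = "Write a question about {document_topic} for which this is the answer: {block}"


def two_sentence_blocks(text):
    """Split text into consecutive, non-overlapping blocks of two sentences."""
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]


def build_prompts(text, document_topic):
    """Return (prompt, block) tuples; the block is the ground-truth answer."""
    return [
        (PROMPT.format(document_topic=document_topic, block=block), block)
        for block in two_sentence_blocks(text)
    ]
```

Each prompt would be sent to the LLM, and the returned question paired with its block forms one training example.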

Training

The model was fine-tuned with the following parameters:

DataLoader:

torch.utils.data.dataloader.DataLoader of length 634 with parameters:

{'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss:

sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss with parameters:

{'scale': 20.0, 'similarity_fct': 'cos_sim'}
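As an illustration of what this loss computes, the sketch below re-implements the idea in plain Python: within a batch, each question is scored against every answer with scaled cosine similarity, and cross-entropy pushes the matching answer to the top while the other answers act as negatives. This is a simplified sketch for intuition, not the library's code, and the function names are illustrative.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Every positive in the batch is a candidate; all but index i are negatives.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings: well-aligned pairs give a loss near 0, swapped pairs a large one.
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```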

Parameters of the fit()-Method:

{
    "epochs": 1,
    "evaluation_steps": 50,
    "evaluator": "sentence_transformers.evaluation.InformationRetrievalEvaluator.InformationRetrievalEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 10000,
    "weight_decay": 0.01
}
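The interaction between warmup_steps and the number of training steps can be checked with a short calculation. The sketch assumes WarmupLinear follows the usual rule of linear warmup followed by linear decay, and that the total step count is the DataLoader length (634) times the number of epochs (1). Under those assumptions the listed warmup of 10000 steps is longer than the whole run, so the learning rate would still be ramping up when training ends.

```python
def warmup_linear_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup period.
        return step / max(1, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 2e-05         # "lr" from optimizer_params
warmup_steps = 10000    # "warmup_steps"
total_steps = 634 * 1   # DataLoader length x epochs (assumption)

# Learning rate reached at the last training step under these assumptions.
final_lr = base_lr * warmup_linear_factor(total_steps, warmup_steps, total_steps)
```

With these inputs final_lr works out to about 1.27e-06, roughly 6% of the configured 2e-05.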

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
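The last two modules can be illustrated with a small sketch: mean pooling averages the token vectors of non-padding positions, and Normalize() scales the result to unit length, so cosine similarity between embeddings reduces to a dot product. The toy vectors are 2-dimensional instead of 768-dimensional and the function names are illustrative.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, mask in zip(token_vectors, attention_mask) if mask == 1]
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Three token positions; the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
```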

Citing & Authors

@misc{Theron_2024, 
  title={Sentence Transformer for Audit Retrieval Question-Answering (STAR-QA)},
  url={https://huggingface.co/dptrsa/STAR-QA},
  author={Theron, Daniel},
  year={2024},
  month={Feb}
}

Model size: 109M params (Safetensors, tensor type F32)