---
language:
- en
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
---

Binary classification model for advertisement detection in QA system responses.

## Sample usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

classifier_model_path = "jmvcoelho/ad-classifier-v0.1"
tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


def classify(passages):
    inputs = tokenizer(
        passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    return predictions.cpu().tolist()


preds = classify(["sample_text_1", "sample_text_2"])
```
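The returned predictions are integer class ids. The authoritative id-to-label mapping lives in `model.config.id2label`; the sketch below assumes the convention that class 1 means "contains an ad" (the label names here are assumptions, so verify them against the model config):

```python
# Hypothetical label mapping -- check model.config.id2label for the real one.
ID2LABEL = {0: "no_ad", 1: "ad"}


def to_labels(preds):
    """Map integer predictions (as returned by classify) to label strings."""
    return [ID2LABEL[p] for p in preds]
```

For example, `to_labels([0, 1])` returns `["no_ad", "ad"]`.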

## Version

- v0.0: Trained with the official data from Webis Generated Native Ads 2024.
- **v0.1**: Trained with the v0.0 data plus new synthetic data (described below).

## New Synthetic Data

Objective: given a (query, answer) pair, generate a new answer that contains an advertisement.

### Prompt:

```python
prompt = f"""
Task: You are tasked with subtly integrating an advertisement into a search query response. The goal is to make the advertisement feel natural and helpful within the context of the response, not disruptive or overtly promotional.

First, you should define the item to advertise. You should keep in mind the context of the query and original response. Consider the following advertisement qualities when choosing the product:

* Product/Service Category.
* Specific Advertisement Theme.
* Key Selling Points/Qualities to Emphasize.

Instructions for Integration:

1. Contextual Relevance is Key: The advertisement must feel relevant to the search query and the existing response text. Think about how the advertised product/service genuinely relates to the user's needs implied by the query.

2. Subtle and Natural Language: Use language that blends seamlessly with the original response. Avoid overly promotional or sales-oriented phrasing. Focus on informative and helpful language.

3. Focus on Benefits, Not Just Features: Instead of directly listing qualities, rephrase them as benefits the user might gain in relation to the search query and response context.

4. Strategic Placement: Choose the most natural and impactful location(s) within the response to subtly introduce the advertisement. This might involve:
    * Briefly modifying an existing sentence to subtly incorporate the advertisement.
    * Adding a short, relevant phrase or clause to an existing sentence.
    * In rare cases, adding a very short, contextual sentence (only if it feels truly natural).

5. Maintain Original Meaning: Ensure the core meaning and factual accuracy of the original response remain unchanged. The advertisement should enhance, not distort, the original information.

6. Review for Subtlety: Before returning the response, critically evaluate if the advertisement feels genuinely subtle and integrated. If it feels forced or obvious, refine your approach.

Output: Return **only** the modified response with the subtly integrated advertisement.

---

Search Query: {query}
Original Response:

{answer}

Modified Response:
"""
```
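The f-string above expects `query` and `answer` variables to already be in scope. One way to make that explicit is to wrap the prompt in a small helper; the sketch below uses a shortened placeholder for the instruction text, and `build_prompt` / `AD_PROMPT_TEMPLATE` are names introduced here, not part of the original pipeline:

```python
# Shortened stand-in for the full instruction text shown above.
AD_PROMPT_TEMPLATE = (
    "Task: You are tasked with subtly integrating an advertisement into a "
    "search query response.\n"
    "...\n"
    "Search Query: {query}\n"
    "Original Response:\n\n{answer}\n\n"
    "Modified Response:\n"
)


def build_prompt(query: str, answer: str) -> str:
    """Fill the ad-integration template for one (query, answer) pair."""
    return AD_PROMPT_TEMPLATE.format(query=query, answer=answer)
```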

### Obtaining (query, answer) pairs:

- queries: a 150K subset of the MS MARCO V2.1 QA-task queries that are associated with a "well formed answer".
- answers: generated for each query with Qwen2.5-7B-Instruct, using RAG over 10 passages retrieved by our model.
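The card does not specify the exact RAG prompt used for answer generation; a minimal sketch of combining a query with its retrieved passages might look as follows (the function name and format are assumptions):

```python
def build_rag_prompt(query, passages):
    # Number each retrieved passage so the generator can ground its answer.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```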

### Models used for generation

Each model generated advertisements for one quarter of the (query, answer) pairs:

- Gemma-2-9b-it
- LLaMA-3.1-8B-Instruct
- Mistral-7B-Instruct
- Qwen2.5-7B-Instruct
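The card does not state how the pairs were partitioned across the four generators; a simple round-robin assignment is one way to get an even 1/4 split (the scheme and function name below are assumptions):

```python
GENERATORS = [
    "Gemma-2-9b-it",
    "LLaMA-3.1-8B-Instruct",
    "Mistral-7B-Instruct",
    "Qwen2.5-7B-Instruct",
]


def assign_generator(pair_index: int) -> str:
    """Assign each (query, answer) pair to one of the four models, 1/4 each."""
    return GENERATORS[pair_index % len(GENERATORS)]
```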