---
language:
- en
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
---

Binary classification model for advertisement detection in QA system responses.

## Sample usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

classifier_model_path = "jmvcoelho/ad-classifier-v0.1"
tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


def classify(passages):
    inputs = tokenizer(
        passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    return predictions.cpu().tolist()


preds = classify(["sample_text_1", "sample_text_2"])
```
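The returned predictions are integer class ids. The authoritative id-to-label mapping lives in `model.config.id2label`; the sketch below assumes the convention that class 1 means "contains an ad" (the label names here are assumptions, so verify them against the model config):

```python
# Hypothetical label mapping -- check model.config.id2label for the real one.
ID2LABEL = {0: "no_ad", 1: "ad"}


def to_labels(preds):
    """Map integer predictions (as returned by classify) to label strings."""
    return [ID2LABEL[p] for p in preds]
```

For example, `to_labels([0, 1])` returns `["no_ad", "ad"]`.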

## Version

- v0.0: Trained with the official data from Webis Generated Native Ads 2024.
- **v0.1**: Trained with the v0.0 data plus new synthetic data (described below).

## New Synthetic Data

Objective: given a (query, answer) pair, generate a new answer that contains an advertisement.

### Prompt:

```python
prompt = f"""
Task: You are tasked with subtly integrating an advertisement into a search query response. The goal is to make the advertisement feel natural and helpful within the context of the response, not disruptive or overtly promotional.

First, you should define the item to advertise. You should keep in mind the context of the query and original response. Consider the following advertisement qualities when choosing the product:

* Product/Service Category.
* Specific Advertisement Theme.
* Key Selling Points/Qualities to Emphasize.

Instructions for Integration:

1. Contextual Relevance is Key: The advertisement must feel relevant to the search query and the existing response text. Think about how the advertised product/service genuinely relates to the user's needs implied by the query.

2. Subtle and Natural Language: Use language that blends seamlessly with the original response. Avoid overly promotional or sales-oriented phrasing. Focus on informative and helpful language.

3. Focus on Benefits, Not Just Features: Instead of directly listing qualities, rephrase them as benefits the user might gain in relation to the search query and response context.

4. Strategic Placement: Choose the most natural and impactful location(s) within the response to subtly introduce the advertisement. This might involve:
    * Briefly modifying an existing sentence to subtly incorporate the advertisement.
    * Adding a short, relevant phrase or clause to an existing sentence.
    * In rare cases, adding a very short, contextual sentence (only if it feels truly natural).

5. Maintain Original Meaning: Ensure the core meaning and factual accuracy of the original response remain unchanged. The advertisement should enhance, not distort, the original information.

6. Review for Subtlety: Before returning the response, critically evaluate if the advertisement feels genuinely subtle and integrated. If it feels forced or obvious, refine your approach.

Output: Return **only** the modified response with the subtly integrated advertisement.

---

Search Query: {query}
Original Response:

{answer}

Modified Response:
"""
```
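The f-string above expects `query` and `answer` variables to already be in scope. One way to make that explicit is to wrap the prompt in a small helper; the sketch below uses a shortened placeholder for the instruction text, and `build_prompt` / `AD_PROMPT_TEMPLATE` are names introduced here, not part of the original pipeline:

```python
# Shortened stand-in for the full instruction text shown above.
AD_PROMPT_TEMPLATE = (
    "Task: You are tasked with subtly integrating an advertisement into a "
    "search query response.\n"
    "...\n"
    "Search Query: {query}\n"
    "Original Response:\n\n{answer}\n\n"
    "Modified Response:\n"
)


def build_prompt(query: str, answer: str) -> str:
    """Fill the ad-integration template for one (query, answer) pair."""
    return AD_PROMPT_TEMPLATE.format(query=query, answer=answer)
```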

### Obtaining (query, answer) pairs:

- queries: a 150K subset of the MS MARCO V2.1 QA-task queries that are associated with a "well formed answer".
- answers: generated for each query with Qwen2.5-7B-Instruct, using RAG over 10 passages retrieved by our model.
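The card does not specify the exact RAG prompt used for answer generation; a minimal sketch of combining a query with its retrieved passages might look as follows (the function name and format are assumptions):

```python
def build_rag_prompt(query, passages):
    # Number each retrieved passage so the generator can ground its answer.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```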

### Models used for generation

Each model generated advertisements for one quarter of the (query, answer) pairs:

- Gemma-2-9b-it
- LLaMA-3.1-8B-Instruct
- Mistral-7B-Instruct
- Qwen2.5-7B-Instruct
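The card does not state how the pairs were partitioned across the four generators; a simple round-robin assignment is one way to get an even 1/4 split (the scheme and function name below are assumptions):

```python
GENERATORS = [
    "Gemma-2-9b-it",
    "LLaMA-3.1-8B-Instruct",
    "Mistral-7B-Instruct",
    "Qwen2.5-7B-Instruct",
]


def assign_generator(pair_index: int) -> str:
    """Assign each (query, answer) pair to one of the four models, 1/4 each."""
    return GENERATORS[pair_index % len(GENERATORS)]
```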