thepowerfuldeez/Qwen2-1.5B-Summarize

George Grigorev

Upload Qwen2ForCausalLM

434ab72 verified 7 months ago


	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- axolotl
	pipeline_tag: summarization
	---


	---
	Qwen2-1.5B-Instruct fine-tuned for 2 epochs on my own synthetic data for the summarization task.

	More information about the project is on my GitHub: https://github.com/thepowerfuldeez/qwen2_1_5b_summarize

	### Usage

	```python
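	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
	# 4-bit loading needs the bitsandbytes package and a CUDA GPU
	quantization_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_compute_dtype=torch.bfloat16,
	)
	model = AutoModelForCausalLM.from_pretrained(
	    "thepowerfuldeez/Qwen2-1.5B-Summarize",
	    quantization_config=quantization_config,
	    attn_implementation="flash_attention_2",  # needs the flash-attn package
	)

	text = "<YOUR_TEXT>"  # replace with the text to summarize
	messages = [
	    {"role": "system", "content": "You are helpful AI assistant."},
	    {"role": "user", "content": f"Summarize following text: \n{text}"},
	]
	# add_generation_prompt=True appends the assistant turn header so the model starts its reply
	input_ids = tokenizer.apply_chat_template(
	    messages, add_generation_prompt=True, return_tensors="pt"
	).to(model.device)
	# Slice off the prompt so only the newly generated tokens are decoded
	new_tokens = model.generate(input_ids, max_new_tokens=1024)[0][len(input_ids[0]):]
	summary = tokenizer.decode(new_tokens, skip_special_tokens=True)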
	```
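To summarize several texts with one loaded model, the snippet above can be wrapped in a small helper. This is a minimal sketch, not part of the original project: the names `build_messages` and `summarize` are hypothetical, and passing `add_generation_prompt=True` is an assumption about how the chat template should be applied for generation. The prompt strings are copied unchanged from the snippet above.

```python
SYSTEM_PROMPT = "You are helpful AI assistant."  # same wording as the usage snippet


def build_messages(text):
    """Build the chat messages in the same format as the usage snippet."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Summarize following text: \n{text}"},
    ]


def summarize(text, model, tokenizer, max_new_tokens=1024):
    """Generate a summary with an already loaded model and tokenizer.

    Hypothetical helper: `model` and `tokenizer` are the objects created
    in the usage snippet.
    """
    input_ids = tokenizer.apply_chat_template(
        build_messages(text),
        add_generation_prompt=True,  # assumption: start the assistant turn explicitly
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the generated continuation
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the model and tokenizer loaded as above, `summaries = [summarize(t, model, tokenizer) for t in texts]` would process a list of texts one at a time.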

	### Dataset
	The train split is available [here](https://huggingface.co/datasets/thepowefuldeez/Qwen-summarize-dataset-train).

	### Metrics

	#### BERTScore
	| Model name           | Dataset size | Result      |
	| -------------------- | ------------ | ----------- |
	| Qwen2-1.5B-Instruct  | -            | 0.07        |
	| Qwen2-1.5B-Summarize | 8000         | 0.14        |
	| Qwen2-1.5B-Summarize | 20500        | In progress |


	I used the [official](https://github.com/Tiiiger/bert_score/tree/master) BERTScore implementation with the `microsoft/deberta-xlarge-mnli` model.
	I sampled 32 inputs from the test set (longer texts to summarize) and generated summaries for them. The reference summaries were generated by a stronger model, Qwen2-72B-Instruct, and I used them as targets for the metric.
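The evaluation described above could be reproduced roughly as follows. This is a sketch under assumptions, not the exact script behind the table: the helper name `mean_bertscore_f1` is hypothetical, and the card does not say whether the reported number is precision, recall, or F1, nor whether baseline rescaling was used, so the sketch simply averages F1 with the library defaults. It requires the `bert-score` package; the `score_fn` parameter exists only so the helper can be tried with a stand-in scorer.

```python
from statistics import mean


def mean_bertscore_f1(candidates, references, score_fn=None):
    """Average BERTScore F1 of generated summaries against reference summaries.

    Assumption: F1 is the reported quantity; the model card does not specify.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if score_fn is None:
        # Imported lazily so the helper can be defined without the heavy dependency
        from bert_score import score as score_fn
    # bert_score.score returns (precision, recall, F1), one value per pair
    _, _, f1 = score_fn(
        candidates,
        references,
        model_type="microsoft/deberta-xlarge-mnli",
    )
    return mean(float(value) for value in f1)
```

Here `candidates` would hold the 32 summaries generated by the model under test and `references` the corresponding Qwen2-72B-Instruct summaries, in the same order.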


	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)