|
--- |
|
license: cc-by-sa-4.0 |
|
pipeline_tag: fill-mask |
|
arxiv: 2210.05529 |
|
language: en |
|
thumbnail: https://github.com/coastalcph/hierarchical-transformers/raw/main/data/figures/hat_encoder.png |
|
tags: |
|
- long-documents |
|
datasets: |
|
- c4 |
|
model-index: |
|
- name: kiddothe2b/hierarchical-transformer-base-4096-v2 |
|
  results: []
|
--- |
|
|
|
# Hierarchical Attention Transformer (HAT) / hierarchical-transformer-base-4096-v2 |
|
|
|
## Disclaimer 🚧 ⚠️ |
|
This is an experimental version of HAT that aims to make HAT a native part of the Transformers library. For the moment, please use ONLY [kiddothe2b/hierarchical-transformer-base-4096](https://huggingface.co/kiddothe2b/hierarchical-transformer-base-4096).
|
|
|
## Model description |
|
|
|
This is a Hierarchical Attention Transformer (HAT) model as presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/2210.05529). |
|
|
|
The model was warm-started by re-using the weights of RoBERTa (Liu et al., 2019) and then further pre-trained with the masked language modeling (MLM) objective on long sequences, following the paradigm of Longformer (Beltagy et al., 2020). It supports input sequences of up to 4,096 tokens.
|
|
|
HAT uses hierarchical attention: a combination of segment-wise and cross-segment attention operations. You can think of segments as paragraphs or sentences.
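
## How to use

A minimal usage sketch for masked language modeling. It assumes the stable checkpoint named in the disclaimer above and that the custom HAT code is loaded from the Hub via `trust_remote_code=True`; the loading details for this experimental v2 model may differ.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the stable HAT checkpoint referenced in the disclaimer above.
# trust_remote_code=True pulls the custom HAT model/tokenizer code from the Hub.
tokenizer = AutoTokenizer.from_pretrained(
    "kiddothe2b/hierarchical-transformer-base-4096", trust_remote_code=True
)
model = AutoModelForMaskedLM.from_pretrained(
    "kiddothe2b/hierarchical-transformer-base-4096", trust_remote_code=True
)

# Encode a long document (up to 4,096 tokens) and run masked language modeling.
text = "Hierarchical Attention Transformers split a long document into segments ..."
inputs = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch size, sequence length, vocabulary size)
```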
|
|
|
|
|
## Citing |
|
|
|
If you use HAT in your research, please cite: |
|
|
|
[An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification](https://arxiv.org/abs/2210.05529). Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, and Desmond Elliott. 2022. arXiv:2210.05529 (Preprint). |
|
|
|
``` |
|
@misc{chalkidis-etal-2022-hat, |
|
url = {https://arxiv.org/abs/2210.05529}, |
|
author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond}, |
|
title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification}, |
|
publisher = {arXiv}, |
|
year = {2022}, |
|
} |
|
``` |
|
|
|
|
|
|