	---
	language: en
	tags:
	- tapex
	- table-question-answering
	datasets:
	- wikitablequestions
	---

	# OmniTab

	OmniTab is a table-based QA model proposed in [OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering](https://arxiv.org/pdf/2207.03637.pdf). The original GitHub repository is [https://github.com/jzbjyb/OmniTab](https://github.com/jzbjyb/OmniTab).

	## Description

	`neulab/omnitab-large` is based on the BART architecture. It is initialized from `microsoft/tapex-large` and then further pretrained on natural and synthetic data.

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
	import pandas as pd

	tokenizer = AutoTokenizer.from_pretrained("neulab/omnitab-large")
	model = AutoModelForSeq2SeqLM.from_pretrained("neulab/omnitab-large")

	data = {
	    "year": [1896, 1900, 1904, 2004, 2008, 2012],
	    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
	}
	table = pd.DataFrame.from_dict(data)

	query = "In which year did beijing host the Olympic Games?"
	encoding = tokenizer(table=table, query=query, return_tensors="pt")

	outputs = model.generate(**encoding)

	print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
	# [' 2008']
	```
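
To make it easier to see what the model receives when `table=` and `query=` are passed to the tokenizer, here is a minimal sketch of a TAPEX-style table flattening. It is an approximation for illustration, not the tokenizer's actual implementation: the helper names `linearize_table` and `build_model_input` are hypothetical, and the `col : ... row N : ...` layout and the lowercasing step are assumptions based on the TAPEX convention. It works on the plain `data` dict rather than the DataFrame so it runs without extra dependencies.

```python
def linearize_table(columns):
    """Flatten a column-oriented table (dict of lists) into one string.

    Illustrative only: mimics the assumed TAPEX-style layout
    "col : h1 | h2 row 1 : v1 | v2 row 2 : ...".
    """
    headers = list(columns)
    # Header segment, e.g. "col : year | city"
    parts = ["col : " + " | ".join(headers)]
    # Number of rows is taken from the first column (0 for an empty table)
    n_rows = len(next(iter(columns.values()))) if columns else 0
    for i in range(n_rows):
        cells = [str(columns[h][i]) for h in headers]
        # Rows are numbered starting at 1
        parts.append(f"row {i + 1} : " + " | ".join(cells))
    return " ".join(parts)


def build_model_input(query, columns):
    """Prepend the question to the flattened table and lowercase the result.

    Lowercasing is an assumption about the tokenizer's default behaviour.
    """
    return f"{query} {linearize_table(columns)}".lower()


data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
print(build_model_input("In which year did beijing host the Olympic Games?", data))
```

In practice the tokenizer performs this step internally, so the sketch is only useful for inspecting or debugging inputs, for example to estimate how long a large table becomes once flattened.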

	## Reference

	```bibtex
	@inproceedings{jiang-etal-2022-omnitab,
	    title = "{O}mni{T}ab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering",
	    author = "Jiang, Zhengbao and Mao, Yi and He, Pengcheng and Neubig, Graham and Chen, Weizhu",
	    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
	    month = jul,
	    year = "2022",
	}
	```