---
datasets:
- zetavg/ShareGPT-Processed
- zetavg/coct-en-zh-tw-translations-twp-300k
- zetavg/zh-tw-wikipedia
- zetavg/tw-sinica-corpus-word-frequency
- RyokoAI/ShareGPT52K
language:
- zh
- en
---

# TW-Pythia-6.9B-Chat

**Taiwanese Mandarin Pythia Language Model, instruction-tuned for dialogue.**

Version 0.2

## Model Details

The TW-Pythia model is derived from the Apache-2.0-licensed [Pythia](https://github.com/EleutherAI/pythia) language model, with 8000 new Traditional Chinese tokens added and its embedding layers resized and retrained.
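As a rough illustration of what the added vocabulary changes, the sketch below compares how the base Pythia tokenizer and the extended tokenizer split a short Traditional Chinese string. The repo id `zetavg/tw-pythia-6.9b-chat` is a placeholder for wherever this model is actually hosted, not something this card pins down.

```python
# Compare tokenization of Traditional Chinese text before and after the
# vocabulary extension. "zetavg/tw-pythia-6.9b-chat" is a placeholder repo id.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")
extended = AutoTokenizer.from_pretrained("zetavg/tw-pythia-6.9b-chat")  # placeholder

text = "臺灣的語言模型"
print(len(base.tokenize(text)), "tokens with the base tokenizer")
print(len(extended.tokenize(text)), "tokens with the extended tokenizer")
print(len(extended) - len(base), "added vocabulary entries")  # roughly 8000
```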
### Basics

- **Developed by:** [@zetavg](https://github.com/zetavg), based on [EleutherAI](https://www.eleuther.ai/)'s [Pythia](https://github.com/EleutherAI/pythia) language model
- **Model type:** Transformer-based GPT-NeoX causal language model
- **Languages:** English, Traditional Chinese
- **License:** Unknown, pending confirmation of the licenses of the training data
- **Derived from model:** [EleutherAI/pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)

### Model Sources

- **Repository:** https://github.com/zetavg/twlm
- **Demo:** https://hackmd.io/@z/twlm-demo

## Uses

Without further training, this model has not yet demonstrated practical value for general Traditional Chinese processing, but it does possess some basic Chinese-English translation capability.
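If you still want to experiment with it, the usual `transformers` loading and generation flow applies. The sketch below is a minimal example; both the repo id and the prompt wording are assumptions rather than anything this card specifies.

```python
# Minimal generation sketch. The repo id and the prompt format are assumptions;
# adapt them to the actual checkpoint and prompt template you use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zetavg/tw-pythia-6.9b-chat"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Translate the following sentence into Traditional Chinese: The weather is nice today."  # assumed prompt style
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```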
## Training Details

### Training Data

* 200k [English ↔ Traditional Chinese sentences from the COCT database](https://huggingface.co/datasets/zetavg/coct-en-zh-tw-translations-twp-300k).
* ~8k mixed English and Traditional Chinese [ShareGPT conversations](https://huggingface.co/datasets/zetavg/ShareGPT-Processed).
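Both datasets are on the Hugging Face Hub, so a sketch like the following can pull them. The split names and column layouts are not documented in this card, so the prints simply show whatever each dataset exposes.

```python
# Pull the two training datasets from the Hugging Face Hub and inspect them.
from datasets import load_dataset

coct = load_dataset("zetavg/coct-en-zh-tw-translations-twp-300k")
sharegpt = load_dataset("zetavg/ShareGPT-Processed")

print(coct)      # available splits and columns
print(sharegpt)
```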
### Training Procedure

First, we build a BPE tokenizer based on the original Pythia tokenizer with 8000 new Traditional Chinese tokens added.
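The card does not say how the 8000 tokens were selected, so the sketch below only shows the mechanical step of extending the base tokenizer with an illustrative token list via `add_tokens`; the project's actual BPE construction may differ.

```python
# A minimal sketch of extending the Pythia tokenizer, assuming the list of new
# Traditional Chinese tokens has already been chosen. add_tokens() is used here
# as a simple mechanism; the project's actual BPE construction may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")

new_tokens = ["臺灣", "語言", "模型"]  # illustrative subset of the ~8000 new tokens
num_added = tokenizer.add_tokens(new_tokens)
print(f"added {num_added} tokens, new vocab size = {len(tokenizer)}")

tokenizer.save_pretrained("tw-pythia-tokenizer")  # hypothetical output path
```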
Then, we resize the embedding layers of the `pythia-6.9b` model to accommodate the new vocabulary size, and we train only the input/output embedding layers so the model can learn the new Traditional Chinese words and phrases.
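A sketch of this resize-and-freeze step, reusing the hypothetical extended-tokenizer path from the previous sketch: every weight is frozen except the GPT-NeoX input embedding (`gpt_neox.embed_in`) and output head (`embed_out`).

```python
# Resize the embeddings to the extended vocabulary, then make only the input
# embedding and the output head trainable. "tw-pythia-tokenizer" is the
# hypothetical path saved in the previous sketch.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tw-pythia-tokenizer")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-6.9b")

model.resize_token_embeddings(len(tokenizer))  # new rows are randomly initialized

for param in model.parameters():
    param.requires_grad = False
for param in model.gpt_neox.embed_in.parameters():
    param.requires_grad = True
for param in model.embed_out.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")  # only the two embedding matrices
```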
Finally, LoRA weights are added to the model and fine-tuned for instruction following.
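A sketch of the LoRA stage with the PEFT library: the GPT-NeoX attention projection is named `query_key_value`, but the rank, alpha, and dropout values below are illustrative defaults, not the values from the linked config.

```python
# LoRA setup sketch with PEFT. r/lora_alpha/lora_dropout are illustrative;
# the values actually used are in the ta01_p7b.yaml config linked below.
from peft import LoraConfig, get_peft_model
from transformers import GPTNeoXForCausalLM

# In practice this would be the checkpoint with resized, retrained embeddings.
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-6.9b")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # GPT-NeoX attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```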
#### Training Hyperparameters

- **Training regime:** `fp32`
- **Full config:** https://github.com/zetavg/twlm/blob/main/configs/ta01_p7b.yaml

### Hardware

* 1× H100 80 GB GPU on Lambda Cloud (provisioned with SkyPilot), about 20 hours in total.