---
language: en
tags:
- multiberts
- multiberts-seed_1
license: mit
datasets:
- wikimedia/wikipedia
- bookcorpus/bookcorpus
base_model:
- google/multiberts-seed_1-step_0k
library_name: transformers
---
# EarlyBERTs
**Random Seed** 1 | **Steps** 10 – 40,000
🐤 **EarlyBERTs** reproduces the [MultiBERTs](http://goo.gle/multiberts) ([Sellam et al., 2022](https://openreview.net/forum?id=K0E_F0gFDgA)) and introduces more granular checkpoints covering the initial, critical phases of learning. In "The Subspace Chronicles" ([Müller-Eberstein et al., 2023](https://mxij.me/x/subspace-chronicles)), we leverage these checkpoints to study early learning dynamics in language models.
This suite builds on MultiBERTs and the underlying BERT architecture, covering seeds 0–4, for which intermediate checkpoints were originally released. For each seed, we provide 31 additional checkpoints at steps 10, 100, 200, ..., 1,000, 2,000, ..., 20,000, and 40,000, each stored as its own model revision (e.g., `revision=step11000`).
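The full set of revision names can be enumerated programmatically. The sketch below assumes the step schedule implied above (10; then 100 to 1,000 in increments of 100; then 2,000 to 20,000 in increments of 1,000; then 40,000), which is the reading consistent with the stated count of 31 checkpoints per seed:

```python
# Enumerate the checkpoint steps: 10, then 100-1,000 in steps of 100,
# then 2,000-20,000 in steps of 1,000, and finally 40,000.
steps = [10] + list(range(100, 1001, 100)) + list(range(2000, 20001, 1000)) + [40000]
revisions = [f"step{step}" for step in steps]

assert len(revisions) == 31  # matches the 31 checkpoints per seed
print(revisions[0], revisions[-1])  # step10 step40000
```

Each entry in `revisions` can be passed directly as the `revision` argument when loading a model, as shown in the Usage section.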
## Model Details
**Model Developers**
[Max Müller-Eberstein](https://mxij.me) as part of the [NLPnorth research unit](https://nlpnorth.github.io) at the [IT University of Copenhagen](https://itu.dk), Denmark.
**Variations**
EarlyBERTs cover seeds 0–4 (in respective repositories) and steps 10–40,000 (in respective model revision branches).
**Input**
Text only.
**Output**
Text and/or embeddings of the input.
Additionally, the CLS classification head is trained on next-sentence prediction, as in [Devlin et al. (2019)](https://aclanthology.org/N19-1423/).
**Model Architecture**
EarlyBERTs are based on the original BERT architecture [(Devlin et al., 2019)](https://aclanthology.org/N19-1423/) and load the respective MultiBERTs seed at step 0 as initialization.
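For reference, MultiBERTs (and hence EarlyBERTs) use the BERT-base configuration. A minimal sketch, assuming the default `transformers` `BertConfig`, which corresponds to BERT-base:

```python
from transformers import BertConfig

# The default BertConfig matches BERT-base:
# 12 layers, hidden size 768, 12 attention heads (~110M parameters).
config = BertConfig()
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```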
**Research Paper**
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training ([Müller-Eberstein et al., 2023](https://mxij.me/x/subspace-chronicles)).
## Training
**Data**
As neither the original BERT nor the MultiBERTs pre-training data are publicly available, we gather a corresponding corpus from fully public versions of the [English Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) and [BookCorpus](https://huggingface.co/datasets/bookcorpus/bookcorpus). Scripts to re-create the exact data ordering, sentence pairing, and subword masking can be found in [the project repository](http://mxij.me/x/emnlp-2023-code).
**Hyperparameters**
We replicate the exact training hyperparameters of MultiBERTs and document them in [our research paper](https://mxij.me/x/subspace-chronicles). Code to reproduce our training procedure can be found in [the project repository](http://mxij.me/x/emnlp-2023-code).
## Usage
Loading the intermediate checkpoint for a specific seed and step follows the standard Hugging Face `transformers` API:
```python
from transformers import AutoTokenizer, AutoModel

# Choose a seed (0-4) and a pre-training step; each step is a repository revision.
seed, step = 1, 7000
tokenizer = AutoTokenizer.from_pretrained(f'personads/earlyberts-seed{seed}')
model = AutoModel.from_pretrained(f'personads/earlyberts-seed{seed}', revision=f'step{step}')
```
## Citation
If you find these models useful, please cite our work as well as the original MultiBERTs paper:
```bibtex
@inproceedings{muller-eberstein-etal-2023-subspace,
title = "Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training",
author = {M{\"u}ller-Eberstein, Max and
van der Goot, Rob and
Plank, Barbara and
Titov, Ivan},
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.879",
doi = "10.18653/v1/2023.findings-emnlp.879",
pages = "13190--13208"
}
```
```bibtex
@inproceedings{
sellam2022the,
title={The Multi{BERT}s: {BERT} Reproductions for Robustness Analysis},
author={Thibault Sellam and Steve Yadlowsky and Ian Tenney and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Raluca Turc and Jacob Eisenstein and Dipanjan Das and Ellie Pavlick},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=K0E_F0gFDgA}
}
```