Commit History

fix: distributed shampoo class
696422e

boris committed on

feat: update distributed_shampoo
5996680

boris committed on

feat(modeling): simplify abstract_init
fa72aa7

boris committed on

feat(train) - handle multiple nodes (#130)
0952927
unverified

boris committed on

feat: handle model parallel
1bb3269

boris committed on

feat(train): more custom x-axis
5f28cd2

boris committed on

feat(train): split artifact into model/state (#128)
7c4c287
unverified

boris committed on

fix: style
386f839

boris committed on

fix(train): opt_state_shape for distributed_shampoo
225b6ff

boris committed on

feat(train): split artifact into model/state
fa5b058

boris committed on

style(tokenizer): remove unused variables
605df32

boris committed on

feat: use fast tokenizer
767d78a

boris committed on

feat(train): another 25% faster
14abe8c

boris committed on

Merge pull request #127 from borisdayma/pjit-t5x
e4401dd
unverified

boris committed on

feat(train): overhead from 70% to 1% 🥳
2b7f5f1

boris committed on

feat(pjit): follow t5x style
7b5868f

boris committed on

fix(train): grads spec
00710bc

boris committed on

feat(train): improve pjit speed
f254058

boris committed on

fix(train): consider correct batch size
b7c7458

boris committed on

feat(train): custom start_preconditioning_step
8149924

boris committed on

feat(train): handle distributed_shampoo in pjit
032f623

boris committed on

feat: update distributed_shampoo + fix None spec
8a9e367

boris committed on

feat(train): distributed_shampoo with pjit
cc34d07

boris committed on

feat(train): use pjit (#125)
f5239e1
unverified

boris committed on

style: unsused import
7a176b9

boris committed on

fix style
f044cb8

boris committed on

feat(train): restore opt_state efficiently
1bfc1b5

boris committed on

feat(model): clean way to load on cpu
12f323d

boris committed on

feat(train): load model on CPU
3d43591

boris committed on

feat(train): different rng per node
2d212d8

boris committed on

feat(train): no batch dimension with pjit
df1fe19

boris committed on

feat(train): progress on pjit
49597a2

boris committed on

feat(train): start pjit support
0081723

boris committed on

feat: use_artifact if run existing
a5ed112

boris committed on

Load from wandb artifact (#121)
f69b21b
unverified

boris committed on

Style (isort).
f9d51f7

Pedro Cuenca committed on

feat(train): update sweep config
bbbf7c8

boris committed on

Use DalleBartTokenizer. State restoration reverted to previous method:
ae983d7

Pedro Cuenca committed on

Tokenizer, config, model can be loaded from wandb.
7e48337

Pedro Cuenca committed on

fix(train): variable not defined
4c87adf

boris committed on

feat(train): cleanup args
a2bf605

boris committed on

Merge pull request #122 from borisdayma/feat-acccum
c91ceb7
unverified

boris committed on

feat(data): support accumulation in non-streaming
88c8e06

boris committed on

refactor(train): cleanup
274ba73

boris committed on

feat: custom gradient accumulation
2d07559

boris committed on

fix: style
df01fa8

boris committed on

feat(train): use MultiSteps for gradient accumulation
4fa53a5

boris committed on

Change import order again.
2b2be9b

Pedro Cuenca committed on

Fix import order to make isort happy.
64d99b2

Pedro Cuenca committed on

Accept changes suggested by linter.
9f522b8

Pedro Cuenca committed on