ybelkada and Muennighoff committed
Commit ae3a5ff
1 Parent(s): def5ca0

341b tokens model (#8)


- Update to 341b tokens model (6a8e173a4090325a5767777454961e757bc35e89)
- Update README.md (3b418bceb8f965a83d93538583babba15934e7ab)


Co-authored-by: Niklas Muennighoff <Muennighoff@users.noreply.huggingface.co>

Files changed (2)
  1. README.md +5 -7
  2. pytorch_model.bin +2 -2
README.md CHANGED
@@ -52,8 +52,6 @@ language:
 pipeline_tag: text-generation
 ---
 
-# <span style="color:red"><b>WARNING:</b> This is an <b>intermediary checkpoint</b>. It is not fully trained yet. You might want to use [Bloom-1B3](https://huggingface.co/bigscience/bloom-1b3) if you want a model that has completed training.</span>
-
 <h1 style='text-align: center '>BLOOM LM</h1>
 <h2 style='text-align: center '><em>BigScience Large Open-science Open-access Multilingual Language Model</em> </h2>
 <h3 style='text-align: center '>Model Card</h3>
@@ -455,7 +453,7 @@ Includes:
 And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_
 
 ### Factors
-*This section lists some different aspects of what BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
+*This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
 
 - Language, such as English or Yoruba
 
@@ -470,11 +468,11 @@ And multiple different metrics for specific tasks. _(More evaluation metrics for
 
 As of 25.May.2022, 15:00 PST:
 
-- Training Loss: 2.0
+- Training Loss: 2.7
 
-- Validation Loss: 2.2
+- Validation Loss: 3.1
 
-- Perplexity: 8.9
+- Perplexity: 21.9
 
 (More evaluation scores forthcoming at the end of model training.)
 
@@ -563,5 +561,5 @@ Initial prompting experiments using interim checkpoints: https://huggingface.co/
 ## Model Card Authors
 *Ordered roughly chronologically and by amount of time spent.*
 
-Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay
+Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
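The updated evaluation numbers are easy to sanity-check: if the reported Validation Loss is per-token cross-entropy in nats, the perplexity should be close to exp(loss), and both the old figures (exp(2.2) ≈ 9.0 vs. the reported 8.9) and the new ones (exp(3.1) ≈ 22.2 vs. 21.9) fit that reading. A minimal sketch of the check, with the nats assumption noted in the comments:

```python
import math

# Loss/perplexity pairs as reported in the README before and after this commit.
# Assumption: "Validation Loss" is per-token cross-entropy in nats, so the
# reported perplexity should be roughly exp(loss).
reported = [
    ("previous checkpoint", 2.2, 8.9),
    ("341b-token checkpoint", 3.1, 21.9),
]

for name, val_loss, ppl in reported:
    print(f"{name}: exp({val_loss}) = {math.exp(val_loss):.1f}, reported perplexity = {ppl}")
```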
 
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2b4d86036fedc35137a5955c8f72167923ed04b093ac1d3438488f2d9d598912
-size 2130730359
+oid sha256:ae458039e2615e851d4e4e77f36869e1c6daf8ac00f6cfbe8a4c0252dc80de97
+size 2130731319
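The pytorch_model.bin change above only swaps the Git LFS pointer, i.e. the expected sha256 digest and byte size of the new weights. A downloaded copy of the checkpoint can be checked against those pointer values; a minimal sketch follows, assuming the file has already been fetched to a local `pytorch_model.bin` path (hypothetical location, adjust as needed):

```python
import hashlib
from pathlib import Path

# Values taken from the updated LFS pointer in this commit.
EXPECTED_SHA256 = "ae458039e2615e851d4e4e77f36869e1c6daf8ac00f6cfbe8a4c0252dc80de97"
EXPECTED_SIZE = 2130731319  # bytes

# Hypothetical local path; point this at wherever the checkpoint was downloaded.
path = Path("pytorch_model.bin")

# Stream the file in 1 MiB chunks so the ~2 GB checkpoint is not read into memory at once.
digest = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert path.stat().st_size == EXPECTED_SIZE, "size does not match the LFS pointer"
assert digest.hexdigest() == EXPECTED_SHA256, "sha256 does not match the LFS pointer"
print("pytorch_model.bin matches the LFS pointer in this commit")
```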