HeyLucasLeao committed
Commit
26a826f
1 Parent(s): 70cf665

Update README.md

Files changed (1)
README.md +4 -2
README.md CHANGED
@@ -4,7 +4,7 @@
 This is a fine-tuned version of GPT-Neo 125M by EleutherAI for the Portuguese language.
 
 ##### Training data
-It was trained on 227.382 selected texts from a PTWiki dump. You can find all the data here: https://archive.org/details/ptwiki-dump-20210520
+It was trained on 227,382 selected texts from a PTWiki dump. You can find all the data here: https://archive.org/details/ptwiki-dump-20210520
 
 ##### Training Procedure
 Every text was passed through a GPT2 tokenizer, with BOS and EOS tokens to delimit it, at the maximum sequence length that GPT-Neo supports. It was fine-tuned using the default settings of the Trainer class, available in the Hugging Face library.
@@ -45,7 +45,9 @@ sample_outputs = model.generate(generated,
 
 # Decoding and printing sequences
 for i, sample_output in enumerate(sample_outputs):
-  print(">> Generated text {}\n\n{}".format(i+1, tokenizer.decode(sample_output.tolist())))
+  print(">> Generated text {}\
+\
+{}".format(i+1, tokenizer.decode(sample_output.tolist())))
 
 # >> Generated text
 # Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
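For context on the second hunk: a backslash at the end of a line inside a Python string literal escapes the newline, so the escaped newlines are dropped at parse time and the replacement string is equivalent to the single-line form with the `\n\n` removed. A quick check:

```python
# The escaped newlines are removed when the source is parsed, so the
# continued string contains no newline characters at all.
s = ">> Generated text {}\
\
{}"
assert s == ">> Generated text {}{}"
```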
 
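The "Training Procedure" paragraph in the first hunk describes the preprocessing only in prose. Below is a minimal sketch of that step, assuming the stock `GPT2Tokenizer` from `transformers` and GPT-Neo 125M's 2048-token context window; the variable names and example texts are illustrative and not part of this commit:

```python
from transformers import GPT2Tokenizer

# Illustrative sketch (not the author's actual script): wrap each text
# in BOS/EOS tokens and truncate to GPT-Neo's maximum sequence length.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

texts = ["Primeiro artigo da PTWiki...", "Segundo artigo da PTWiki..."]

encodings = [
    tokenizer(
        tokenizer.bos_token + text + tokenizer.eos_token,
        truncation=True,
        max_length=2048,  # GPT-Neo 125M supports up to 2048 positions
    )
    for text in texts
]
```

From there, the encoded examples would feed the default `Trainer` loop mentioned in the README.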