anezatra committed on
Commit
c9a857a
1 Parent(s): 62ccd52

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -21,7 +21,6 @@ Architecturally akin to its antecedent GPT-1 and progeny GPT-3 and GPT-4, GPT-2
  The transformer architecture allows GPT models to be trained on larger datasets than earlier NLP (natural language processing) models. GPT-1 demonstrated the validity of this approach, and GPT-2 set out to investigate the emergent properties of networks trained on extremely large datasets. CommonCrawl, a large corpus previously used to train NLP systems, was considered because of its size, but closer examination revealed that much of its content was unintelligible. OpenAI therefore built a new dataset called WebText. Instead of indiscriminately scraping the World Wide Web, WebText collected content only from pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The dataset was then cleaned: HTML documents were parsed into plain text, duplicate pages were removed, and Wikipedia pages were excluded because their prevalence in many other datasets created a risk of overfitting. Anezatra additionally retrained this model on the OpenWebText corpus. The DistilGPT technique was applied to shrink the model into a lighter, more efficient version: it aims to preserve the model's learned capabilities while reducing the number of parameters, which speeds up training and inference and uses resources more efficiently.
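The WebText selection rules described above can be illustrated with a minimal sketch. This is not OpenAI's pipeline: the `Link` record, the `filter_links` helper, and exact-URL deduplication are assumptions made for illustration, and the HTML-to-text parsing step is omitted. Only the three-upvote threshold and the Wikipedia exclusion come from the text.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

MIN_UPVOTES = 3  # threshold stated in the text above


@dataclass
class Link:
    """Hypothetical record for a URL found in a Reddit post."""
    url: str
    upvotes: int


def is_wikipedia(url: str) -> bool:
    """Return True if the URL's host is wikipedia.org or a subdomain of it."""
    host = urlparse(url).netloc.lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def filter_links(links: list[Link]) -> list[Link]:
    """Keep links with enough upvotes, drop Wikipedia pages and duplicates.

    Deduplication here is by exact URL, which is an assumption; the text
    only says that duplicate pages were removed.
    """
    seen: set[str] = set()
    kept: list[Link] = []
    for link in links:
        if link.upvotes < MIN_UPVOTES:
            continue
        if is_wikipedia(link.url):
            continue
        if link.url in seen:
            continue
        seen.add(link.url)
        kept.append(link)
    return kept
```

For example, `filter_links([Link("https://example.com/a", 5), Link("https://example.com/b", 2)])` keeps only the first link, because the second falls below the upvote threshold.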
 
  ## How to use
-
  ```python
 
  # pip install git+https://github.com/huggingface/transformers.git
 