anezatra / chat-gpt2

anezatra committed on Apr 20

Commit c9a857a • 1 Parent(s): 62ccd52

Update README.md

Files changed (1)

README.md +0 -1

README.md CHANGED

@@ -21,7 +21,6 @@ Architecturally akin to its antecedent GPT-1 and progeny GPT-3 and GPT-4, GPT-2
 The transformer architecture allows GPT models to be trained on larger datasets than previous NLP (natural language processing) models. The GPT-1 model demonstrated the validity of this approach, and GPT-2 aimed to further investigate the emergent properties of networks trained on extremely large datasets. CommonCrawl, a large corpus previously used to train NLP systems, was considered because of its size, but closer examination revealed that much of its content was unintelligible. Consequently, OpenAI developed a new dataset called WebText. Instead of indiscriminately scraping content from the World Wide Web, WebText collected content only from pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The dataset was then cleaned: HTML documents were parsed into plain text, duplicate pages were removed, and Wikipedia pages were excluded because they appear in many other datasets and therefore posed a risk of overfitting. Additionally, Anezatra retrained this model on the OpenWebText corpus. Using DistilGPT, the goal was to reduce the model's size and produce a lighter, more efficient version. DistilGPT retains most of the model's learning capabilities while reducing the number of parameters, which speeds up training and inference and uses resources more efficiently.
 ## How to use
 ```python
 # pip install git+https://github.com/huggingface/transformers.git