Commit 860936c (verified) by ifisch · 1 Parent(s): e82503d

Update README.md

Files changed (1):
  1. README.md +9 -1

README.md CHANGED
@@ -106,7 +106,15 @@ Seed: 38
 
 #### 3.3.3 Training
 
-The training process involved feeding the cleaned and prepared dataset into the GPT-2 model. We used a combination of supervised learning and transfer learning techniques to fine-tune the model effectively.
+The training process involved feeding the cleaned and prepared dataset into the GPT-2 model. We used supervised learning techniques to fine-tune the model effectively.
+We trained the model using the Hugging Face Trainer, which takes the training parameters as input. We opted for it because it is optimized for transformer models and comes from the same framework.
+During training, we used the WANDB API to track each party's training run and collect metrics.
+We ran the training on Kaggle, which provides two T4 GPUs, so we had capable hardware at no cost. Another advantage was that we could run the training on CUDA.
+Training took between 2 and 10 hours, depending on the number of tweets available for each party. We go into this in more detail in the evaluation.
 
 #### 3.3.4 Generation and Deployment
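
The added paragraph describes fine-tuning GPT-2 with the Hugging Face Trainer, streaming metrics to WANDB, and running on Kaggle's two T4 GPUs. A minimal sketch of such a setup is shown below; it is not the committed training script, and the checkpoint name, hyperparameters, output paths, and the placeholder tweet dataset are assumptions for illustration only.

```python
# Sketch: fine-tune GPT-2 with the Hugging Face Trainer and log to WANDB.
# Requires `transformers`, `datasets`, and `wandb` (with WANDB_API_KEY set).
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # hypothetical; the party-specific checkpoint is not named in the commit
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a padding token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny placeholder corpus standing in for one party's cleaned tweets.
tweets = Dataset.from_dict({"text": ["Example tweet one.", "Example tweet two."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized_tweets = tweets.map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling: the collator derives labels from the input ids (mlm=False).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2-party-tweets",   # hypothetical output directory
    num_train_epochs=3,               # assumed; epochs are not stated in the diff
    per_device_train_batch_size=8,    # per GPU; Kaggle provides two T4s
    fp16=True,                        # mixed precision on the T4 GPUs (CUDA)
    logging_steps=100,
    save_strategy="epoch",
    report_to="wandb",                # stream metrics to Weights & Biases
    run_name="gpt2-party-tweets",     # hypothetical WANDB run name
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_tweets,
    data_collator=data_collator,
)

trainer.train()
```

With `report_to="wandb"`, the Trainer logs loss and learning-rate curves to Weights & Biases during training, which corresponds to the per-party metric tracking described in the added paragraph.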