Update README.md
README.md
CHANGED
@@ -10,4 +10,144 @@ pinned: false
license: apache-2.0
---

# Political Parrots: GPT’s Take on Bundestag Tweets

**Tim Michalow (qo27leja), Ian Fischer (uf28alic), Tobias Stirner (zo94suqa), Jonathan Franke (StudOn-Username/Enrollment Number)**

## 1 Introduction

### Motivation

The use of social media in political communication has surged in recent years, with platforms like Twitter becoming pivotal in shaping public discourse. This trend was notably exemplified during Donald Trump’s election campaign, which underscored Twitter’s significant influence on political narratives and engagement. Recognizing the growing importance of social media in politics, this study focuses on analyzing the communication strategies of German political parties on Twitter.

### Research Question

#TODO: Elaborate

### Structure of This Document

This document is structured as follows:
1. Introduction
2. Related Work
3. Methodology
4. Results
5. Discussion
6. Conclusion

## 2 Related Work

#TODO: Look up

## 3 Methodology

### 3.1 General Methodology

Our methodology consists of five steps:

1. **Data Collection**: We collected tweets from the official Twitter accounts of German political parties.
2. **Data Preparation**: We cleaned and prepared the dataset for analysis, including filtering out retweets and irrelevant information.
3. **Model Training**: We fine-tuned a GPT-2 model on our dataset to generate tweets that mimic the style of each political party.
4. **Evaluation**: We evaluated the model’s performance using several metrics to check that the generated tweets were coherent and relevant.
5. **Deployment**: We developed a Streamlit application to deploy the trained model and allow users to generate tweets interactively.

### 3.2 Data Understanding and Preparation

#### 3.2.1 Merging JSON Lines Files

We merged multiple JSON Lines files containing tweet data into a single dataset to streamline our analysis process.
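
A minimal sketch of such a merge, using only the Python standard library. The function names `read_jsonl` and `merge_jsonl` are illustrative and not taken from the project code, and the sketch assumes every input file holds one JSON object per line.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def merge_jsonl(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate several JSON Lines files into one and return the record count."""
    count = 0
    with output.open("w", encoding="utf-8") as out:
        for path in inputs:
            for record in read_jsonl(path):
                # ensure_ascii=False keeps German umlauts readable in the output file.
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Streaming line by line keeps memory use flat even when the individual files are large, because no file is loaded as a whole.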

#### 3.2.2 Filtering Retweets

To focus on original content, we filtered out retweets from the dataset. This ensured that our analysis was based solely on the tweets created by the political parties themselves.
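
One way such a filter could look is sketched below. The record layout is an assumption: the sketch treats a tweet as a retweet if it carries a `retweeted_status` field (as in Twitter API v1.1 exports) or if its `text` starts with `RT @`. The actual dataset may use different fields.

```python
def is_retweet(tweet: dict) -> bool:
    """Heuristic retweet check; the field names are assumptions about the export format."""
    # Twitter API v1.1 exports attach the original tweet under "retweeted_status".
    if "retweeted_status" in tweet:
        return True
    # Fallback: classic retweets start with "RT @user:" in the text.
    return tweet.get("text", "").startswith("RT @")


def drop_retweets(tweets: list[dict]) -> list[dict]:
    """Return only the tweets that are not recognised as retweets."""
    return [tweet for tweet in tweets if not is_retweet(tweet)]
```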

#### 3.2.3 Cleaning Tweet Text

We cleaned the tweet text by removing special characters, URLs, and other irrelevant information. This preprocessing step was crucial to ensure the quality of the data used for model training.
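
The cleaning step might be sketched with a few regular expressions, as below. Which characters count as "special" is an assumption here: the sketch keeps letters (including umlauts and ß), digits, hashtags, mentions, and basic punctuation, and drops everything else, such as emoji.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep word characters (Unicode-aware, so ä/ö/ü/ß survive), whitespace,
# @ and # for mentions and hashtags, and basic punctuation.
DISALLOWED_RE = re.compile(r"[^\w\s@#.,!?:;'\"()%€/-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and unusual symbols, then collapse runs of whitespace."""
    text = html.unescape(text)           # turn entities such as "&amp;" into characters
    text = URL_RE.sub(" ", text)         # drop links
    text = DISALLOWED_RE.sub(" ", text)  # drop emoji and other symbols
    return WHITESPACE_RE.sub(" ", text).strip()
```

Hashtags and mentions are kept in this sketch because they can carry party-specific style. Whether the project kept or removed them is not stated.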

### 3.3 Modeling and Evaluation

#### 3.3.1 Model and Tokenizer

We used a fine-tuned GPT-2 model for tweet generation. The tokenizer was configured to handle the specific nuances of the German language and political terminology.

#### 3.3.2 Parameters

The model was fine-tuned with the following parameters:

#TODO: Look up

#### 3.3.3 Training

The training process involved feeding the cleaned and prepared dataset into the GPT-2 model. We used a combination of supervised learning and transfer learning techniques to fine-tune the model effectively.
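
How the cleaned tweets were turned into training text is not described here. One common approach, shown purely as an assumption, is to prefix each tweet with a party tag and end it with GPT-2's end-of-text token, then hold out a small validation split. The `[PARTY]` tag format and both function names are hypothetical.

```python
import random

EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def format_example(party: str, text: str) -> str:
    """Prefix the tweet with a party tag so generation can later be conditioned on it."""
    # The "[PARTY] " prefix scheme is an illustrative assumption, not the project's confirmed format.
    return f"[{party.upper()}] {text}{EOS}"


def train_val_split(examples: list[str], val_fraction: float = 0.1, seed: int = 42):
    """Shuffle reproducibly and return (train, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

The fixed seed makes the split reproducible, so later evaluation runs see the same held-out tweets.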

#### 3.3.4 Generation and Deployment

After training, we used the model to generate tweets that mimic the styles of different political parties. The deployment was handled through a Streamlit application, providing an interactive platform for users to generate and analyze tweets.
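
The generation logic behind such an app can be kept separate from the user interface. The sketch below assumes a hypothetical `generate_fn` callable that wraps the fine-tuned model and returns raw text for a prompt, and a hypothetical bracketed party tag as the prompt. Neither is confirmed by the project description.

```python
from typing import Callable

MAX_TWEET_CHARS = 280   # Twitter's standard length limit
EOS = "<|endoftext|>"   # GPT-2's end-of-text token


def generate_tweet(party: str, generate_fn: Callable[[str], str]) -> str:
    """Build a party prompt, call the injected generator, and tidy the output."""
    # Hypothetical prompt scheme: a bracketed party tag followed by a space.
    prompt = f"[{party.upper()}] "
    raw = generate_fn(prompt)
    # Many generation APIs echo the prompt; strip it if present.
    if raw.startswith(prompt):
        raw = raw[len(prompt):]
    # Keep only the text before the first end-of-text marker.
    text = raw.split(EOS, 1)[0].strip()
    return text[:MAX_TWEET_CHARS]
```

Injecting the generator as a parameter means the same function can be called from a Streamlit callback or from a unit test with a stub in place of the model.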

## 4 Results

### 4.1 Artifacts

The main artifacts produced from this project include:
- A fine-tuned GPT-2 model for tweet generation
- A cleaned and prepared dataset of German political tweets
- A Streamlit application for tweet generation and analysis

### 4.2 Libraries and Tools

We used several libraries and tools, including:
- TensorFlow and PyTorch for model training
- Hugging Face Transformers for model fine-tuning
- Pandas and NumPy for data processing
- Streamlit for application deployment

### 4.3 Concept of the App

Our Streamlit application allows users to select a political party and generate tweets that reflect the party's communication style. The app provides an interactive and user-friendly interface for exploring the generated tweets.

### 4.4 Results on Unseen Data

The model performed well on unseen data, generating tweets that were coherent and stylistically similar to those of the respective political parties. The generated tweets were evaluated based on their relevance, sentiment, and rhetorical style.
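
The concrete metrics are not listed here. A standard automatic measure for a language model on held-out text is perplexity, which could complement the qualitative criteria above. The sketch assumes that per-token natural-log probabilities for the held-out tweets are already available from the model.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp of the negative mean log-probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Lower values mean the model is less surprised by the unseen tweets. A model that assigned every token probability 0.5 would have a perplexity of 2.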

## 5 Discussion

### 5.1 Results/Artifacts/App

The generated tweets successfully mimicked the communication styles of different German political parties. The Streamlit application provided an intuitive platform for users to interact with the model and generate tweets.

### 5.2 Limitations

Several limitations were identified during the project:
- The dataset was limited in size, which may have affected the model's ability to generalize.
- GPU availability in Colab and Kaggle was a constraint, limiting the extent of model fine-tuning.
- The app's functionality was limited to tweet generation and did not include advanced features like real-time sentiment analysis.

### 5.3 Ethical Perspective

#### Dangers of the Application

The use of AI-generated content in political communication can lead to ethical concerns, such as the potential for discrimination through biased models or the spread of misinformation.

#### Transparency

Ensuring transparency in the model's training and deployment processes is crucial to maintain public trust and accountability.

#### Effects on Climate Change

The computational resources required for training large models have a significant carbon footprint, contributing to climate change.
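
A rough way to estimate the footprint of a training run is energy (device power × runtime × data-centre overhead) multiplied by the carbon intensity of the electricity used. The sketch below shows only this arithmetic. The numbers in the usage line are hypothetical placeholders, not measurements from this project.

```python
def training_emissions_kg(
    gpu_power_watts: float,
    hours: float,
    pue: float,
    grid_kg_co2_per_kwh: float,
) -> float:
    """Estimate CO2-equivalent emissions of a training run in kilograms."""
    # Energy drawn by the accelerator, scaled by the data centre's overhead factor (PUE).
    energy_kwh = gpu_power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Hypothetical inputs for illustration only: 250 W GPU, 10 h, PUE 1.5, 0.4 kg CO2/kWh.
estimate = training_emissions_kg(250, 10, 1.5, 0.4)
```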

Possible sources for further ethical considerations:
- "Automating Society Report"
- Relevant publications on the ethics of AI and social media

### 5.4 Further Research

Future research could explore:
- Expanding the dataset to include more tweets and other forms of social media communication.
- Incorporating real-time sentiment analysis and other advanced features into the application.
- Investigating the long-term impacts of AI-generated content on public opinion and political engagement.

## 6 Conclusion

In this project, we analyzed the communication strategies of German political parties on Twitter using a fine-tuned GPT-2 model. Our results demonstrate the potential of NLP techniques in political communication analysis. Future research could build on these findings to explore more advanced applications and address the ethical implications of AI in social media.