ifisch committed on
Commit
ed80d9d
·
verified ·
1 Parent(s): c96c726

Update README.md

Files changed (1)
  1. README.md +141 -1
README.md CHANGED
@@ -10,4 +10,144 @@ pinned: false
10
  license: apache-2.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Political Parrots: GPT’s Take on Bundestag Tweets
14
+
15
+ **Tim Michalow (qo27leja), Ian Fischer (uf28alic), Tobias Stirner (zo94suqa), Jonathan Franke (StudOn-Username/Enrollment Number)**
16
+
17
+ ## 1 Introduction
18
+
19
+ ### Motivation
20
+
21
+ The use of social media in political communication has surged in recent years, with platforms like Twitter becoming pivotal in shaping public discourse. This trend was notably exemplified during Donald Trump’s election campaign, which underscored Twitter’s significant influence on political narratives and engagement. Recognizing the growing importance of social media in politics, this study focuses on analyzing the communication strategies of German political parties on Twitter.
22
+
23
+ ### Research Question
24
+
25
+ #TODO: Elaborate
26
+
27
+ ### Document Structure
28
+
29
+ This document is structured as follows:
30
+ 1. Introduction
31
+ 2. Related Work
32
+ 3. Methodology
33
+ 4. Results
34
+ 5. Discussion
35
+ 6. Conclusion
36
+
37
+ ## 2 Related Work
38
+
39
+ #TODO: Look up
40
+
41
+ ## 3 Methodology
42
+
43
+ ### 3.1 General Methodology
44
+
45
+ Our methodology involves several key steps to achieve our project goals:
46
+
47
+ 1. **Data Collection**: We collected tweets from the official Twitter accounts of German political parties.
48
+ 2. **Data Preparation**: We cleaned and prepared the dataset for analysis, including filtering out retweets and irrelevant information.
49
+ 3. **Model Training**: We fine-tuned a GPT-2 model on our dataset to generate tweets that mimic the style of each political party.
50
+ 4. **Evaluation**: We evaluated the generated tweets on unseen data for relevance, sentiment, and rhetorical style.
51
+ 5. **Deployment**: We developed a Streamlit application to deploy the trained model and allow users to generate tweets interactively.
52
+
53
+ ### 3.2 Data Understanding and Preparation
54
+
55
+ #### 3.2.1 Merging JSON Lines Files
56
+
57
+ We merged multiple JSON Lines files containing tweet data into a single dataset to streamline our analysis process.
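
A minimal sketch of such a merge, using only the Python standard library. The function name `merge_jsonl` and the file names in the usage note are hypothetical; the source does not state how the files were actually merged.

```python
import json
from pathlib import Path
from typing import Iterable


def merge_jsonl(paths: Iterable[Path], out_path: Path) -> int:
    """Concatenate JSON Lines files into one file and return the record count."""
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for path in paths:
            with path.open(encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    record = json.loads(line)  # fails loudly on malformed JSON
                    # ensure_ascii=False keeps German umlauts readable
                    out.write(json.dumps(record, ensure_ascii=False) + "\n")
                    count += 1
    return count
```

For example, `merge_jsonl(sorted(Path("data").glob("*.jsonl")), Path("merged.jsonl"))` would combine every `.jsonl` file in a (hypothetical) `data` directory.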
58
+
59
+ #### 3.2.2 Filtering Retweets
60
+
61
+ To focus on original content, we filtered out retweets from the dataset. This ensured that our analysis was based solely on the tweets created by the political parties themselves.
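
One way this filter could look is sketched below. It assumes each tweet is a dictionary with a `text` field, and that retweets are marked either by a `retweeted_status` key (Twitter API v1.1 style), by a `referenced_tweets` entry of type `retweeted` (API v2 style), or by a leading `RT @` in the text. Which of these applies depends on how the data was collected, which the source does not specify.

```python
from typing import Any


def is_retweet(tweet: dict[str, Any]) -> bool:
    """Heuristically decide whether a tweet record is a retweet."""
    if "retweeted_status" in tweet:  # assumed v1.1-style marker
        return True
    refs = tweet.get("referenced_tweets") or []  # assumed v2-style marker
    if any(ref.get("type") == "retweeted" for ref in refs):
        return True
    # Fallback: classic retweets start with "RT @user:"
    return str(tweet.get("text", "")).startswith("RT @")


def drop_retweets(tweets: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only tweets that were not identified as retweets."""
    return [t for t in tweets if not is_retweet(t)]
```

Quoted tweets and replies are kept by this sketch, since they contain text written by the account itself.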
62
+
63
+ #### 3.2.3 Cleaning Tweet Text
64
+
65
+ We cleaned the tweet text by removing special characters, URLs, and other irrelevant information. This preprocessing step was crucial to ensure the quality of the data used for model training.
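
The cleaning step might be sketched as follows. The exact rules are assumptions: this version decodes HTML entities, removes URLs, drops characters outside a small allow-list (which removes emoji but keeps umlauts, hashtags, and mentions), and collapses whitespace. The actual rules used in the project are not documented here.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")
# Anything that is not a word character, whitespace, or common punctuation.
# \w is Unicode-aware for str patterns, so ä, ö, ü, and ß are preserved.
SPECIAL_RE = re.compile(r"[^\w\s.,!?;:@#%&€()'/-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return tweet text without URLs, emoji, or redundant whitespace."""
    text = html.unescape(text)       # "&amp;" -> "&"
    text = URL_RE.sub("", text)      # remove links such as t.co URLs
    text = SPECIAL_RE.sub("", text)  # remove emoji and other symbols
    return WHITESPACE_RE.sub(" ", text).strip()
```

Hashtags and mentions are deliberately kept in this sketch because they are part of a party's style on Twitter; a stricter allow-list would remove them as well.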
66
+
67
+ ### 3.3 Modeling and Evaluation
68
+
69
+ #### 3.3.1 Model and Tokenizer
70
+
71
+ We used a fine-tuned GPT-2 model for tweet generation. The tokenizer was configured to handle the specific nuances of the German language and political terminology.
72
+
73
+ #### 3.3.2 Parameters
74
+
75
+ The model was fine-tuned with the following parameters:
76
+
77
+ #TODO: Look up
78
+
79
+ #### 3.3.3 Training
80
+
81
+ The training process involved feeding the cleaned and prepared dataset into the pretrained GPT-2 model. This is a form of transfer learning: the pretrained weights serve as the starting point and are trained further on our tweets.
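
A single model can imitate several parties if every training example carries a marker for its party. The sketch below shows one common way to format such examples; it is an assumption, not a description of the project's actual setup. The `party` field, the tag format `<|party|>`, and the function names are hypothetical. `<|endoftext|>` is GPT-2's end-of-text token.

```python
EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def format_example(party: str, text: str) -> str:
    """Prefix a tweet with a party tag and terminate it with the EOS token."""
    return f"<|{party.lower()}|> {text.strip()} {EOS}"


def build_corpus(tweets: list[dict[str, str]]) -> str:
    """Join formatted examples into one training text, one tweet per line."""
    # Assumes each record has hypothetical "party" and "text" fields.
    return "\n".join(format_example(t["party"], t["text"]) for t in tweets)
```

At generation time, the same tag (for example `<|spd|>`) would be used as the prompt so that the model continues in the style it learned for that party.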
82
+
83
+ #### 3.3.4 Generation and Deployment
84
+
85
+ After training, we used the model to generate tweets that mimic the styles of different political parties. The deployment was handled through a Streamlit application, providing an interactive platform for users to generate and analyze tweets.
86
+
87
+ ## 4 Results
88
+
89
+ ### 4.1 Artifacts
90
+
91
+ The main artifacts produced from this project include:
92
+ - A fine-tuned GPT-2 model for tweet generation
93
+ - A cleaned and prepared dataset of German political tweets
94
+ - A Streamlit application for tweet generation and analysis
95
+
96
+ ### 4.2 Libraries and Tools
97
+
98
+ We used several libraries and tools, including:
99
+ - TensorFlow and PyTorch for model training
100
+ - Hugging Face Transformers for model fine-tuning
101
+ - Pandas and NumPy for data processing
102
+ - Streamlit for application deployment
103
+
104
+ ### 4.3 Concept of the App
105
+
106
+ Our Streamlit application allows users to select a political party and generate tweets that reflect the party's communication style. The app provides an interactive and user-friendly interface for exploring the generated tweets.
107
+
108
+ ### 4.4 Results on Unseen Data
109
+
110
+ The model performed well on unseen data, generating tweets that were coherent and stylistically similar to those of the respective political parties. The generated tweets were evaluated based on their relevance, sentiment, and rhetorical style.
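
Evaluating on unseen data requires that some tweets are set aside before training. A minimal sketch of a reproducible hold-out split is shown below; the function name, the 10% default, and the seed are assumptions, since the source does not describe how the unseen data was chosen.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_test_split(
    items: Sequence[T], test_fraction: float = 0.1, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Split items into (train, test) with a fixed seed for reproducibility."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # local RNG, no global state touched
    n_test = max(1, round(len(items) * test_fraction))
    test_idx = set(indices[:n_test])
    # Preserve the original order within each split.
    train = [item for i, item in enumerate(items) if i not in test_idx]
    test = [item for i, item in enumerate(items) if i in test_idx]
    return train, test
```

Only the training part would be passed to fine-tuning, so that tweets in the test part remain unseen by the model.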
111
+
112
+ ## 5 Discussion
113
+
114
+ ### 5.1 Results/Artifacts/App
115
+
116
+ The generated tweets successfully mimicked the communication styles of different German political parties. The Streamlit application provided an intuitive platform for users to interact with the model and generate tweets.
117
+
118
+ ### 5.2 Limitations
119
+
120
+ Several limitations were identified during the project:
121
+ - The dataset was limited in size, which may have affected the model's ability to generalize.
122
+ - GPU availability in Colab and Kaggle was a constraint, limiting the extent of model fine-tuning.
123
+ - The app's functionality was limited to tweet generation and did not include advanced features like real-time sentiment analysis.
124
+
125
+ ### 5.3 Ethical Perspective
126
+
127
+ #### Dangers of the Application
128
+
129
+ The use of AI-generated content in political communication can lead to ethical concerns, such as the potential for discrimination through biased models or the spread of misinformation.
130
+
131
+ #### Transparency
132
+
133
+ Ensuring transparency in the model's training and deployment processes is crucial to maintain public trust and accountability.
134
+
135
+ #### Effects on Climate Change
136
+
137
+ The computational resources required for training large models have a significant carbon footprint, contributing to climate change.
138
+
139
+ Possible sources for further ethical considerations:
140
+ - "Automating Society Report"
141
+ - Relevant publications on the ethics of AI and social media
142
+
143
+ ### 5.4 Further Research
144
+
145
+ Future research could explore:
146
+ - Expanding the dataset to include more tweets and other forms of social media communication.
147
+ - Incorporating real-time sentiment analysis and other advanced features into the application.
148
+ - Investigating the long-term impacts of AI-generated content on public opinion and political engagement.
149
+
150
+ ## 6 Conclusion
151
+
152
+ In this project, we analyzed the communication strategies of German political parties on Twitter using a fine-tuned GPT-2 model. Our results demonstrate the potential of NLP techniques in political communication analysis. Future research could build on these findings to explore more advanced applications and address the ethical implications of AI in social media.
153
+