arshy committed on
Commit 590b388
1 Parent(s): 030053e

updated 23052024
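The notebook's `extract_question` helper takes the first double-quoted substring that ends in `?` from each `prompt_request`, so tool calls can be matched against market titles. Below is a minimal sketch of that behaviour. The sample prompt is made up, because the real prompt template does not appear in this commit.

```python
import re

# Same pattern as the notebook: the first double-quoted substring that ends in '?'.
QUESTION_PATTERN = re.compile(r'"([^"]+\?)"')


def extract_question(text: str) -> str:
    """Return the quoted question inside a prompt, or the text unchanged if none is found."""
    match = QUESTION_PATTERN.search(text)
    return match.group(1) if match else text


# Hypothetical prompt; the real prompt_request template is not part of this commit.
prompt = 'Given the question "Will it rain in Paris by 1 May 2024?" reply with a probability.'
print(extract_question(prompt))  # Will it rain in Paris by 1 May 2024?
```

If a prompt has no quoted question, the text comes back unchanged. In that case it only matches a trade title when the prompt already is the bare title.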

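The notebook keeps only trades created in May 2024, counts winning and losing trades per market `title`, and ranks markets by their share of winning trades. The sketch below runs the same pipeline on a small invented table. Two choices here are not in the notebook: `unstack(fill_value=0)` followed by a defensive `reindex`, and `nlargest`/`nsmallest` in place of slicing a sorted frame.

```python
import pandas as pd

# Made-up trades; the column names follow the notebook, the rows do not come from the data.
trades = pd.DataFrame({
    "title": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q3"],
    "winning_trade": [True, True, False, False, False, True, False],
    "creation_timestamp": [
        "2024-05-02", "2024-05-03", "2024-05-04",
        "2024-05-10", "2024-05-11", "2024-05-20", "2024-04-28",
    ],
})

# Keep May 2024 only, using one combined mask instead of two successive filters.
trades["creation_timestamp"] = pd.to_datetime(trades["creation_timestamp"])
ts = trades["creation_timestamp"].dt
trades = trades[(ts.year == 2024) & (ts.month == 5)]

# Winning/losing trade counts per market. fill_value=0 plays the role of .fillna(0);
# the reindex guarantees both outcome columns exist even if one never occurs.
counts = (
    trades.groupby(["title", "winning_trade"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=[False, True], fill_value=0)
)

num_trades = counts.sum(axis=1)
summary = (
    pd.DataFrame({
        "winning_trade_percentage": counts[True] / num_trades,
        "num_trades": num_trades,
    })
    .rename_axis("title")
    .reset_index()
)

# nlargest/nsmallest return the best and worst markets, each ordered from the extreme inward.
top = summary.nlargest(2, "winning_trade_percentage")
bottom = summary.nsmallest(2, "winning_trade_percentage")
print(top)
print(bottom)
```

In the notebook, `winning_trades_percentage_bottom_50` is the tail of a descending sort. Its row 0 is therefore the highest-ranked of the 50 weakest markets, not the single weakest one. `nsmallest` lists the weakest market first.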
analysis.ipynb ADDED
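`losing_percentage` selects tool calls with `Series.str.contains`, and by default that method treats its pattern as a regular expression. Several market titles contain regex metacharacters, such as `?`, `(R-UT)` and `$3.8`. A regex match can therefore silently miss every call for those markets. The sketch below uses literal matching (`regex=False`) on invented rows. The function signature, which takes the frame as an argument, is an illustrative choice and not the notebook's.

```python
import pandas as pd

# Made-up tool calls; column names follow the notebook, the rows are invented.
q = "Will Trent Staggs win the Senatorial race to replace Sen. Mitt Romney (R-UT) on 5 May 2024?"
tools = pd.DataFrame({
    "tool": ["prediction-online", "prediction-online", "prediction-offline", "prediction-offline"],
    "prompt_request": [q, q, q, "Some other question?"],
    "winning_vote": [True, False, False, True],
})


def losing_percentage(df: pd.DataFrame, question: str) -> pd.DataFrame:
    """Share of losing votes per tool for one market question."""
    # regex=False matches the title literally; na=False treats missing prompts as non-matches.
    mask = df["prompt_request"].str.contains(question, regex=False, na=False)
    counts = (
        df[mask]
        .groupby(["tool", "winning_vote"])
        .size()
        .unstack(fill_value=0)
        .reindex(columns=[False, True], fill_value=0)
    )
    num_calls = counts.sum(axis=1)
    result = (
        pd.DataFrame({
            "losing_percentage": counts[False] / num_calls,
            "num_calls": num_calls,
        })
        .rename_axis("tool")
        .reset_index()
    )
    return result.sort_values("losing_percentage", ascending=False, ignore_index=True)


print(losing_percentage(tools, q))
```

With the default `regex=True`, the same title matches no rows. The pattern parses `(R-UT)` as a capture group, so the literal parentheses in the prompt text are never matched.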
@@ -0,0 +1,2056 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import re"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "fpmms = pd.read_parquet('/Users/arshath/play/openautonomy/olas-prediction-live-dashboard/data/fpmms.parquet')\n",
+ "tools = pd.read_parquet('/Users/arshath/play/openautonomy/olas-prediction-live-dashboard/data/tools.parquet')\n",
+ "trades = pd.read_parquet('/Users/arshath/play/openautonomy/olas-prediction-live-dashboard/data/all_trades_profitability.parquet')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def extract_question(text):\n",
+ " pattern = r'\"([^\"]+\\?)\"'\n",
+ " match = re.search(pattern, text)\n",
+ " if match:\n",
+ " return match.group(1)\n",
+ " return text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def get_current_answer(q):\n",
+ " return trades[trades['title'] == q]['current_answer'].unique()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# only select trades in May 2024\n",
+ "trades['creation_timestamp'] = pd.to_datetime(trades['creation_timestamp'])\n",
+ "trades = trades[trades['creation_timestamp'].dt.month == 5]\n",
+ "trades = trades[trades['creation_timestamp'].dt.year == 2024]\n",
+ "\n",
+ "# make a column for winning_vote\n",
+ "tools['winning_vote'] = (tools['vote'] == tools['currentAnswer'])\n",
+ "tools = tools[tools['tool'] != 'resolve-market-reasoning-gpt-4'].reset_index(drop=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "tools['prompt_request'] = tools['prompt_request'].apply(extract_question)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "trades_grouped = trades.groupby(['title', 'winning_trade']).size().unstack().fillna(0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "winning_trades_percentage = trades_grouped[True] / trades_grouped.sum(axis=1)\n",
+ "winning_trades_percentage = winning_trades_percentage.reset_index()\n",
+ "winning_trades_percentage.columns = ['title', 'winning_trade_percentage']\n",
+ "winning_trades_percentage['num_trades'] = list(trades_grouped.sum(axis=1).values)\n",
+ "winning_trades_percentage_bottom_50 = winning_trades_percentage.sort_values(by='winning_trade_percentage', ascending=False)[-50:].reset_index(drop=True)\n",
+ "winning_trades_percentage_top_50 = winning_trades_percentage.sort_values(by='winning_trade_percentage', ascending=False)[:50].reset_index(drop=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# winning_trades_percentage.sort_values(by='winning_trade_percentage', ascending=False).reset_index(drop=True).to_csv('winning_trades_percentage.csv', index=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['Will Kylian Mbappe leave Paris St-Germain at the end of the season by 16 May 2024?',\n",
+ " 'Will BlizzCon be reinstated on or by 1 May 2024 after its cancellation in 2024?',\n",
+ " 'Will Joe Biden approve more weapons for Ukraine by 4 May 2024?',\n",
+ " \"Will FiiO's new custom in-ear monitors become the top-selling wireless earbuds by 9 May 2024?\",\n",
+ " 'Will Mohamed Salah leave Liverpool on 7 May 2024?',\n",
+ " \"Will Ryan Gosling accept a 'dark' role in a film by 14 May 2024?\",\n",
+ " 'Will the Philadelphia 76ers win the NBA play-offs on 7 May 2024?',\n",
+ " 'Will the Panamanian presidential election result in a clear victor by 12 May 2024?',\n",
+ " 'Will the Museum of Old and New Art in Tasmania be allowed to keep its exhibit women-only by 14 May 2024?',\n",
+ " \"Will Diego Maradona's 'Stolen' Golden Ball be auctioned off on 14 May 2024?\",\n",
+ " 'Will the Mercedes G-Wagen release an electric version on 1 May 2024?',\n",
+ " 'Will the Israeli government lift the broadcast ban on Al Jazeera on or before 13 May 2024?',\n",
+ " 'Will Intel release its Core Ultra 200 Arrow Lake CPUs by 16 May 2024?',\n",
+ " 'Will the Atlanta City Council pay $3.8 million to settle a lawsuit by the family of a church deacon who died in a struggle with a city police officer by 13 May 2024?',\n",
+ " 'Will Voyager-1 continue to send readable data until 1 May 2024?',\n",
+ " 'Will the Amber Alert issued in New Mexico result in the discovery of the missing 10-month-old baby by 13 May 2024?',\n",
+ " \"Will Florida's ban on lab-grown meat be overturned by 12 May 2024?\",\n",
+ " \"Will the US government successfully distribute the $138.7 million payout to Larry Nassar's victims by 1 May 2024?\",\n",
+ " 'Will a new sport be officially added to the Olympics programme on 16 May 2024?',\n",
+ " \"Will Kristi Noem be announced as Donald Trump's vice presidential running mate by 6 May 2024?\",\n",
+ " 'Will the United Auto Workers union strike against Daimler Truck on or by 7 May 2024?',\n",
+ " 'Will the World Snooker Championship 2024 conclude with Judd Trump or Tom Ford as the winner by May 5, 2024?',\n",
+ " \"Will Maria Georgas be announced as the next 'Bachelorette' lead on 9 May 2024?\",\n",
+ " 'Will Apple release new iPads at their event on May 7, 2024?',\n",
+ " 'Will Joe Biden still be the President of the United States on 11 May 2024?',\n",
+ " \"Will the world's biggest 3D printer be used to make parts of houses by 2 May 2024?\",\n",
+ " \"Will Anthony Edwards be named NBA's MVP on 11 May 2024?\",\n",
+ " 'Will a winner be declared in the Eurovision 2024 grand final by 19 May 2024?',\n",
+ " \"Will a new mission be launched to explore the moon's 'hidden side' by 12 May 2024?\",\n",
+ " 'Will Mike Tyson win his bout against Jake Paul on 7 May 2024?',\n",
+ " 'Will the bird flu outbreak be declared a global pandemic by 12 May 2024?',\n",
+ " 'Will the new Apple Pencil Pro be revealed by 15 May 2024?',\n",
+ " \"Will the amateur angler who landed UK's 'biggest fish' in Essex catch another record-breaking fish by 7 May 2024?\",\n",
+ " \"Will Saul 'Canelo' Alvarez successfully defend his WBA, WBC, WBO, and IBF titles again by 13 May 2024?\",\n",
+ " \"Will Taylor Swift's 'The Tortured Poets Department' album reach number 1 on Billboard 200 on 3 May 2024?\",\n",
+ " 'Will Joe Biden attend the White House Correspondents Dinner on 5 May 2024?',\n",
+ " 'Will King Charles perform public duties on 5 May 2024, after his progress in cancer treatment?',\n",
+ " \"Will LinkedIn's new puzzle games Pinpoint, Queens, and Crossclimb be successful on their platform by 9 May 2024?\",\n",
+ " 'Will South Dakota Governor Kristi Noem resign over the puppy killing controversy by 15 May 2024?',\n",
+ " 'Will Apple announce the release of a new M4 chip by 13 May 2024?',\n",
+ " 'Will Eric Adams still be the mayor of New York City on 10 May 2024?',\n",
+ " \"Will the livestream video 'portals' connecting New York City and Dublin still be operational on 19 May 2024?\",\n",
+ " 'Will there be more pro-Palestinian protests on US university campuses on 6 May 2024?',\n",
+ " 'Will Google Pixel 8a be released at Google I/O 2024 on 14 May?',\n",
+ " 'Will Apple announce more than just a spec bump at the May 2024 iPad event?',\n",
+ " \"Will Apple's new Magic Keyboard for the iPad Pro M4 be released by 15 May 2024?\",\n",
+ " 'Will the UEFA Champions League final be between PSG and Borussia Dortmund on 13 May 2024?',\n",
+ " 'Will the FBI report an increase in scams targeting Americans older than 60 in 2024?',\n",
+ " 'Will Erik ten Hag remain as Manchester United manager on 17 May 2024?',\n",
+ " 'Will Jofra Archer be a part of the England squad for T20 World Cup in June 2024?']"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "winning_trades_percentage_top_50['title'].tolist()\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[\"Will 'Scavengers Reign' be renewed for a second season on Netflix by 19 May 2024?\",\n",
+ " 'Will Fiona Harvey officially file a lawsuit against Netflix and Richard Gadd by 17 May 2024?',\n",
+ " 'Will the final report on the Baltimore bridge collapse be released by 20 May 2024?',\n",
+ " 'Will the Autonomous Racing League successfully hold their second race by May 3, 2024?',\n",
+ " 'Will Trent Staggs win the Senatorial race to replace Sen. Mitt Romney (R-UT) on 5 May 2024?',\n",
+ " 'Will the Houston area experience flooding conditions on 11 May 2024?',\n",
+ " \"Will 'Wednesday' season 2 be released on Netflix by 1 May 2024?\",\n",
+ " 'Will Arsenal win against Bournemouth in the Premier League match on 12 May 2024?',\n",
+ " 'Will Qualcomm release its Snapdragon X Plus laptop chip by 1 May 2024?',\n",
+ " \"Will Feyenoord's Arne Slot become the new manager of Liverpool by 1 May 2024?\",\n",
+ " 'Will the FCC receive additional funding for replacing Huawei gear by 10 May 2024?',\n",
+ " 'Will there be any major cyber attack on an organization using AI before 2 May 2024?',\n",
+ " 'Will Sony complete the takeover of Paramount by 11 May 2024?',\n",
+ " \"Will 'Hell's Kitchen' win the Tony Awards for Best Musical on 7 May 2024?\",\n",
+ " 'Will Tesla announce reinstating any laid off supercharger workers by 11 May 2024?',\n",
+ " 'Will there be another tornado in Nebraska and Iowa on 6 May 2024?',\n",
+ " 'Will the DJI drones be officially banned in the United States by 4 May 2024?',\n",
+ " 'Will OpenAI debut a multimodal AI digital assistant by 19 May 2024?',\n",
+ " 'Will TikTok be purchased by a Wall Street or Tech billionaire by 2 May 2024?',\n",
+ " \"Will the 'Lost' Gustav Klimt painting be sold at the auction in Vienna on 3 May 2024?\",\n",
+ " \"Will the Federal Communications Commission levy fines against AT&T, Sprint, T-Mobile, and Verizon for illegally sharing customers' location data by 9 May 2024?\",\n",
+ " 'Will the Manchester City win the WSL title on 14 May 2024?',\n",
+ " 'Will Meta start making profit from generative AI by 3 May 2024?',\n",
+ " 'Will Apple launch an AI-powered iOS 18 on or by 1 May 2024?',\n",
+ " 'Will iOS 18 receive a major AI overhaul by 6 May 2024?',\n",
+ " 'Will Ippei Mizuhara be sentenced for bank fraud by 15 May 2024?',\n",
+ " 'Will Tesla lay off nearly 2,700 workers at its Austin, Texas factory by 1 May 2024?',\n",
+ " 'Will Manchester City win the Premier League title on 11 May 2024?',\n",
+ " 'Will there be another deadly pandemic by 8 May 2024?',\n",
+ " 'Will China successfully collect samples from the far side of the Moon on 10 May 2024?',\n",
+ " \"Will the American Airlines correct their system's error of mistaking 101-year-old passenger for a baby by 7 May 2024?\",\n",
+ " 'Will the Boeing Starliner capsule successfully complete its first astronaut-crewed flight to the International Space Station by 13 May 2024?',\n",
+ " \"Will the Technics' special-edition turntable in collaboration with Lamborghini be released by 17 May 2024?\",\n",
+ " 'Will the Florida Panthers win against the Boston Bruins in the Game 3 on 17 May 2024?',\n",
+ " 'Will Harvard Yard be free from Anti-Israel protests by 2 May 2024?',\n",
+ " \"Will Samsung's latest jibe have any impact on Apple's sales by 11 May 2024?\",\n",
+ " \"Will the Miss USA organization respond to the call for 'full transparency' from contestants by 16 May 2024?\",\n",
+ " 'Will Tom Daley win a medal at the Paris Olympics 2024 by 14 May 2024?',\n",
+ " \"Will Liverpool win any more trophies in Jurgen Klopp's final season?\",\n",
+ " 'Will Liverpool win any more trophies by 2 May 2024?',\n",
+ " 'Will Caitlin Clark score more than 20 points in her next NBA game by 10 May 2024?',\n",
+ " 'Will the statues of civil rights leader Daisy Bates and singer Johnny Cash replace the Arkansas statues at the U.S Capitol by 14 May 2024?',\n",
+ " \"Will the season 6 of Netflix's Cobra Kai be released in 3 parts by 12 May 2024?\",\n",
+ " \"Will the 'Don't Say Gay' education restrictions bill be implemented in Alabama on or before 1 May 2024?\",\n",
+ " \"Will the 'lost' Gustav Klimt painting be successfully auctioned by 3 May 2024?\",\n",
+ " 'Will the Kansas City Chiefs win their next game on or before May 15, 2024?',\n",
+ " 'Will Lando Norris win another F1 race by 15 May 2024?',\n",
+ " 'Will Pennsylvania be a red state by 6 May 2024?',\n",
+ " 'Will Tesla face significant financial troubles by 11 May 2024?',\n",
+ " 'Will the BattlerGC Pro be released for the GameCube on or by 3 May 2024?']"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "winning_trades_percentage_bottom_50['title'].tolist()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def losing_percentage(q):\n",
+ " print(f\"Losing percentage for: {q}\")\n",
+ " # match the title literally: titles contain regex metacharacters such as '?', '(' and '$'\n",
+ " q_losing = tools[tools['prompt_request'].str.contains(q, regex=False, na=False)].groupby(['tool', 'winning_vote']).size().unstack().fillna(0)\n",
+ " # make sure both outcome columns exist, even if every call won or every call lost\n",
+ " q_losing = q_losing.reindex(columns=[False, True], fill_value=0)\n",
+ " q_losing_perc = q_losing[False] / (q_losing[False] + q_losing[True])\n",
+ " q_losing_perc = q_losing_perc.reset_index()\n",
+ " q_losing_perc.columns = ['tool', 'losing_percentage']\n",
+ " q_losing_perc['num_calls'] = list(q_losing.sum(axis=1).values)\n",
+ " q_losing_perc = q_losing_perc.sort_values(by='losing_percentage', ascending=False)\n",
+ " return q_losing_perc"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will 'Scavengers Reign' be renewed for a second season on Netflix by 19 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>40.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>17.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.656716</td>\n",
+ " <td>67.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.571429</td>\n",
+ " <td>7.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.538462</td>\n",
+ " <td>52.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.250000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.185185</td>\n",
+ " <td>27.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 prediction-offline 1.000000 40.0\n",
+ "4 prediction-request-rag-claude 1.000000 17.0\n",
+ "7 prediction-url-cot-claude 1.000000 2.0\n",
+ "2 prediction-online-sme 0.656716 67.0\n",
+ "6 prediction-request-reasoning-claude 0.571429 7.0\n",
+ "5 prediction-request-reasoning 0.538462 52.0\n",
+ "3 prediction-request-rag 0.250000 4.0\n",
+ "1 prediction-online 0.185185 27.0"
+ ]
+ },
+ "execution_count": 63,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# have confirmed market resolution was correct\n",
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[0, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will 'Scavengers Reign' be renewed for a second season on Netflix by 19 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>40.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>17.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.656716</td>\n",
+ " <td>67.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.571429</td>\n",
+ " <td>7.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.538462</td>\n",
+ " <td>52.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.250000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.185185</td>\n",
+ " <td>27.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 prediction-offline 1.000000 40.0\n",
+ "4 prediction-request-rag-claude 1.000000 17.0\n",
+ "7 prediction-url-cot-claude 1.000000 2.0\n",
+ "2 prediction-online-sme 0.656716 67.0\n",
+ "6 prediction-request-reasoning-claude 0.571429 7.0\n",
+ "5 prediction-request-reasoning 0.538462 52.0\n",
+ "3 prediction-request-rag 0.250000 4.0\n",
+ "1 prediction-online 0.185185 27.0"
+ ]
+ },
+ "execution_count": 64,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# have confirmed currentAnswer\n",
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[0, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will Fiona Harvey officially file a lawsuit against Netflix and Richard Gadd by 17 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.977273</td>\n",
+ " <td>44.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.975000</td>\n",
+ " <td>40.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>0.677419</td>\n",
+ " <td>31.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.534483</td>\n",
+ " <td>58.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>0.223881</td>\n",
+ " <td>67.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.200000</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>8.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "7 prediction-url-cot-claude 1.000000 1.0\n",
+ "2 prediction-online-sme 0.977273 44.0\n",
+ "1 prediction-online 0.975000 40.0\n",
+ "0 prediction-offline 0.677419 31.0\n",
+ "5 prediction-request-reasoning 0.534483 58.0\n",
+ "4 prediction-request-rag-claude 0.223881 67.0\n",
+ "6 prediction-request-reasoning-claude 0.200000 5.0\n",
+ "3 prediction-request-rag 0.000000 8.0"
+ ]
+ },
+ "execution_count": 65,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# have confirmed currentAnswer\n",
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[1, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will the final report on the Baltimore bridge collapse be released by 20 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>claude-prediction-online</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>87.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>25.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.951220</td>\n",
+ " <td>41.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.833333</td>\n",
+ " <td>6.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.714286</td>\n",
+ " <td>7.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.437500</td>\n",
+ " <td>48.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.394366</td>\n",
+ " <td>71.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 claude-prediction-offline 1.000000 5.0\n",
+ "1 claude-prediction-online 1.000000 1.0\n",
+ "2 prediction-offline 1.000000 87.0\n",
+ "6 prediction-request-rag-claude 1.000000 25.0\n",
+ "9 prediction-url-cot-claude 1.000000 1.0\n",
+ "3 prediction-online 0.951220 41.0\n",
+ "8 prediction-request-reasoning-claude 0.833333 6.0\n",
+ "5 prediction-request-rag 0.714286 7.0\n",
+ "7 prediction-request-reasoning 0.437500 48.0\n",
+ "4 prediction-online-sme 0.394366 71.0"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# have confirmed currentAnswer\n",
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[2, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will the Autonomous Racing League successfully hold their second race by May 3, 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.0</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.0</td>\n",
+ " <td>23.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>1.0</td>\n",
+ " <td>14.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>1.0</td>\n",
+ " <td>18.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>1.0</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.0</td>\n",
+ " <td>8.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.0</td>\n",
+ " <td>6.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.0</td>\n",
812
+ " <td>18.0</td>\n",
813
+ " </tr>\n",
814
+ " <tr>\n",
815
+ " <th>7</th>\n",
816
+ " <td>prediction-request-reasoning-claude</td>\n",
817
+ " <td>0.0</td>\n",
818
+ " <td>3.0</td>\n",
819
+ " </tr>\n",
820
+ " </tbody>\n",
821
+ "</table>\n",
822
+ "</div>"
823
+ ],
824
+ "text/plain": [
825
+ " tool losing_percentage num_calls\n",
826
+ "0 claude-prediction-offline 1.0 2.0\n",
827
+ "1 prediction-offline 1.0 23.0\n",
828
+ "2 prediction-online 1.0 14.0\n",
829
+ "3 prediction-online-sme 1.0 18.0\n",
830
+ "4 prediction-request-rag 1.0 5.0\n",
831
+ "5 prediction-request-rag-claude 1.0 8.0\n",
832
+ "8 prediction-url-cot-claude 1.0 6.0\n",
833
+ "6 prediction-request-reasoning 0.0 18.0\n",
834
+ "7 prediction-request-reasoning-claude 0.0 3.0"
835
+ ]
836
+ },
837
+ "execution_count": 67,
838
+ "metadata": {},
839
+ "output_type": "execute_result"
840
+ }
841
+ ],
842
+ "source": [
+ "# currentAnswer has been confirmed for this market\n",
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[3, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 72,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will the Houston area experience flooding conditions on 11 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>claude-prediction-online</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>6.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>58.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>39.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>8.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>0.754717</td>\n",
+ " <td>53.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.369048</td>\n",
+ " <td>84.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.166667</td>\n",
+ " <td>72.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 claude-prediction-offline 1.000000 2.0\n",
+ "1 claude-prediction-online 1.000000 6.0\n",
+ "2 prediction-offline 1.000000 58.0\n",
+ "4 prediction-online-sme 1.000000 39.0\n",
+ "5 prediction-request-rag 1.000000 4.0\n",
+ "8 prediction-request-reasoning-claude 1.000000 8.0\n",
+ "9 prediction-url-cot-claude 1.000000 5.0\n",
+ "6 prediction-request-rag-claude 0.754717 53.0\n",
+ "7 prediction-request-reasoning 0.369048 84.0\n",
+ "3 prediction-online 0.166667 72.0"
+ ]
+ },
+ "execution_count": 72,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[5, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 73,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will 'Wednesday' season 2 be released on Netflix by 1 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.750000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.750000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.666667</td>\n",
+ " <td>6.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>0.500000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.400000</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-online</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "1 prediction-online-sme 0.750000 4.0\n",
+ "5 prediction-request-reasoning-claude 0.750000 4.0\n",
+ "2 prediction-request-rag 0.666667 6.0\n",
+ "3 prediction-request-rag-claude 0.500000 2.0\n",
+ "4 prediction-request-reasoning 0.400000 5.0\n",
+ "0 claude-prediction-online 0.000000 1.0"
+ ]
+ },
+ "execution_count": 73,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[6, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 74,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will Arsenal win against Bournemouth in the Premier League match on 12 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>11.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>17.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>30.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>45.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.874016</td>\n",
+ " <td>127.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.250000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 prediction-offline 1.000000 11.0\n",
+ "1 prediction-online 1.000000 17.0\n",
+ "2 prediction-online-sme 1.000000 30.0\n",
+ "4 prediction-request-rag-claude 1.000000 45.0\n",
+ "5 prediction-request-reasoning 0.874016 127.0\n",
+ "3 prediction-request-rag 0.250000 4.0\n",
+ "6 prediction-request-reasoning-claude 0.000000 2.0"
+ ]
+ },
+ "execution_count": 74,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[7, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will Qualcomm release its Snapdragon X Plus laptop chip by 1 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>7.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>19.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>15.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.941176</td>\n",
+ " <td>17.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.800000</td>\n",
+ " <td>5.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.666667</td>\n",
+ " <td>15.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.652174</td>\n",
+ " <td>23.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>0.333333</td>\n",
+ " <td>3.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 claude-prediction-offline 1.000000 7.0\n",
+ "1 prediction-offline 1.000000 1.0\n",
+ "3 prediction-online-sme 1.000000 19.0\n",
+ "5 prediction-request-rag-claude 1.000000 15.0\n",
+ "4 prediction-request-rag 0.941176 17.0\n",
+ "2 prediction-online 0.800000 5.0\n",
+ "7 prediction-request-reasoning-claude 0.666667 15.0\n",
+ "6 prediction-request-reasoning 0.652174 23.0\n",
+ "8 prediction-url-cot-claude 0.333333 3.0"
+ ]
+ },
+ "execution_count": 75,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[8, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will Feyenoord's Arne Slot become the new manager of Liverpool by 1 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>4.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.916667</td>\n",
+ " <td>12.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.900000</td>\n",
+ " <td>10.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.714286</td>\n",
+ " <td>14.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.666667</td>\n",
+ " <td>9.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.500000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>0.454545</td>\n",
+ " <td>11.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 claude-prediction-offline 1.000000 4.0\n",
+ "1 prediction-offline 1.000000 2.0\n",
+ "8 prediction-url-cot-claude 1.000000 2.0\n",
+ "6 prediction-request-reasoning 0.916667 12.0\n",
+ "7 prediction-request-reasoning-claude 0.900000 10.0\n",
+ "4 prediction-request-rag 0.714286 14.0\n",
+ "3 prediction-online-sme 0.666667 9.0\n",
+ "2 prediction-online 0.500000 2.0\n",
+ "5 prediction-request-rag-claude 0.454545 11.0"
+ ]
+ },
+ "execution_count": 76,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[9, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Losing percentage for: Will the FCC receive additional funding for replacing Huawei gear by 10 May 2024?\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>6.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>claude-prediction-online</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>3.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>36.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>50.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.986486</td>\n",
+ " <td>74.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.947368</td>\n",
+ " <td>19.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.910714</td>\n",
+ " <td>56.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>0.777778</td>\n",
+ " <td>9.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.465753</td>\n",
+ " <td>73.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.071429</td>\n",
+ " <td>14.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "0 claude-prediction-offline 1.000000 6.0\n",
+ "1 claude-prediction-online 1.000000 3.0\n",
+ "2 prediction-offline 1.000000 36.0\n",
+ "6 prediction-request-rag-claude 1.000000 50.0\n",
+ "4 prediction-online-sme 0.986486 74.0\n",
+ "5 prediction-request-rag 0.947368 19.0\n",
+ "3 prediction-online 0.910714 56.0\n",
+ "9 prediction-url-cot-claude 0.777778 9.0\n",
+ "7 prediction-request-reasoning 0.465753 73.0\n",
+ "8 prediction-request-reasoning-claude 0.071429 14.0"
+ ]
+ },
+ "execution_count": 77,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "losing_percentage(winning_trades_percentage_bottom_50.loc[10, 'title'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 98,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Market titles in the bottom-50 set by winning-trade percentage\n",
+ "all_q = winning_trades_percentage_bottom_50['title'].unique().tolist()\n",
+ "# Keep only the tool requests whose prompt matches one of those titles\n",
+ "q_losing = tools[tools['prompt_request'].isin(all_q)]\n",
+ "# Per tool, count losing (False) and winning (True) votes\n",
+ "q_losing = q_losing.groupby(['tool'])['winning_vote'].value_counts().unstack().fillna(0)\n",
+ "# Losing share per tool, as a fraction in [0, 1]\n",
+ "q_losing_perc = q_losing[False] / (q_losing[False] + q_losing[True])\n",
+ "q_losing_perc = q_losing_perc.reset_index()\n",
+ "q_losing_perc.columns = ['tool', 'losing_percentage']\n",
+ "# Calls with a recorded vote per tool (row order still matches q_losing here)\n",
+ "q_losing_perc['num_calls'] = list(q_losing.sum(axis=1).values)\n",
+ "q_losing_perc = q_losing_perc.sort_values(by='losing_percentage', ascending=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 99,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>tool</th>\n",
+ " <th>losing_percentage</th>\n",
+ " <th>num_calls</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>prediction-offline-sme</td>\n",
+ " <td>1.000000</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>prediction-request-rag-claude</td>\n",
+ " <td>0.913007</td>\n",
+ " <td>1184.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>prediction-offline</td>\n",
+ " <td>0.893281</td>\n",
+ " <td>1012.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>prediction-request-rag</td>\n",
+ " <td>0.889881</td>\n",
+ " <td>336.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>prediction-online-sme</td>\n",
+ " <td>0.857143</td>\n",
+ " <td>1722.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>prediction-online</td>\n",
+ " <td>0.853553</td>\n",
+ " <td>1154.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>prediction-request-reasoning</td>\n",
+ " <td>0.847451</td>\n",
+ " <td>2727.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>10</th>\n",
+ " <td>prediction-url-cot-claude</td>\n",
+ " <td>0.846154</td>\n",
+ " <td>130.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>claude-prediction-online</td>\n",
+ " <td>0.735849</td>\n",
+ " <td>53.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>prediction-request-reasoning-claude</td>\n",
+ " <td>0.659664</td>\n",
+ " <td>238.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>claude-prediction-offline</td>\n",
+ " <td>0.591549</td>\n",
+ " <td>142.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " tool losing_percentage num_calls\n",
+ "3 prediction-offline-sme 1.000000 2.0\n",
+ "7 prediction-request-rag-claude 0.913007 1184.0\n",
+ "2 prediction-offline 0.893281 1012.0\n",
+ "6 prediction-request-rag 0.889881 336.0\n",
+ "5 prediction-online-sme 0.857143 1722.0\n",
+ "4 prediction-online 0.853553 1154.0\n",
+ "8 prediction-request-reasoning 0.847451 2727.0\n",
+ "10 prediction-url-cot-claude 0.846154 130.0\n",
+ "1 claude-prediction-online 0.735849 53.0\n",
+ "9 prediction-request-reasoning-claude 0.659664 238.0\n",
+ "0 claude-prediction-offline 0.591549 142.0"
+ ]
+ },
+ "execution_count": 99,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "q_losing_perc"
+ ]
+ },
1682
+ {
1683
+ "cell_type": "code",
1684
+ "execution_count": 103,
1685
+ "metadata": {},
1686
+ "outputs": [
1687
+ {
1688
+ "data": {
1689
+ "text/html": [
1690
+ "<div>\n",
1691
+ "<style scoped>\n",
1692
+ " .dataframe tbody tr th:only-of-type {\n",
1693
+ " vertical-align: middle;\n",
1694
+ " }\n",
1695
+ "\n",
1696
+ " .dataframe tbody tr th {\n",
1697
+ " vertical-align: top;\n",
1698
+ " }\n",
1699
+ "\n",
1700
+ " .dataframe thead th {\n",
1701
+ " text-align: right;\n",
1702
+ " }\n",
1703
+ "</style>\n",
1704
+ "<table border=\"1\" class=\"dataframe\">\n",
1705
+ " <thead>\n",
1706
+ " <tr style=\"text-align: right;\">\n",
1707
+ " <th>confidence</th>\n",
1708
+ " <th>0.00</th>\n",
1709
+ " <th>0.10</th>\n",
1710
+ " <th>0.20</th>\n",
1711
+ " <th>0.30</th>\n",
1712
+ " <th>0.40</th>\n",
1713
+ " <th>0.50</th>\n",
1714
+ " <th>0.55</th>\n",
1715
+ " <th>0.60</th>\n",
1716
+ " <th>0.65</th>\n",
1717
+ " <th>0.70</th>\n",
1718
+ " <th>0.75</th>\n",
1719
+ " <th>0.80</th>\n",
1720
+ " <th>0.85</th>\n",
1721
+ " <th>0.90</th>\n",
1722
+ " <th>0.95</th>\n",
1723
+ " <th>0.99</th>\n",
1724
+ " <th>1.00</th>\n",
1725
+ " </tr>\n",
1726
+ " <tr>\n",
1727
+ " <th>tool</th>\n",
1728
+ " <th></th>\n",
1729
+ " <th></th>\n",
1730
+ " <th></th>\n",
1731
+ " <th></th>\n",
1732
+ " <th></th>\n",
1733
+ " <th></th>\n",
1734
+ " <th></th>\n",
1735
+ " <th></th>\n",
1736
+ " <th></th>\n",
1737
+ " <th></th>\n",
1738
+ " <th></th>\n",
1739
+ " <th></th>\n",
1740
+ " <th></th>\n",
1741
+ " <th></th>\n",
1742
+ " <th></th>\n",
1743
+ " <th></th>\n",
1744
+ " <th></th>\n",
1745
+ " </tr>\n",
1746
+ " </thead>\n",
1747
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>claude-prediction-offline</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>5.0</td>\n",
+ " <td>46.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>87.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>claude-prediction-online</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>10.0</td>\n",
+ " <td>7.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>30.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-offline</th>\n",
+ " <td>0.0</td>\n",
+ " <td>267.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>13.0</td>\n",
+ " <td>302.0</td>\n",
+ " <td>189.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>231.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-offline-sme</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-online</th>\n",
+ " <td>0.0</td>\n",
+ " <td>22.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>5.0</td>\n",
+ " <td>43.0</td>\n",
+ " <td>23.0</td>\n",
+ " <td>8.0</td>\n",
+ " <td>670.0</td>\n",
+ " <td>99.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>76.0</td>\n",
+ " <td>28.0</td>\n",
+ " <td>55.0</td>\n",
+ " <td>25.0</td>\n",
+ " <td>11.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>20.0</td>\n",
+ " </tr>\n",
1848
+ " <tr>\n",
1849
+ " <th>prediction-online-sme</th>\n",
1850
+ " <td>1.0</td>\n",
1851
+ " <td>27.0</td>\n",
1852
+ " <td>10.0</td>\n",
1853
+ " <td>0.0</td>\n",
1854
+ " <td>71.0</td>\n",
1855
+ " <td>2.0</td>\n",
1856
+ " <td>0.0</td>\n",
1857
+ " <td>679.0</td>\n",
1858
+ " <td>234.0</td>\n",
1859
+ " <td>39.0</td>\n",
1860
+ " <td>149.0</td>\n",
1861
+ " <td>76.0</td>\n",
1862
+ " <td>109.0</td>\n",
1863
+ " <td>80.0</td>\n",
1864
+ " <td>6.0</td>\n",
1865
+ " <td>0.0</td>\n",
1866
+ " <td>39.0</td>\n",
1867
+ " </tr>\n",
1868
+ " <tr>\n",
+ " <th>prediction-request-rag</th>\n",
+ " <td>0.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>25.0</td>\n",
+ " <td>5.0</td>\n",
+ " <td>48.0</td>\n",
+ " <td>11.0</td>\n",
+ " <td>36.0</td>\n",
+ " <td>57.0</td>\n",
+ " <td>16.0</td>\n",
+ " <td>11.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>20.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-request-rag-claude</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>32.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>175.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>513.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>209.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>40.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-request-reasoning</th>\n",
+ " <td>0.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>103.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>58.0</td>\n",
+ " <td>97.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>315.0</td>\n",
+ " <td>176.0</td>\n",
+ " <td>441.0</td>\n",
+ " <td>317.0</td>\n",
+ " <td>339.0</td>\n",
+ " <td>159.0</td>\n",
+ " <td>44.0</td>\n",
+ " <td>58.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>97.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-request-reasoning-claude</th>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>27.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>38.0</td>\n",
+ " <td>4.0</td>\n",
+ " <td>76.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>8.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>2.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>prediction-url-cot-claude</th>\n",
+ " <td>0.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>1.0</td>\n",
+ " <td>2.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>40.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>60.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>22.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>3.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " <td>0.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ "confidence 0.00 0.10 0.20 0.30 0.40 0.50 \\\n",
+ "tool \n",
+ "claude-prediction-offline 0.0 0.0 5.0 46.0 4.0 0.0 \n",
+ "claude-prediction-online 0.0 0.0 2.0 10.0 7.0 3.0 \n",
+ "prediction-offline 0.0 267.0 2.0 13.0 302.0 189.0 \n",
+ "prediction-offline-sme 0.0 0.0 0.0 0.0 0.0 0.0 \n",
+ "prediction-online 0.0 22.0 4.0 5.0 43.0 23.0 \n",
+ "prediction-online-sme 1.0 27.0 10.0 0.0 71.0 2.0 \n",
+ "prediction-request-rag 0.0 3.0 2.0 0.0 4.0 4.0 \n",
+ "prediction-request-rag-claude 0.0 0.0 1.0 32.0 0.0 0.0 \n",
+ "prediction-request-reasoning 0.0 3.0 103.0 1.0 58.0 97.0 \n",
+ "prediction-request-reasoning-claude 0.0 0.0 0.0 3.0 4.0 0.0 \n",
+ "prediction-url-cot-claude 0.0 2.0 1.0 2.0 0.0 0.0 \n",
+ "\n",
+ "confidence 0.55 0.60 0.65 0.70 0.75 0.80 \\\n",
+ "tool \n",
+ "claude-prediction-offline 0.0 87.0 0.0 0.0 0.0 0.0 \n",
+ "claude-prediction-online 0.0 30.0 0.0 0.0 0.0 0.0 \n",
+ "prediction-offline 0.0 231.0 3.0 0.0 0.0 0.0 \n",
+ "prediction-offline-sme 0.0 0.0 0.0 0.0 2.0 0.0 \n",
+ "prediction-online 8.0 670.0 99.0 2.0 76.0 28.0 \n",
+ "prediction-online-sme 0.0 679.0 234.0 39.0 149.0 76.0 \n",
+ "prediction-request-rag 0.0 25.0 5.0 48.0 11.0 36.0 \n",
+ "prediction-request-rag-claude 0.0 175.0 0.0 513.0 0.0 209.0 \n",
+ "prediction-request-reasoning 0.0 315.0 176.0 441.0 317.0 339.0 \n",
+ "prediction-request-reasoning-claude 0.0 27.0 0.0 38.0 4.0 76.0 \n",
+ "prediction-url-cot-claude 0.0 40.0 0.0 60.0 0.0 22.0 \n",
+ "\n",
+ "confidence 0.85 0.90 0.95 0.99 1.00 \n",
+ "tool \n",
+ "claude-prediction-offline 0.0 0.0 0.0 0.0 0.0 \n",
+ "claude-prediction-online 0.0 1.0 0.0 0.0 0.0 \n",
+ "prediction-offline 1.0 2.0 0.0 0.0 1.0 \n",
+ "prediction-offline-sme 0.0 0.0 0.0 0.0 0.0 \n",
+ "prediction-online 55.0 25.0 11.0 0.0 20.0 \n",
+ "prediction-online-sme 109.0 80.0 6.0 0.0 39.0 \n",
+ "prediction-request-rag 57.0 16.0 11.0 1.0 20.0 \n",
+ "prediction-request-rag-claude 3.0 40.0 3.0 0.0 0.0 \n",
+ "prediction-request-reasoning 159.0 44.0 58.0 0.0 97.0 \n",
+ "prediction-request-reasoning-claude 0.0 8.0 1.0 0.0 2.0 \n",
+ "prediction-url-cot-claude 0.0 3.0 0.0 0.0 0.0 "
+ ]
+ },
+ "execution_count": 103,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "all_q = winning_trades_percentage_bottom_50['title'].unique().tolist()\n",
+ "q_losing = tools[tools['prompt_request'].isin(all_q)]\n",
+ "q_losing.groupby(['tool'])['confidence'].value_counts().unstack().fillna(0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "akash",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.14"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
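The last executed cell in `analysis.ipynb` collects the distinct question titles from `winning_trades_percentage_bottom_50`, keeps the `tools` rows whose `prompt_request` matches one of those titles, and tabulates how often each tool emitted each `confidence` value. A minimal, self-contained sketch of the same pandas pattern follows; the two small frames are hypothetical stand-ins containing only the columns the cell uses, not the real data.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's frames (toy data, only the columns used).
winning_trades_percentage_bottom_50 = pd.DataFrame(
    {"title": ["Will A happen?", "Will B happen?", "Will A happen?"]}
)
tools = pd.DataFrame(
    {
        "tool": ["prediction-online", "prediction-online",
                 "prediction-offline", "prediction-offline"],
        "prompt_request": ["Will A happen?", "Will B happen?",
                           "Will A happen?", "Will C happen?"],
        "confidence": [0.6, 0.6, 0.3, 0.9],
    }
)

# Distinct question titles from the low-win-rate subset.
all_q = winning_trades_percentage_bottom_50["title"].unique().tolist()

# Keep only tool requests whose prompt text equals one of those titles exactly.
q_losing = tools[tools["prompt_request"].isin(all_q)]

# Count each confidence value per tool; unstack() moves confidence into columns,
# and fillna(0) fills tool/confidence pairs that never occurred.
counts = q_losing.groupby(["tool"])["confidence"].value_counts().unstack().fillna(0)
print(counts)
```

Because `isin` does exact matching, this filter only catches rows where `prompt_request` holds the question text verbatim. The missing combinations introduced by `unstack()` are NaN before `fillna(0)`, which is why the counts in the notebook output are floats; `pd.crosstab(q_losing["tool"], q_losing["confidence"])` should yield the same table with integer counts.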
data/all_trades_profitability.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ae0de6d7e607b8ac33140081ab5415b9c16e7359d23b196e555535af0d78965c
- size 8251611
+ oid sha256:759ac5ecc5e08af691ebb0c486283167d19bcdd7936ca83bca8e61615e77422c
+ size 8282206
data/delivers.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5002d6ef2bf5d2e69def7f6c69090f72c961c0eee7724870b0960c20514b1180
- size 1707150876
+ oid sha256:e4332e5ccb2f9600adfdada29a2796936ae5a5c23d1ba48188552593adf5eb3d
+ size 1729282728
data/fpmmTrades.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bb0cd005a2bb7b37b04e0388538249ab6434c9de532b337fcee775ab9205064c
- size 20528876
+ oid sha256:7d86c6b4527dce48f065ce3d0221973d7e0f600847ebe1439066d0279bd2544e
+ size 20736626
data/fpmms.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5b0b82cf173571152d11bbcabd94f675e8d84c148925f47a96c5192d9b9e2f67
- size 319767
+ oid sha256:6010818e9c94b64d1823402ca2d7aa81cb8ea84e6bc378a611f815b550a4645d
+ size 320603
data/requests.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a3de88b6c91037ed4245a60473dcca4dce395d1583ec5cb39f79ab0e42759904
- size 46486507
+ oid sha256:4f94e2e01267b583852cc9b693b92226358bb64b8c5397f1665c35949c95c8df
+ size 46998239
data/summary_profitability.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0ef6d6a03b5f872d0228881b74e3a2427c4e8a5f7fd02776eb70683605ccbb4b
- size 52394
+ oid sha256:55bc5f2854a889cb7ceb23391d17bec764f2942c032982d2c14ac76a0bda4507
+ size 52443
data/t_map.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2738a5a8e98ca83c409251237cc338ed540c0ea58779bf23ea59255fa88b42d5
- size 7749840
+ oid sha256:fe78308ac5e9015acb93e46876d0cf439c7a2ad2f3d41e2f4cbedd649c67ad99
+ size 7837580
data/tools.parquet CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b72e906e6ba9e73fc39bd46eea8a17f00cfd15ecca8971bf35bc1c86c837bd99
- size 1713482531
+ oid sha256:12204e9ffd2d16817c86a7318a80e1e649be35dd9d127b448bd1c518712d67c1
+ size 1735733118
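Every `data/` file in this commit is stored through Git LFS, so the diff only shows pointer files: a `version` line naming the spec, an `oid` (the SHA-256 of the real content) and a `size` in bytes. The sketch below reads such pointers and compares the old and new `data/delivers.parquet` entries from this commit. `LfsPointer` and `parse_lfs_pointer` are hypothetical helpers, not part of the git-lfs tooling, and the parser assumes the simple one-`key value`-per-line layout seen here.

```python
from dataclasses import dataclass


@dataclass
class LfsPointer:
    version: str  # spec URL, e.g. https://git-lfs.github.com/spec/v1
    oid: str      # "sha256:<hex digest of the real file>"
    size: int     # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of 'key value' lines (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first space only: the key never contains spaces.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return LfsPointer(
        version=fields["version"],
        oid=fields["oid"],
        size=int(fields["size"]),
    )


# Old and new pointers for data/delivers.parquet, copied from the diff above.
old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:5002d6ef2bf5d2e69def7f6c69090f72c961c0eee7724870b0960c20514b1180
size 1707150876
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:e4332e5ccb2f9600adfdada29a2796936ae5a5c23d1ba48188552593adf5eb3d
size 1729282728
""")

print(new.size - old.size)  # 22131852 bytes added
```

Comparing the `oid` values tells you whether the content changed at all; the `size` delta gives a quick sense of how much data a refresh like this one appended.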
test.ipynb CHANGED
The diff for this file is too large to render.