davanstrien (HF staff) committed
Commit
4dd3b94
1 Parent(s): 0cf3439

Add BERTopic model

Files changed (4)
  1. README.md +314 -0
  2. config.json +14 -0
  3. topic_embeddings.safetensors +3 -0
  4. topics.json +0 -0
README.md ADDED
@@ -0,0 +1,314 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # label_model_merged
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("davanstrien/label_model_merged")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 247
+ * Number of training documents: 14986
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | pre - roll - heavy - farm - health | 5 | -1_pre_roll_heavy_farm |
+ | 0 | label_1 label_2 - label_0 label_1 label_2 - label_1 - label_0 label_1 - label_2 | 1386 | 0_label_1 label_2_label_0 label_1 label_2_label_1_label_0 label_1 |
+ | 1 | label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 | 1042 | 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4 |
+ | 2 | negative positive - positive negative - negative - positive - target | 803 | 2_negative positive_positive negative_negative_positive |
+ | 3 | loc misc org - misc org - loc misc - misc - org loc | 652 | 3_loc misc org_misc org_loc misc_misc |
+ | 4 | neutral positive - neutral - positive negative - negative - positive | 509 | 4_neutral positive_neutral_positive negative_negative |
+ | 5 | label_0 - country - city - label_1 - label_0 label_1 | 357 | 5_label_0_country_city_label_1 |
+ | 6 | contradiction - entailment - neutral - - | 351 | 6_contradiction_entailment_neutral_ |
+ | 7 | label_0 - positive - - - | 335 | 7_label_0_positive__ |
+ | 8 | 99 - - - - | 327 | 8_99___ |
+ | 9 | label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 | 302 | 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3 |
+ | 10 | entailment - true - child - related - non | 257 | 10_entailment_true_child_related |
+ | 11 | terrier - snake - dog - bear - wolf | 245 | 11_terrier_snake_dog_bear |
+ | 12 | loc misc org - loc misc - misc org - misc - org loc | 240 | 12_loc misc org_loc misc_misc org_misc |
+ | 13 | label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_7 | 231 | 13_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6 |
+ | 14 | calendar - greeting - weather - transfer - calculator | 229 | 14_calendar_greeting_weather_transfer |
+ | 15 | label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 | 226 | 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2 |
+ | 16 | delete - unrelated - bad - related - rel | 207 | 16_delete_unrelated_bad_related |
+ | 17 | label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 | 172 | 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13 |
+ | 18 | loc org - org loc - org - loc - loc loc | 166 | 18_loc org_org loc_org_loc |
+ | 19 | left - right - stop - yes - zero | 130 | 19_left_right_stop_yes |
+ | 20 | label_6 label_60 label_61 - label_60 label_61 - label_60 label_61 label_62 - label_62 label_63 - label_59 label_6 label_60 | 123 | 20_label_6 label_60 label_61_label_60 label_61_label_60 label_61 label_62_label_62 label_63 |
+ | 21 | unrelated - - - - | 117 | 21_unrelated___ |
+ | 22 | forest - industrial - river - transport - disaster | 110 | 22_forest_industrial_river_transport |
+ | 23 | label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 | 107 | 23_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3 |
+ | 24 | question - quantity - - - | 106 | 24_question_quantity__ |
+ | 25 | healthy - leaf - rust - plant - mildew | 103 | 25_healthy_leaf_rust_plant |
+ | 26 | disease - blood - bio - healthy - sexual | 100 | 26_disease_blood_bio_healthy |
+ | 27 | work - group - corporation - person product - product | 92 | 27_work_group_corporation_person product |
+ | 28 | surprise anger - sadness surprise - fear joy - anger fear - joy love | 80 | 28_surprise anger_sadness surprise_fear joy_anger fear |
+ | 29 | duplicate - common - non - - | 78 | 29_duplicate_common_non_ |
+ | 30 | steak - hamburger - restaurant - pizza - joint | 76 | 30_steak_hamburger_restaurant_pizza |
+ | 31 | room - service - transport - product - forest | 74 | 31_room_service_transport_product |
+ | 32 | dis - - - - | 74 | 32_dis___ |
+ | 33 | - - - - | 73 | 33____ |
+ | 34 | loc org - org - date - loc - set | 70 | 34_loc org_org_date_loc |
+ | 35 | label_17 label_18 label_19 - label_18 label_19 - label_18 label_19 label_2 - label_19 label_2 - label_16 label_17 label_18 | 70 | 35_label_17 label_18 label_19_label_18 label_19_label_18 label_19 label_2_label_19 label_2 |
+ | 36 | 03 - 02 - second - - | 65 | 36_03_02_second_ |
+ | 37 | anger fear - joy love - surprise - joy - love | 65 | 37_anger fear_joy love_surprise_joy |
+ | 38 | real - true - image - news - | 64 | 38_real_true_image_news |
+ | 39 | - - - - | 63 | 39____ |
+ | 40 | pos - neg - neu - - | 62 | 40_pos_neg_neu_ |
+ | 41 | 45 - 30 - 55 - 35 - 10 | 61 | 41_45_30_55_35 |
+ | 42 | ge - wifi - na - alpha - fan | 61 | 42_ge_wifi_na_alpha |
+ | 43 | label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 | 61 | 43_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_9 label_0 label_1 |
+ | 44 | event - group - corporation - person product - product | 61 | 44_event_group_corporation_person product |
+ | 45 | label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 | 60 | 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2 |
+ | 46 | fear happy - sad - happy - disgust fear - angry | 58 | 46_fear happy_sad_happy_disgust fear |
+ | 47 | battery - volume - chinese - juice - socks | 58 | 47_battery_volume_chinese_juice |
+ | 48 | prep - nn - bio - cc - pro | 56 | 48_prep_nn_bio_cc |
+ | 49 | good - poor - ok - great - bad | 56 | 49_good_poor_ok_great |
+ | 50 | date - city - fur - day - ar | 54 | 50_date_city_fur_day |
+ | 51 | 15 - 18 19 20 - 19 20 - 17 18 19 - 18 19 | 54 | 51_15_18 19 20_19 20_17 18 19 |
+ | 52 | menu - price - num - - | 52 | 52_menu_price_num_ |
+ | 53 | common - fat - loose - small - sugar | 52 | 53_common_fat_loose_small |
+ | 54 | append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ | 49 | 54_append__replace__append_ append__replace_ replace_ |
+ | 55 | append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ | 48 | 55_append__replace__append_ append__replace_ replace_ |
+ | 56 | animals - flying - tech - dance - tiger | 48 | 56_animals_flying_tech_dance |
+ | 57 | self - question - neutral - yes - greeting | 47 | 57_self_question_neutral_yes |
+ | 58 | mt - cv - tr - tm - drug | 47 | 58_mt_cv_tr_tm |
+ | 59 | organization person - location organization - organization - location - person | 46 | 59_organization person_location organization_organization_location |
+ | 60 | - - - - | 45 | 60____ |
+ | 61 | joy - anger - sadness - sad - happy | 44 | 61_joy_anger_sadness_sad |
+ | 62 | daisy - tulip - rose - - | 43 | 62_daisy_tulip_rose_ |
+ | 63 | positive - negative - neutral - neutral positive - positive negative | 42 | 63_positive_negative_neutral_neutral positive |
+ | 64 | windows - pm - 21 - office - 20 | 42 | 64_windows_pm_21_office |
+ | 65 | label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 42 | 65_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14 |
+ | 66 | position - statement - lead - request - study | 42 | 66_position_statement_lead_request |
+ | 67 | business - news - entertainment - tech - sport | 41 | 67_business_news_entertainment_tech |
+ | 68 | hate - speech - language - reporting - non | 41 | 68_hate_speech_language_reporting |
+ | 69 | bd - nan - id - bg - | 41 | 69_bd_nan_id_bg |
+ | 70 | cream - burger - carrot - ice cream - salad | 41 | 70_cream_burger_carrot_ice cream |
+ | 71 | human - machine - ai - artificial - art | 40 | 71_human_machine_ai_artificial |
+ | 72 | open - high - tie - abstract - button | 40 | 72_open_high_tie_abstract |
+ | 73 | label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 | 40 | 73_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24 |
+ | 74 | label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 39 | 74_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9 |
+ | 75 | cat - dog - cats - dogs - drinking | 39 | 75_cat_dog_cats_dogs |
+ | 76 | org org - loc loc - org - misc - loc | 38 | 76_org org_loc loc_org_misc |
+ | 77 | airplane - deer - bird - ship - frog | 38 | 77_airplane_deer_bird_ship |
+ | 78 | label_32 label_33 label_34 - label_33 label_34 - label_31 label_32 label_33 - label_32 label_33 - label_30 label_31 label_32 | 38 | 78_label_32 label_33 label_34_label_33 label_34_label_31 label_32 label_33_label_32 label_33 |
+ | 79 | true - - - - | 38 | 79_true___ |
+ | 80 | family - sports - music - related - health | 38 | 80_family_sports_music_related |
+ | 81 | star - positive - negative - amazon - negative positive | 37 | 81_star_positive_negative_amazon |
+ | 82 | hospital - unknown - description - material - pad | 37 | 82_hospital_unknown_description_material |
+ | 83 | threat - hate - reward - quality - content | 36 | 83_threat_hate_reward_quality |
+ | 84 | music - speech - instrument - engine - wind | 35 | 84_music_speech_instrument_engine |
+ | 85 | closure - annual - statement - issues - reward | 35 | 85_closure_annual_statement_issues |
+ | 86 | adp - aux - sconj - pron - noun | 35 | 86_adp_aux_sconj_pron |
+ | 87 | experience - location - skill - address - result | 35 | 87_experience_location_skill_address |
+ | 88 | - - - - | 34 | 88____ |
+ | 89 | test - train - risk - non - high | 34 | 89_test_train_risk_non |
+ | 90 | samoyed - corgi - husky - golden retriever - golden | 34 | 90_samoyed_corgi_husky_golden retriever |
+ | 91 | unk - zero - 10 - 12 13 14 - 13 14 15 | 33 | 91_unk_zero_10_12 13 14 |
+ | 92 | non - neutral - ok - lead - | 33 | 92_non_neutral_ok_lead |
+ | 93 | normal - covid - virus - regular - disorder | 33 | 93_normal_covid_virus_regular |
+ | 94 | test - help - app - risk - joke | 32 | 94_test_help_app_risk |
+ | 95 | replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ | 32 | 95_replace__append__replace_ replace__append_ append_ |
+ | 96 | disease - issues - pressure - drug - blood | 31 | 96_disease_issues_pressure_drug |
+ | 97 | women - casual - sexual - individual - use | 31 | 97_women_casual_sexual_individual |
+ | 98 | address - balance - code - second - currency | 30 | 98_address_balance_code_second |
+ | 99 | hate - non - neutral - - | 30 | 99_hate_non_neutral_ |
+ | 100 | normal - cell - large - clean - healthy | 29 | 100_normal_cell_large_clean |
+ | 101 | neutral - se - - - | 29 | 101_neutral_se__ |
+ | 102 | male - female - hair - skin - men | 29 | 102_male_female_hair_skin |
+ | 103 | title - page - section - abstract - table | 28 | 103_title_page_section_abstract |
+ | 104 | number - gender - case - person - fin | 28 | 104_number_gender_case_person |
+ | 105 | man - bird - flower - long - double | 28 | 105_man_bird_flower_long |
+ | 106 | contradiction - entailment - neutral - - | 28 | 106_contradiction_entailment_neutral_ |
+ | 107 | non - - - - | 27 | 107_non___ |
+ | 108 | tim - fac - org - pro - loc | 27 | 108_tim_fac_org_pro |
+ | 109 | lincoln - jaguar - visual - audio - sony | 27 | 109_lincoln_jaguar_visual_audio |
+ | 110 | statement - info - check - ad - news | 27 | 110_statement_info_check_ad |
+ | 111 | ben - ext - root - exp - loc | 26 | 111_ben_ext_root_exp |
+ | 112 | yes - - - - | 26 | 112_yes___ |
+ | 113 | queen - jack - king - south - war | 26 | 113_queen_jack_king_south |
+ | 114 | - - - - | 26 | 114____ |
+ | 115 | ft - cardinal - act - loc - loc loc | 25 | 115_ft_cardinal_act_loc |
+ | 116 | bio - chemical - food - - | 25 | 116_bio_chemical_food_ |
+ | 117 | ft - cardinal - act - loc - loc misc org | 25 | 117_ft_cardinal_act_loc |
+ | 118 | metric - task - - - | 25 | 118_metric_task__ |
+ | 119 | email - age - patient - zip - organization | 25 | 119_email_age_patient_zip |
+ | 120 | ent - im - ru - mat - art | 25 | 120_ent_im_ru_mat |
+ | 121 | ex - pt - galaxy - moon - 8888 | 24 | 121_ex_pt_galaxy_moon |
+ | 122 | - - - - | 24 | 122____ |
+ | 123 | neu - sad - dis - joy - | 24 | 123_neu_sad_dis_joy |
+ | 124 | label_122 - label_121 - label_120 - label_123 - label_119 | 24 | 124_label_122_label_121_label_120_label_123 |
+ | 125 | mixed - positive - negative - neutral positive - neutral | 24 | 125_mixed_positive_negative_neutral positive |
+ | 126 | date event - percent person - quantity - money - percent | 24 | 126_date event_percent person_quantity_money |
+ | 127 | fear joy - sadness surprise - surprise - joy - sadness | 24 | 127_fear joy_sadness surprise_surprise_joy |
+ | 128 | disgust - sadness surprise - joy love - surprise - joy | 24 | 128_disgust_sadness surprise_joy love_surprise |
+ | 129 | magnet - motor - hello - undefined - start | 24 | 129_magnet_motor_hello_undefined |
+ | 130 | loc loc - loc - pers - hi - en | 24 | 130_loc loc_loc_pers_hi |
+ | 131 | event - pers - fac - pro - loc org | 24 | 131_event_pers_fac_pro |
+ | 132 | disorder - body - patient - age - disease | 23 | 132_disorder_body_patient_age |
+ | 133 | happiness - fear - anger disgust - disgust - sadness | 23 | 133_happiness_fear_anger disgust_disgust |
+ | 134 | control - la - social - sin - civil | 23 | 134_control_la_social_sin |
+ | 135 | label_98 label_99 - label_97 label_98 label_99 - label_97 label_98 - label_95 label_96 - label_96 label_97 label_98 | 23 | 135_label_98 label_99_label_97 label_98 label_99_label_97 label_98_label_95 label_96 |
+ | 136 | greek - chinese - italian - japanese - dutch | 23 | 136_greek_chinese_italian_japanese |
+ | 137 | clean - - - - | 23 | 137_clean___ |
+ | 138 | protein - chemical - cell - - | 22 | 138_protein_chemical_cell_ |
+ | 139 | treatment - disease - location organization - organization person - organization | 22 | 139_treatment_disease_location organization_organization person |
+ | 140 | institution - tools - org - loc - organization | 22 | 140_institution_tools_org_loc |
+ | 141 | statement - question - - - | 22 | 141_statement_question__ |
+ | 142 | period - question - noun - number - | 21 | 142_period_question_noun_number |
+ | 143 | regular - - - - | 21 | 143_regular___ |
+ | 144 | rna - - - - | 21 | 144_rna___ |
+ | 145 | rs - - - - | 21 | 145_rs___ |
+ | 146 | address - id - job - email - country | 21 | 146_address_id_job_email |
+ | 147 | neg - neu - good - - | 21 | 147_neg_neu_good_ |
+ | 148 | label_122 label_123 - label_123 - label_122 - label_121 - label_120 | 20 | 148_label_122 label_123_label_123_label_122_label_121 |
+ | 149 | drink - tea - wine - coffee - soft | 20 | 149_drink_tea_wine_coffee |
+ | 150 | miscellaneous - organization - percent - money - percent person | 20 | 150_miscellaneous_organization_percent_money |
+ | 151 | description - invoice - zip - state - city | 20 | 151_description_invoice_zip_state |
+ | 152 | sports - tech - business - sport - | 20 | 152_sports_tech_business_sport |
+ | 153 | ok - vin - rl - ft - year | 20 | 153_ok_vin_rl_ft |
+ | 154 | healthy - - - - | 20 | 154_healthy___ |
+ | 155 | association - event - ticket - disaster - map | 20 | 155_association_event_ticket_disaster |
+ | 156 | 10 11 - 10 11 12 - 11 12 - 11 - 11 12 13 | 19 | 156_10 11_10 11 12_11 12_11 |
+ | 157 | noun num pron - num pron propn - num pron - pron propn punct - pron propn | 19 | 157_noun num pron_num pron propn_num pron_pron propn punct |
+ | 158 | cell - organ - organism - multi - tissue | 18 | 158_cell_organ_organism_multi |
+ | 159 | 02 - ent - express - act - delete | 18 | 159_02_ent_express_act |
+ | 160 | sym verb adj - verb adj adp - intj noun num - det intj noun - adj adp adv | 18 | 160_sym verb adj_verb adj adp_intj noun num_det intj noun |
+ | 161 | 12 - - - - | 18 | 161_12___ |
+ | 162 | org org - org - drug - - | 18 | 162_org org_org_drug_ |
+ | 163 | short - long - sl - ac - pad | 18 | 163_short_long_sl_ac |
+ | 164 | plastic - paper - glass - metal - sheet | 18 | 164_plastic_paper_glass_metal |
+ | 165 | ii - blank - iii - vi - et | 17 | 165_ii_blank_iii_vi |
+ | 166 | normal - virus - desert - smoke - pressure | 17 | 166_normal_virus_desert_smoke |
+ | 167 | skill - skills - - - | 17 | 167_skill_skills__ |
+ | 168 | protein - rna - cell - line - type | 17 | 168_protein_rna_cell_line |
+ | 169 | korean - russian - dutch - french - thai | 17 | 169_korean_russian_dutch_french |
+ | 170 | rainbow - rain - snow - color - green | 17 | 170_rainbow_rain_snow_color |
+ | 171 | company - role - institution - skill - loc org | 16 | 171_company_role_institution_skill |
+ | 172 | exp - pp - intj - punc - prep | 16 | 172_exp_pp_intj_punc |
+ | 173 | key - menu - - - | 16 | 173_key_menu__ |
+ | 174 | adult - young - child - male - female | 16 | 174_adult_young_child_male |
+ | 175 | normal - - - - | 16 | 175_normal___ |
+ | 176 | mask - bright - sharp - head - normal | 16 | 176_mask_bright_sharp_head |
+ | 177 | anger disgust fear - anger disgust - disgust fear - disgust - surprise anger | 16 | 177_anger disgust fear_anger disgust_disgust fear_disgust |
+ | 178 | - - - - | 16 | 178____ |
+ | 179 | objective - non - neutral - - | 16 | 179_objective_non_neutral_ |
+ | 180 | cr - sd - db - - | 16 | 180_cr_sd_db_ |
+ | 181 | - - - - | 16 | 181____ |
+ | 182 | label_29 label_3 label_30 - label_27 label_28 label_29 - label_26 label_27 label_28 - label_28 label_29 label_3 - label_29 label_3 | 16 | 182_label_29 label_3 label_30_label_27 label_28 label_29_label_26 label_27 label_28_label_28 label_29 label_3 |
+ | 183 | test - - - - | 15 | 183_test___ |
+ | 184 | good - bad - non - - | 15 | 184_good_bad_non_ |
+ | 185 | local - por - art - da - em | 15 | 185_local_por_art_da |
+ | 186 | label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 | 15 | 186_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98 |
+ | 187 | prod - loc - evt - misc - org org | 15 | 187_prod_loc_evt_misc |
+ | 188 | invoice - email - form - letter - report | 15 | 188_invoice_email_form_letter |
+ | 189 | end - head - cross - - | 15 | 189_end_head_cross_ |
+ | 190 | target - instrument - opinion - question - price | 15 | 190_target_instrument_opinion_question |
+ | 191 | unrelated - support - - - | 15 | 191_unrelated_support__ |
+ | 192 | ru - pl - bg - en - es | 14 | 192_ru_pl_bg_en |
+ | 193 | road - good - bike - - | 14 | 193_road_good_bike_ |
+ | 194 | human - organism - plants - - | 14 | 194_human_organism_plants_ |
+ | 195 | label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 | 14 | 195_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10 |
+ | 196 | replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ | 13 | 196_replace__append__replace_ replace__append_ append_ |
+ | 197 | brand - company - tm - color - item | 13 | 197_brand_company_tm_color |
+ | 198 | pro - neutral - russian - support - attack | 13 | 198_pro_neutral_russian_support |
+ | 199 | 18 19 20 - 19 20 - 23 - 17 18 19 - 21 | 13 | 199_18 19 20_19 20_23_17 18 19 |
+ | 200 | crime - pers - time - book - day | 13 | 200_crime_pers_time_book |
+ | 201 | neutral - positive - negative - positive negative - neutral positive | 13 | 201_neutral_positive_negative_positive negative |
+ | 202 | - - - - | 13 | 202____ |
+ | 203 | chemical - disease - bio - - | 13 | 203_chemical_disease_bio_ |
+ | 204 | angry - happy - sad - neutral - 60 | 12 | 204_angry_happy_sad_neutral |
+ | 205 | organisation - task - country - location - product | 12 | 205_organisation_task_country_location |
+ | 206 | iv - iii - vi - ii - unknown | 12 | 206_iv_iii_vi_ii |
+ | 207 | neutral - risk - - - | 12 | 207_neutral_risk__ |
+ | 208 | container - id - type - person - number | 12 | 208_container_id_type_person |
+ | 209 | target - - - - | 12 | 209_target___ |
+ | 210 | pop - metal - country - song - rock | 12 | 210_pop_metal_country_song |
+ | 211 | email - os - language - method - function | 12 | 211_email_os_language_method |
+ | 212 | contradiction - non - entailment - - | 12 | 212_contradiction_non_entailment_ |
+ | 213 | background - objective - method - result - | 12 | 213_background_objective_method_result |
+ | 214 | convertible - cab - type - series - martin | 12 | 214_convertible_cab_type_series |
+ | 215 | public - smoking - drinking - ambiguous - non | 12 | 215_public_smoking_drinking_ambiguous |
+ | 216 | rust - - - - | 12 | 216_rust___ |
+ | 217 | persian - mr - man - flying - ghost | 12 | 217_persian_mr_man_flying |
+ | 218 | quote - yes - middle - request - | 12 | 218_quote_yes_middle_request |
+ | 219 | text - mixed - - - | 12 | 219_text_mixed__ |
+ | 220 | punc - prep - digit - latin - conj | 12 | 220_punc_prep_digit_latin |
+ | 221 | panda - air - mr - ticket - little | 12 | 221_panda_air_mr_ticket |
+ | 222 | - - - - | 12 | 222____ |
+ | 223 | sym verb adj - intj noun num - verb adj adp - cconj det intj - aux cconj det | 12 | 223_sym verb adj_intj noun num_verb adj adp_cconj det intj |
+ | 224 | healthy - tomato - plant - pepper - spot | 11 | 224_healthy_tomato_plant_pepper |
+ | 225 | sony - lg - tv - galaxy - monitor | 11 | 225_sony_lg_tv_galaxy |
+ | 226 | new - city - mid - location - south | 11 | 226_new_city_mid_location |
+ | 227 | space - - - - | 11 | 227_space___ |
+ | 228 | cloud - racing - motorcycle - boy - bus | 11 | 228_cloud_racing_motorcycle_boy |
+ | 229 | punc - zero - pers - neg - reflex | 11 | 229_punc_zero_pers_neg |
+ | 230 | energy - arts - high - systems - computer | 11 | 230_energy_arts_high_systems |
+ | 231 | dis - ad - media - site - plant | 11 | 231_dis_ad_media_site |
+ | 232 | world - tech - business - sports - female | 11 | 232_world_tech_business_sports |
+ | 233 | sadness - anger - anger fear - joy - fear | 10 | 233_sadness_anger_anger fear_joy |
+ | 234 | neg - adj - sym - propn - num | 10 | 234_neg_adj_sym_propn |
+ | 235 | bulldog - cat - husky - pug - corgi | 9 | 235_bulldog_cat_husky_pug |
+ | 236 | - - - - | 8 | 236____ |
+ | 237 | origin - quote - actor - opinion - language | 7 | 237_origin_quote_actor_opinion |
+ | 238 | na - nb - nc - neu - ng | 7 | 238_na_nb_nc_neu |
+ | 239 | - - - - | 7 | 239____ |
+ | 240 | ci - aa - joy - im - ip | 7 | 240_ci_aa_joy_im |
+ | 241 | - - - - | 6 | 241____ |
+ | 242 | skill - email - address - grade - language | 6 | 242_skill_email_address_grade |
+ | 243 | sexual - threat - christian - hate - male | 6 | 243_sexual_threat_christian_hate |
+ | 244 | transmission - wind - tower - pole - | 6 | 244_transmission_wind_tower_pole |
+ | 245 | label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 6 | 245_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14 |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+
+ ## Framework versions
+
+ * Numpy: 1.22.4
+ * HDBSCAN: 0.8.29
+ * UMAP: 0.5.3
+ * Pandas: 1.5.3
+ * Scikit-Learn: 1.2.2
+ * Sentence-transformers: 2.2.2
+ * Transformers: 4.29.2
+ * Numba: 0.56.4
+ * Plotly: 5.13.1
+ * Python: 3.10.11
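Judging from the rows of the topic overview table in README.md, each Label appears to be the topic ID followed by the first four keywords, joined with underscores. A minimal Python sketch of that pattern follows; it is inferred from the table rather than from BERTopic's source, and `make_label` is a hypothetical helper, not a BERTopic function:

```python
def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


# Keyword lists copied from the table; empty strings stand for blank keyword slots.
print(make_label(11, ["terrier", "snake", "dog", "bear", "wolf"]))
print(make_label(6, ["contradiction", "entailment", "neutral", "", ""]))
```

Because multi-word keywords such as `label_1 label_2` themselves contain underscores, splitting a label back into its keywords is ambiguous; reading the Topic Keywords column is the safer direction.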
config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "calculate_probabilities": false,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": true
+ }
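JSON has no tuple type, so `n_gram_range` is stored in config.json as a two-element array, while the README lists it as `(1, 1)`. The sketch below reads these values back into Python keyword arguments, on the assumption that they would then be passed to a `BERTopic(...)` constructor. The loader BERTopic itself uses may differ, and `load_hyperparameters` is a hypothetical helper:

```python
import json

# The config.json contents from this commit, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
"""


def load_hyperparameters(text: str) -> dict:
    """Decode the JSON and restore the tuple form of n_gram_range."""
    params = json.loads(text)
    # JSON arrays decode to lists; convert back to the tuple shown in the README.
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


params = load_hyperparameters(CONFIG_TEXT)
print(params["n_gram_range"], params["min_topic_size"])
```

JSON `null`, `false` and `true` decode to Python `None`, `False` and `True`, which matches the values listed under "Training hyperparameters" in the README.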
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6dfc68ba36ab2801761db7804d39995eb77b59dda172ae2398fe7b07821a22bf
+ size 758872
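The three lines added for topic_embeddings.safetensors are a Git LFS pointer rather than the tensor data itself: a spec version, a SHA-256 object ID, and the object size in bytes. A minimal Python sketch of parsing such a pointer follows; `parse_lfs_pointer` is a hypothetical helper that only handles the simple `key value` layout shown here:

```python
# The pointer contents from this commit, inlined so the sketch is self-contained.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6dfc68ba36ab2801761db7804d39995eb77b59dda172ae2398fe7b07821a22bf
size 758872
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then separate the hash algorithm from the digest."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size"])
```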
topics.json ADDED
The diff for this file is too large to render.