davanstrien (HF staff) committed
Commit 3637239
1 Parent(s): 1a0f1cd

Add BERTopic model

Files changed (4)
  1. README.md +319 -0
  2. config.json +14 -0
  3. topic_embeddings.safetensors +3 -0
  4. topics.json +0 -0
README.md ADDED
@@ -0,0 +1,319 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # label_model
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("davanstrien/label_model")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 252
+ * Number of training documents: 14986
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | date - city - pre - heavy - fur | 5 | -1_date_city_pre_heavy |
+ | 0 | label_1 label_2 - label_0 label_1 label_2 - label_0 label_1 - label_1 - label_2 | 1333 | 0_label_1 label_2_label_0 label_1 label_2_label_0 label_1_label_1 |
+ | 1 | label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 | 1043 | 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4 |
+ | 2 | negative positive - positive negative - negative - positive - target | 803 | 2_negative positive_positive negative_negative_positive |
+ | 3 | loc misc org - loc misc - misc org - misc - org loc | 651 | 3_loc misc org_loc misc_misc org_misc |
+ | 4 | neutral positive - neutral - positive negative - negative - positive | 479 | 4_neutral positive_neutral_positive negative_negative |
+ | 5 | label_0 - - - - | 357 | 5_label_0___ |
+ | 6 | contradiction - entailment - neutral - ambiguous - | 348 | 6_contradiction_entailment_neutral_ambiguous |
+ | 7 | label_0 - - - - | 334 | 7_label_0___ |
+ | 8 | 99 - - - - | 326 | 8_99___ |
+ | 9 | label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 | 300 | 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3 |
+ | 10 | entailment - true - child - related - non | 257 | 10_entailment_true_child_related |
+ | 11 | snake - dog - bear - wolf - sea | 245 | 11_snake_dog_bear_wolf |
+ | 12 | label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_6 label_7 label_8 | 241 | 12_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6 |
+ | 13 | loc misc org - loc misc - misc org - misc - org loc | 229 | 13_loc misc org_loc misc_misc org_misc |
+ | 14 | weather - transfer - alarm - text - time | 228 | 14_weather_transfer_alarm_text |
+ | 15 | label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 | 222 | 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2 |
+ | 16 | delete - different - bad - related - rel | 207 | 16_delete_different_bad_related |
+ | 17 | label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 | 172 | 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13 |
+ | 18 | - - - - | 166 | 18____ |
+ | 19 | loc org loc - loc org - org loc - org - loc | 142 | 19_loc org loc_loc org_org loc_org |
+ | 20 | label_6 label_60 label_61 - label_60 label_61 - label_62 label_63 - label_61 label_62 label_63 - label_61 label_62 | 126 | 20_label_6 label_60 label_61_label_60 label_61_label_62 label_63_label_61 label_62 label_63 |
+ | 21 | label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 | 117 | 21_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3 |
+ | 22 | test - second - - - | 106 | 22_test_second__ |
+ | 23 | forest - industrial - transport - low - bamboo | 104 | 23_forest_industrial_transport_low |
+ | 24 | answer - header - question - quantity - | 104 | 24_answer_header_question_quantity |
+ | 25 | healthy - leaf - rust - plant - spot | 103 | 25_healthy_leaf_rust_plant |
+ | 26 | left - right - stop - yes - unknown | 100 | 26_left_right_stop_yes |
+ | 27 | en - na - alpha - fan - lifestyle | 93 | 27_en_na_alpha_fan |
+ | 28 | label_13 label_14 label_15 - label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 92 | 28_label_13 label_14 label_15_label_14 label_15_label_15_label_12 label_13 label_14 |
+ | 29 | disease - bio - disorder - healthy - | 86 | 29_disease_bio_disorder_healthy |
+ | 30 | work - group - person product - product - location | 86 | 30_work_group_person product_product |
+ | 31 | fear joy - sadness surprise - anger fear - joy love - surprise | 82 | 31_fear joy_sadness surprise_anger fear_joy love |
+ | 32 | common - non - different - - | 78 | 32_common_non_different_ |
+ | 33 | dis - - - - | 76 | 33_dis___ |
+ | 34 | - - - - | 73 | 34____ |
+ | 35 | restaurant - pizza - place - salad - food | 69 | 35_restaurant_pizza_place_salad |
+ | 36 | cconj det intj - adj adp adv - det intj noun - det intj - noun num pron | 66 | 36_cconj det intj_adj adp adv_det intj noun_det intj |
+ | 37 | label_17 label_18 label_19 - label_18 label_19 label_2 - label_18 label_19 - label_19 label_2 - label_16 label_17 label_18 | 66 | 37_label_17 label_18 label_19_label_18 label_19 label_2_label_18 label_19_label_19 label_2 |
+ | 38 | ll - year - related - cause - delete | 65 | 38_ll_year_related_cause |
+ | 39 | anger fear - joy love - surprise - joy - love | 64 | 39_anger fear_joy love_surprise_joy |
+ | 40 | true - news - partial - - | 64 | 40_true_news_partial_ |
+ | 41 | - - - - | 63 | 41____ |
+ | 42 | label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 62 | 42_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_7 label_8 label_9 |
+ | 43 | pos - neg - - - | 62 | 43_pos_neg__ |
+ | 44 | loc org - org - loc - date - sex | 61 | 44_loc org_org_loc_date |
+ | 45 | label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 | 60 | 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2 |
+ | 46 | event - group - person product - product - location | 57 | 46_event_group_person product_product |
+ | 47 | bio - chemical - disease - effect - food | 57 | 47_bio_chemical_disease_effect |
+ | 48 | 234 - 19 20 21 - 20 21 22 - 22 23 24 - 23 24 | 57 | 48_234_19 20 21_20 21 22_22 23 24 |
+ | 49 | fear happy neutral - happy neutral - fear happy - sad - happy | 53 | 49_fear happy neutral_happy neutral_fear happy_sad |
+ | 50 | battery - volume - juice - chinese - korean | 53 | 50_battery_volume_juice_chinese |
+ | 51 | menu - price - num - - | 52 | 51_menu_price_num_ |
+ | 52 | poor - ok - good - bad - great | 52 | 52_poor_ok_good_bad |
+ | 53 | ll - cause - delete - unknown - | 51 | 53_ll_cause_delete_unknown |
+ | 54 | hospital - unknown - en - material - digital | 48 | 54_hospital_unknown_en_material |
+ | 55 | ll - cause - delete - unknown - | 48 | 55_ll_cause_delete_unknown |
+ | 56 | self - question - neutral - yes - statement | 48 | 56_self_question_neutral_yes |
+ | 57 | fat - loose - small - sugar - common | 47 | 57_fat_loose_small_sugar |
+ | 58 | true - - - - | 47 | 58_true___ |
+ | 59 | cream - drinks - seafood - fruit - ice cream | 46 | 59_cream_drinks_seafood_fruit |
+ | 60 | tr - ru - pers - pt - prod | 46 | 60_tr_ru_pers_pt |
+ | 61 | - - - - | 45 | 61____ |
+ | 62 | clothing - care - kitchen - personal - health | 44 | 62_clothing_care_kitchen_personal |
+ | 63 | business - news - tech - entertainment - sport | 43 | 63_business_news_tech_entertainment |
+ | 64 | non - partial - neutral - yes - ok | 43 | 64_non_partial_neutral_yes |
+ | 65 | organization person - location organization - organization - location - person | 43 | 65_organization person_location organization_organization_location |
+ | 66 | daisy - tulip - rose - - | 43 | 66_daisy_tulip_rose_ |
+ | 67 | joy - sadness - anger - angry - happy | 42 | 67_joy_sadness_anger_angry |
+ | 68 | samoyed - corgi - husky - pomeranian - golden | 41 | 68_samoyed_corgi_husky_pomeranian |
+ | 69 | music - instrument - engine - wind - animals | 41 | 69_music_instrument_engine_wind |
+ | 70 | hate - language - reporting - non - normal | 41 | 70_hate_language_reporting_non |
+ | 71 | label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 | 41 | 71_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24 |
+ | 72 | id - - - - | 40 | 72_id___ |
+ | 73 | animals - tech - dance - tiger - sport | 40 | 73_animals_tech_dance_tiger |
+ | 74 | org org - loc loc - org - misc - loc | 40 | 74_org org_loc loc_org_misc |
+ | 75 | star - positive - negative - negative positive - | 38 | 75_star_positive_negative_negative positive |
+ | 76 | bird - ship - frog - horse - truck | 37 | 76_bird_ship_frog_horse |
+ | 77 | cat - cats - dog - dogs - sleeping | 37 | 77_cat_cats_dog_dogs |
+ | 78 | family - sports - music - related - health | 37 | 78_family_sports_music_related |
+ | 79 | label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 37 | 79_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9 |
+ | 80 | room - service - transport - care - kitchen | 37 | 80_room_service_transport_care |
+ | 81 | positive - negative - neutral positive - neutral - positive negative | 37 | 81_positive_negative_neutral positive_neutral |
+ | 82 | test - play - train - non - live | 36 | 82_test_play_train_non |
+ | 83 | tim - evt - pro - gpe - org | 36 | 83_tim_evt_pro_gpe |
+ | 84 | cold - disease - pressure - drug - blood | 36 | 84_cold_disease_pressure_drug |
+ | 85 | non - early - late - - | 35 | 85_non_early_late_ |
+ | 86 | 21 - office - 20 - 17 - 16 | 34 | 86_21_office_20_17 |
+ | 87 | prep - nn - cc - pro - ex | 34 | 87_prep_nn_cc_pro |
+ | 88 | evidence - position - statement - lead - request | 33 | 88_evidence_position_statement_lead |
+ | 89 | adp - aux - sconj - cconj - det noun | 33 | 89_adp_aux_sconj_cconj |
+ | 90 | job - start - help - address - quantity | 33 | 90_job_start_help_address |
+ | 91 | gender - number - case - ind - person | 33 | 91_gender_number_case_ind |
+ | 92 | threat - hate - adult - target - male | 33 | 92_threat_hate_adult_target |
+ | 93 | institution - tools - organization - org - agent | 32 | 93_institution_tools_organization_org |
+ | 94 | - - - - | 32 | 94____ |
+ | 95 | email - age - patient - state - zip | 32 | 95_email_age_patient_state |
+ | 96 | mixed - positive - negative - neutral - neutral positive | 32 | 96_mixed_positive_negative_neutral |
+ | 97 | test - help - joke - contact - report | 32 | 97_test_help_joke_contact |
+ | 98 | address - balance - statement - request - second | 31 | 98_address_balance_statement_request |
+ | 99 | - - - - | 31 | 99____ |
+ | 100 | hate - non - neutral - - | 30 | 100_hate_non_neutral_ |
+ | 101 | - - - - | 30 | 101____ |
+ | 102 | unk - zero - seven - 10 - blank | 30 | 102_unk_zero_seven_10 |
+ | 103 | male - female - young - adult - skin | 30 | 103_male_female_young_adult |
+ | 104 | 94 - 59 60 - 49 50 - 81 - 97 | 29 | 104_94_59 60_49 50_81 |
+ | 105 | normal - cell - large - clean - lower | 29 | 105_normal_cell_large_clean |
+ | 106 | lincoln - jaguar - audio - source - general | 28 | 106_lincoln_jaguar_audio_source |
+ | 107 | title - section - header - list - item | 28 | 107_title_section_header_list |
+ | 108 | - - - - | 28 | 108____ |
+ | 109 | yes - - - - | 27 | 109_yes___ |
+ | 110 | - - - - | 26 | 110____ |
+ | 111 | contradiction - entailment - neutral - non - | 26 | 111_contradiction_entailment_neutral_non |
+ | 112 | instrument - org org - org org org - term - org | 26 | 112_instrument_org org_org org org_term |
+ | 113 | ft - cardinal - act - loc - loc loc | 25 | 113_ft_cardinal_act_loc |
+ | 114 | event - pro - pers - loc org - prod | 25 | 114_event_pro_pers_loc org |
+ | 115 | ben - ext - exp - root - loc | 25 | 115_ben_ext_exp_root |
+ | 116 | - - - - | 25 | 116____ |
+ | 117 | low - - - - | 25 | 117_low___ |
+ | 118 | ft - cardinal - act - loc - loc misc org | 25 | 118_ft_cardinal_act_loc |
+ | 119 | statement - question - evidence - experience - answer | 25 | 119_statement_question_evidence_experience |
+ | 120 | label_122 - label_121 - label_120 - label_123 - label_119 | 24 | 120_label_122_label_121_label_120_label_123 |
+ | 121 | clean - - - - | 24 | 121_clean___ |
+ | 122 | ru - tr - el - en - hi | 24 | 122_ru_tr_el_en |
+ | 123 | disgust - sadness surprise - joy love - surprise - joy | 24 | 123_disgust_sadness surprise_joy love_surprise |
+ | 124 | statement - info - check - news - non | 24 | 124_statement_info_check_news |
+ | 125 | motor - start - help - housing - yes | 24 | 125_motor_start_help_housing |
+ | 126 | greek - chinese - italian - japanese - dutch | 24 | 126_greek_chinese_italian_japanese |
+ | 127 | anger disgust - fear - disgust - sadness - anger | 23 | 127_anger disgust_fear_disgust_sadness |
+ | 128 | date event - percent person - quantity - money - percent | 23 | 128_date event_percent person_quantity_money |
+ | 129 | label_95 label_96 label_97 - label_97 label_98 label_99 - label_97 label_98 - label_94 label_95 label_96 - label_94 label_95 | 23 | 129_label_95 label_96 label_97_label_97 label_98 label_99_label_97 label_98_label_94 label_95 label_96 |
+ | 130 | period - question - noun - number - | 23 | 130_period_question_noun_number |
+ | 131 | neutral - - - - | 22 | 131_neutral___ |
+ | 132 | local - la - pad - data - personal | 22 | 132_local_la_pad_data |
+ | 133 | partial - - - - | 22 | 133_partial___ |
+ | 134 | human - art - machine - - | 22 | 134_human_art_machine_ |
+ | 135 | fear joy - sadness surprise - surprise - disgust fear - joy | 21 | 135_fear joy_sadness surprise_surprise_disgust fear |
+ | 136 | location organization - organization person - organization - price - disease | 21 | 136_location organization_organization person_organization_price |
+ | 137 | 14 15 16 - 12 13 14 - 13 14 15 - 11 12 13 - 10 11 12 | 21 | 137_14 15 16_12 13 14_13 14 15_11 12 13 |
+ | 138 | sports - tech - business - sport - | 21 | 138_sports_tech_business_sport |
+ | 139 | disorder - body - patient - age - disease | 20 | 139_disorder_body_patient_age |
+ | 140 | sad - dis - sur - joy - | 20 | 140_sad_dis_sur_joy |
+ | 141 | healthy - - - - | 20 | 141_healthy___ |
+ | 142 | drink - tea - wine - coffee - soft | 20 | 142_drink_tea_wine_coffee |
+ | 143 | protein - chemical - cell - - | 20 | 143_protein_chemical_cell_ |
+ | 144 | rna - - - - | 20 | 144_rna___ |
+ | 145 | normal - covid - - - | 20 | 145_normal_covid__ |
+ | 146 | ex - pt - - - | 20 | 146_ex_pt__ |
+ | 147 | ok - ft - year - int - rel | 20 | 147_ok_ft_year_int |
+ | 148 | header - currency - item - zip - state | 20 | 148_header_currency_item_zip |
+ | 149 | label_122 label_123 - label_123 - label_122 - label_121 - label_120 | 19 | 149_label_122 label_123_label_123_label_122_label_121 |
+ | 150 | anger disgust - anger disgust fear - disgust fear - disgust - sadness surprise | 19 | 150_anger disgust_anger disgust fear_disgust fear_disgust |
+ | 151 | na - nn - ft - dis - bio | 19 | 151_na_nn_ft_dis |
+ | 152 | angry - happy - sad - happy neutral - neutral | 19 | 152_angry_happy_sad_happy neutral |
+ | 153 | organization percent person - organization percent - miscellaneous - percent person - percent | 19 | 153_organization percent person_organization percent_miscellaneous_percent person |
+ | 154 | paper - metal - glass - tray - ticket | 19 | 154_paper_metal_glass_tray |
+ | 155 | mask - normal - sharp - head - green | 19 | 155_mask_normal_sharp_head |
+ | 156 | noun num pron - num pron propn - pron propn punct - num pron - adj adp adv | 18 | 156_noun num pron_num pron propn_pron propn punct_num pron |
+ | 157 | answer - - - - | 18 | 157_answer___ |
+ | 158 | review - id - job - email - state | 18 | 158_review_id_job_email |
+ | 159 | seven - queen - jack - king - war | 18 | 159_seven_queen_jack_king |
+ | 160 | neg - nan - good - - | 18 | 160_neg_nan_good_ |
+ | 161 | ii - blank - vi - et - lower | 18 | 161_ii_blank_vi_et |
+ | 162 | golden - husky - samoyed - pug - german | 17 | 162_golden_husky_samoyed_pug |
+ | 163 | arg - delete - act - neg - lead | 17 | 163_arg_delete_act_neg |
+ | 164 | exp - pp - intj - punc - prep | 17 | 164_exp_pp_intj_punc |
+ | 165 | email - form - letter - report - news | 17 | 165_email_form_letter_report |
+ | 166 | protein - rna - cell - line - type | 17 | 166_protein_rna_cell_line |
+ | 167 | en - hi - fur - - | 17 | 167_en_hi_fur_ |
+ | 168 | - - - - | 17 | 168____ |
+ | 169 | - - - - | 17 | 169____ |
+ | 170 | loc loc - loc - pers - evt - | 16 | 170_loc loc_loc_pers_evt |
+ | 171 | menu - - - - | 16 | 171_menu___ |
+ | 172 | normal - - - - | 16 | 172_normal___ |
+ | 173 | label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 | 16 | 173_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98 |
+ | 174 | cell - organ - organism - tissue - disease | 16 | 174_cell_organ_organism_tissue |
+ | 175 | target - instrument - opinion - price - product | 16 | 175_target_instrument_opinion_price |
+ | 176 | org org - org org org - loc loc - org - prs | 16 | 176_org org_org org org_loc loc_org |
+ | 177 | 10 11 - 10 11 12 - 11 12 - 12 - 11 | 16 | 177_10 11_10 11 12_11 12_12 |
+ | 178 | korean - russian - dutch - persian - french | 16 | 178_korean_russian_dutch_persian |
+ | 179 | label_4 label_40 label_41 - label_39 label_4 label_40 - label_38 label_39 label_4 - label_37 label_38 label_39 - label_40 label_41 | 16 | 179_label_4 label_40 label_41_label_39 label_4 label_40_label_38 label_39 label_4_label_37 label_38 label_39 |
+ | 180 | experience - location - loc misc org - loc misc - misc org | 15 | 180_experience_location_loc misc org_loc misc |
+ | 181 | normal - pressure - high - water - | 15 | 181_normal_pressure_high_water |
+ | 182 | company - institution - loc org - degree - org | 15 | 182_company_institution_loc org_degree |
+ | 183 | short - sl - long - - | 15 | 183_short_sl_long_ |
+ | 184 | good - bad - non - - | 15 | 184_good_bad_non_ |
+ | 185 | 149 - 151 - 191 - 199 - 231 | 15 | 185_149_151_191_199 |
+ | 186 | unknown - vi - ii - - | 15 | 186_unknown_vi_ii_ |
+ | 187 | end - head - cross - - | 15 | 187_end_head_cross_ |
+ | 188 | forest - street - road - tree - mountain | 15 | 188_forest_street_road_tree |
+ | 189 | label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 | 14 | 189_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10 |
+ | 190 | prod - loc - evt - org org - loc loc | 14 | 190_prod_loc_evt_org org |
+ | 191 | tech - business - sports - science - female | 14 | 191_tech_business_sports_science |
+ | 192 | adult - child - young - - | 14 | 192_adult_child_young_ |
+ | 193 | human - organism - plants - - | 14 | 193_human_organism_plants_ |
+ | 194 | hot dog - chicken - hot - food - dog | 14 | 194_hot dog_chicken_hot_food |
+ | 195 | rain - snow - - - | 14 | 195_rain_snow__ |
+ | 196 | objective - neutral - - - | 14 | 196_objective_neutral__ |
+ | 197 | pro - neutral - russian - attack - | 14 | 197_pro_neutral_russian_attack |
+ | 198 | normal - disorder - good - - | 14 | 198_normal_disorder_good_ |
+ | 199 | road - good - bike - - | 14 | 199_road_good_bike_ |
+ | 200 | - - - - | 14 | 200____ |
+ | 201 | science - energy - arts - nuclear - systems | 13 | 201_science_energy_arts_nuclear |
+ | 202 | - - - - | 13 | 202____ |
+ | 203 | event - ticket - ok - loose - non | 13 | 203_event_ticket_ok_loose |
+ | 204 | neutral - left - right - unknown - | 13 | 204_neutral_left_right_unknown |
+ | 205 | - - - - | 13 | 205____ |
+ | 206 | crime - pers - time - book - org | 13 | 206_crime_pers_time_book |
+ | 207 | seven - start - record - zero - open | 13 | 207_seven_start_record_zero |
+ | 208 | label_5 label_50 label_51 - label_50 label_51 label_52 - label_51 label_52 label_53 - label_51 label_52 - label_50 label_51 | 13 | 208_label_5 label_50 label_51_label_50 label_51 label_52_label_51 label_52 label_53_label_51 label_52 |
+ | 209 | label_29 label_3 label_30 - label_26 label_27 label_28 - label_27 label_28 label_29 - label_27 label_28 - label_28 label_29 label_3 | 13 | 209_label_29 label_3 label_30_label_26 label_27 label_28_label_27 label_28 label_29_label_27 label_28 |
+ | 210 | human - machine - - - | 13 | 210_human_machine__ |
+ | 211 | control - la - sin - social - ambient | 13 | 211_control_la_sin_social |
+ | 212 | anger fear - sadness - anger - fear - fear joy | 13 | 212_anger fear_sadness_anger_fear |
+ | 213 | panda - ticket - air - bamboo - el | 13 | 213_panda_ticket_air_bamboo |
+ | 214 | target - - - - | 13 | 214_target___ |
+ | 215 | id - container - type - person - number | 12 | 215_id_container_type_person |
+ | 216 | neutral - positive - negative - neutral positive - positive negative | 12 | 216_neutral_positive_negative_neutral positive |
+ | 217 | change - bad - movement - work - science | 12 | 217_change_bad_movement_work |
+ | 218 | rust - - - - | 12 | 218_rust___ |
+ | 219 | quantity - container - package - id - weight | 12 | 219_quantity_container_package_id |
+ | 220 | text - - - - | 12 | 220_text___ |
+ | 221 | background - objective - - - | 12 | 221_background_objective__ |
+ | 222 | middle - subject - yes - request - answer | 12 | 222_middle_subject_yes_request |
+ | 223 | - - - - | 12 | 223____ |
+ | 224 | public - ambiguous - non - person - | 12 | 224_public_ambiguous_non_person |
+ | 225 | healthy - plant - pepper - spot - leaf | 12 | 225_healthy_plant_pepper_spot |
+ | 226 | punc - prep - digit - latin - conj | 12 | 226_punc_prep_digit_latin |
+ | 227 | location money - language - percent person - actor - money | 12 | 227_location money_language_percent person_actor |
+ | 228 | - - - - | 11 | 228____ |
+ | 229 | punc - zero - pers - neg - reflex | 11 | 229_punc_zero_pers_neg |
+ | 230 | album - major - copper - coon - common | 11 | 230_album_major_copper_coon |
+ | 231 | metal - pop - country - dance - hip | 11 | 231_metal_pop_country_dance |
+ | 232 | energy - common - grass - persian - removal | 11 | 232_energy_common_grass_persian |
+ | 233 | man - double - bird - long - single | 11 | 233_man_double_bird_long |
+ | 234 | 17 - 16 - 18 - 13 - 15 | 11 | 234_17_16_18_13 |
+ | 235 | email - actor - threat - tools - attack | 11 | 235_email_actor_threat_tools |
+ | 236 | space - - - - | 11 | 236_space___ |
+ | 237 | type - country - jeep - van - lincoln | 11 | 237_type_country_jeep_van |
+ | 238 | general - - - - | 10 | 238_general___ |
+ | 239 | ru - mat - - - | 10 | 239_ru_mat__ |
+ | 240 | contradiction - non - entailment - neutral - | 10 | 240_contradiction_non_entailment_neutral |
+ | 241 | city - new - country - location - label_1 | 10 | 241_city_new_country_location |
+ | 242 | non - legal - sub - - | 9 | 242_non_legal_sub_ |
+ | 243 | tulip - cattle - motorcycle - road - color | 8 | 243_tulip_cattle_motorcycle_road |
+ | 244 | item - color - cc - model - | 8 | 244_item_color_cc_model |
+ | 245 | delivery - product - service - different - environment | 7 | 245_delivery_product_service_different |
+ | 246 | degree - tim - neg - pos - propn | 6 | 246_degree_tim_neg_pos |
+ | 247 | threat - hate - non - unknown - neutral | 6 | 247_threat_hate_non_unknown |
+ | 248 | label_33 label_34 - label_32 label_33 label_34 - label_32 label_33 - label_31 label_32 label_33 - label_31 label_32 | 6 | 248_label_33 label_34_label_32 label_33 label_34_label_32 label_33_label_31 label_32 label_33 |
+ | 249 | experience - location - - - | 6 | 249_experience_location__ |
+ | 250 | nat - gpe - geo - pro - tim | 5 | 250_nat_gpe_geo_pro |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+
+ ## Framework versions
+
+ * Numpy: 1.22.4
+ * HDBSCAN: 0.8.29
+ * UMAP: 0.5.3
+ * Pandas: 1.5.3
+ * Scikit-Learn: 1.2.2
+ * Sentence-transformers: 2.2.2
+ * Transformers: 4.29.2
+ * Numba: 0.56.4
+ * Plotly: 5.13.1
+ * Python: 3.10.11
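
The `Label` column in the README's topic table appears to follow a simple pattern: the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that pattern. It is inferred from the table rows rather than taken from BERTopic's source, and the helper name `make_label` is hypothetical.

```python
from typing import List


def make_label(topic_id: int, keywords: List[str], n_words: int = 4) -> str:
    """Build a label like '11_snake_dog_bear_wolf' (hypothetical helper).

    Assumption: a label is the topic ID plus the first `n_words` keywords,
    joined by underscores. Empty keywords are kept, which is why sparse
    topics end in a run of underscores.
    """
    return "_".join([str(topic_id)] + keywords[:n_words])


# Two rows copied from the table in the README diff.
label_sentiment = make_label(
    2, ["negative positive", "positive negative", "negative", "positive", "target"]
)
label_sparse = make_label(5, ["label_0", "", "", "", ""])

print(label_sentiment)
print(label_sparse)
```

Because keywords can themselves contain underscores (for example `label_1 label_2`), a label cannot be split back into its keywords reliably; the `Topic Keywords` column is the safer source for that.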
config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "calculate_probabilities": false,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     1
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": true
+ }
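
The keys in `config.json` mirror the "Training hyperparameters" list in the README, with one representational difference: JSON has no tuple type, so `n_gram_range` is stored as a list. The sketch below uses only the standard library to read such a config back into Python values. The function name `config_to_kwargs` is hypothetical, and treating the result as keyword arguments for a `BERTopic(...)` constructor is an assumption based on the matching parameter names, not something this commit states.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
"""


def config_to_kwargs(raw: str) -> dict:
    """Parse the JSON config and restore Python-native types (hypothetical helper)."""
    cfg = json.loads(raw)
    # JSON arrays load as lists; the README reports this value as the tuple (1, 1).
    cfg["n_gram_range"] = tuple(cfg["n_gram_range"])
    return cfg


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs)
```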
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a37ff8eb1f305336409db7b4776d9b5417dc6e510f420f25f3fbde37f502dc9d
+ size 774232
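
The three lines added for `topic_embeddings.safetensors` are a Git LFS pointer, not the tensor data itself: each line is a key, a space, and a value, and the real file (774232 bytes) is stored separately. A minimal sketch of reading such a pointer, assuming exactly the three keys shown in this diff (the function name `parse_lfs_pointer` is hypothetical):

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a37ff8eb1f305336409db7b4776d9b5417dc6e510f420f25f3fbde37f502dc9d
size 774232
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    # Each non-empty line is "<key> <value>"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    # The oid value is "<hash algorithm>:<hex digest>".
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, its SHA-256 digest and byte length can be compared against `digest` and `size` to confirm the download is intact.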
topics.json ADDED
The diff for this file is too large to render. See raw diff