Commit 4dd3b94 (parent: 0cf3439)

Add BERTopic model

- README.md +314 -0
- config.json +14 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,314 @@

---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# label_model_merged

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that generates easily interpretable topics from large datasets.

## Usage

To use this model, install BERTopic:

```
pip install -U bertopic
```

You can then load the model from the Hugging Face Hub and list its topics:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("davanstrien/label_model_merged")

# Returns a pandas DataFrame with one row per topic
topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 247
* Number of training documents: 14986

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | pre - roll - heavy - farm - health | 5 | -1_pre_roll_heavy_farm |
| 0 | label_1 label_2 - label_0 label_1 label_2 - label_1 - label_0 label_1 - label_2 | 1386 | 0_label_1 label_2_label_0 label_1 label_2_label_1_label_0 label_1 |
| 1 | label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 | 1042 | 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4 |
| 2 | negative positive - positive negative - negative - positive - target | 803 | 2_negative positive_positive negative_negative_positive |
| 3 | loc misc org - misc org - loc misc - misc - org loc | 652 | 3_loc misc org_misc org_loc misc_misc |
| 4 | neutral positive - neutral - positive negative - negative - positive | 509 | 4_neutral positive_neutral_positive negative_negative |
| 5 | label_0 - country - city - label_1 - label_0 label_1 | 357 | 5_label_0_country_city_label_1 |
| 6 | contradiction - entailment - neutral - - | 351 | 6_contradiction_entailment_neutral_ |
| 7 | label_0 - positive - - - | 335 | 7_label_0_positive__ |
| 8 | 99 - - - - | 327 | 8_99___ |
| 9 | label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 | 302 | 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3 |
| 10 | entailment - true - child - related - non | 257 | 10_entailment_true_child_related |
| 11 | terrier - snake - dog - bear - wolf | 245 | 11_terrier_snake_dog_bear |
| 12 | loc misc org - loc misc - misc org - misc - org loc | 240 | 12_loc misc org_loc misc_misc org_misc |
| 13 | label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_7 | 231 | 13_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6 |
| 14 | calendar - greeting - weather - transfer - calculator | 229 | 14_calendar_greeting_weather_transfer |
| 15 | label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 | 226 | 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2 |
| 16 | delete - unrelated - bad - related - rel | 207 | 16_delete_unrelated_bad_related |
| 17 | label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 | 172 | 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13 |
| 18 | loc org - org loc - org - loc - loc loc | 166 | 18_loc org_org loc_org_loc |
| 19 | left - right - stop - yes - zero | 130 | 19_left_right_stop_yes |
| 20 | label_6 label_60 label_61 - label_60 label_61 - label_60 label_61 label_62 - label_62 label_63 - label_59 label_6 label_60 | 123 | 20_label_6 label_60 label_61_label_60 label_61_label_60 label_61 label_62_label_62 label_63 |
| 21 | unrelated - - - - | 117 | 21_unrelated___ |
| 22 | forest - industrial - river - transport - disaster | 110 | 22_forest_industrial_river_transport |
| 23 | label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 | 107 | 23_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3 |
| 24 | question - quantity - - - | 106 | 24_question_quantity__ |
| 25 | healthy - leaf - rust - plant - mildew | 103 | 25_healthy_leaf_rust_plant |
| 26 | disease - blood - bio - healthy - sexual | 100 | 26_disease_blood_bio_healthy |
| 27 | work - group - corporation - person product - product | 92 | 27_work_group_corporation_person product |
| 28 | surprise anger - sadness surprise - fear joy - anger fear - joy love | 80 | 28_surprise anger_sadness surprise_fear joy_anger fear |
| 29 | duplicate - common - non - - | 78 | 29_duplicate_common_non_ |
| 30 | steak - hamburger - restaurant - pizza - joint | 76 | 30_steak_hamburger_restaurant_pizza |
| 31 | room - service - transport - product - forest | 74 | 31_room_service_transport_product |
| 32 | dis - - - - | 74 | 32_dis___ |
| 33 | - - - - | 73 | 33____ |
| 34 | loc org - org - date - loc - set | 70 | 34_loc org_org_date_loc |
| 35 | label_17 label_18 label_19 - label_18 label_19 - label_18 label_19 label_2 - label_19 label_2 - label_16 label_17 label_18 | 70 | 35_label_17 label_18 label_19_label_18 label_19_label_18 label_19 label_2_label_19 label_2 |
| 36 | 03 - 02 - second - - | 65 | 36_03_02_second_ |
| 37 | anger fear - joy love - surprise - joy - love | 65 | 37_anger fear_joy love_surprise_joy |
| 38 | real - true - image - news - | 64 | 38_real_true_image_news |
| 39 | - - - - | 63 | 39____ |
| 40 | pos - neg - neu - - | 62 | 40_pos_neg_neu_ |
| 41 | 45 - 30 - 55 - 35 - 10 | 61 | 41_45_30_55_35 |
| 42 | ge - wifi - na - alpha - fan | 61 | 42_ge_wifi_na_alpha |
| 43 | label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 | 61 | 43_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_9 label_0 label_1 |
| 44 | event - group - corporation - person product - product | 61 | 44_event_group_corporation_person product |
| 45 | label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 | 60 | 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2 |
| 46 | fear happy - sad - happy - disgust fear - angry | 58 | 46_fear happy_sad_happy_disgust fear |
| 47 | battery - volume - chinese - juice - socks | 58 | 47_battery_volume_chinese_juice |
| 48 | prep - nn - bio - cc - pro | 56 | 48_prep_nn_bio_cc |
| 49 | good - poor - ok - great - bad | 56 | 49_good_poor_ok_great |
| 50 | date - city - fur - day - ar | 54 | 50_date_city_fur_day |
| 51 | 15 - 18 19 20 - 19 20 - 17 18 19 - 18 19 | 54 | 51_15_18 19 20_19 20_17 18 19 |
| 52 | menu - price - num - - | 52 | 52_menu_price_num_ |
| 53 | common - fat - loose - small - sugar | 52 | 53_common_fat_loose_small |
| 54 | append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ | 49 | 54_append__replace__append_ append__replace_ replace_ |
| 55 | append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ | 48 | 55_append__replace__append_ append__replace_ replace_ |
| 56 | animals - flying - tech - dance - tiger | 48 | 56_animals_flying_tech_dance |
| 57 | self - question - neutral - yes - greeting | 47 | 57_self_question_neutral_yes |
| 58 | mt - cv - tr - tm - drug | 47 | 58_mt_cv_tr_tm |
| 59 | organization person - location organization - organization - location - person | 46 | 59_organization person_location organization_organization_location |
| 60 | - - - - | 45 | 60____ |
| 61 | joy - anger - sadness - sad - happy | 44 | 61_joy_anger_sadness_sad |
| 62 | daisy - tulip - rose - - | 43 | 62_daisy_tulip_rose_ |
| 63 | positive - negative - neutral - neutral positive - positive negative | 42 | 63_positive_negative_neutral_neutral positive |
| 64 | windows - pm - 21 - office - 20 | 42 | 64_windows_pm_21_office |
| 65 | label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 42 | 65_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14 |
| 66 | position - statement - lead - request - study | 42 | 66_position_statement_lead_request |
| 67 | business - news - entertainment - tech - sport | 41 | 67_business_news_entertainment_tech |
| 68 | hate - speech - language - reporting - non | 41 | 68_hate_speech_language_reporting |
| 69 | bd - nan - id - bg - | 41 | 69_bd_nan_id_bg |
| 70 | cream - burger - carrot - ice cream - salad | 41 | 70_cream_burger_carrot_ice cream |
| 71 | human - machine - ai - artificial - art | 40 | 71_human_machine_ai_artificial |
| 72 | open - high - tie - abstract - button | 40 | 72_open_high_tie_abstract |
| 73 | label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 | 40 | 73_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24 |
| 74 | label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 39 | 74_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9 |
| 75 | cat - dog - cats - dogs - drinking | 39 | 75_cat_dog_cats_dogs |
| 76 | org org - loc loc - org - misc - loc | 38 | 76_org org_loc loc_org_misc |
| 77 | airplane - deer - bird - ship - frog | 38 | 77_airplane_deer_bird_ship |
| 78 | label_32 label_33 label_34 - label_33 label_34 - label_31 label_32 label_33 - label_32 label_33 - label_30 label_31 label_32 | 38 | 78_label_32 label_33 label_34_label_33 label_34_label_31 label_32 label_33_label_32 label_33 |
| 79 | true - - - - | 38 | 79_true___ |
| 80 | family - sports - music - related - health | 38 | 80_family_sports_music_related |
| 81 | star - positive - negative - amazon - negative positive | 37 | 81_star_positive_negative_amazon |
| 82 | hospital - unknown - description - material - pad | 37 | 82_hospital_unknown_description_material |
| 83 | threat - hate - reward - quality - content | 36 | 83_threat_hate_reward_quality |
| 84 | music - speech - instrument - engine - wind | 35 | 84_music_speech_instrument_engine |
| 85 | closure - annual - statement - issues - reward | 35 | 85_closure_annual_statement_issues |
| 86 | adp - aux - sconj - pron - noun | 35 | 86_adp_aux_sconj_pron |
| 87 | experience - location - skill - address - result | 35 | 87_experience_location_skill_address |
| 88 | - - - - | 34 | 88____ |
| 89 | test - train - risk - non - high | 34 | 89_test_train_risk_non |
| 90 | samoyed - corgi - husky - golden retriever - golden | 34 | 90_samoyed_corgi_husky_golden retriever |
| 91 | unk - zero - 10 - 12 13 14 - 13 14 15 | 33 | 91_unk_zero_10_12 13 14 |
| 92 | non - neutral - ok - lead - | 33 | 92_non_neutral_ok_lead |
| 93 | normal - covid - virus - regular - disorder | 33 | 93_normal_covid_virus_regular |
| 94 | test - help - app - risk - joke | 32 | 94_test_help_app_risk |
| 95 | replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ | 32 | 95_replace__append__replace_ replace__append_ append_ |
| 96 | disease - issues - pressure - drug - blood | 31 | 96_disease_issues_pressure_drug |
| 97 | women - casual - sexual - individual - use | 31 | 97_women_casual_sexual_individual |
| 98 | address - balance - code - second - currency | 30 | 98_address_balance_code_second |
| 99 | hate - non - neutral - - | 30 | 99_hate_non_neutral_ |
| 100 | normal - cell - large - clean - healthy | 29 | 100_normal_cell_large_clean |
| 101 | neutral - se - - - | 29 | 101_neutral_se__ |
| 102 | male - female - hair - skin - men | 29 | 102_male_female_hair_skin |
| 103 | title - page - section - abstract - table | 28 | 103_title_page_section_abstract |
| 104 | number - gender - case - person - fin | 28 | 104_number_gender_case_person |
| 105 | man - bird - flower - long - double | 28 | 105_man_bird_flower_long |
| 106 | contradiction - entailment - neutral - - | 28 | 106_contradiction_entailment_neutral_ |
| 107 | non - - - - | 27 | 107_non___ |
| 108 | tim - fac - org - pro - loc | 27 | 108_tim_fac_org_pro |
| 109 | lincoln - jaguar - visual - audio - sony | 27 | 109_lincoln_jaguar_visual_audio |
| 110 | statement - info - check - ad - news | 27 | 110_statement_info_check_ad |
| 111 | ben - ext - root - exp - loc | 26 | 111_ben_ext_root_exp |
| 112 | yes - - - - | 26 | 112_yes___ |
| 113 | queen - jack - king - south - war | 26 | 113_queen_jack_king_south |
| 114 | - - - - | 26 | 114____ |
| 115 | ft - cardinal - act - loc - loc loc | 25 | 115_ft_cardinal_act_loc |
| 116 | bio - chemical - food - - | 25 | 116_bio_chemical_food_ |
| 117 | ft - cardinal - act - loc - loc misc org | 25 | 117_ft_cardinal_act_loc |
| 118 | metric - task - - - | 25 | 118_metric_task__ |
| 119 | email - age - patient - zip - organization | 25 | 119_email_age_patient_zip |
| 120 | ent - im - ru - mat - art | 25 | 120_ent_im_ru_mat |
| 121 | ex - pt - galaxy - moon - 8888 | 24 | 121_ex_pt_galaxy_moon |
| 122 | - - - - | 24 | 122____ |
| 123 | neu - sad - dis - joy - | 24 | 123_neu_sad_dis_joy |
| 124 | label_122 - label_121 - label_120 - label_123 - label_119 | 24 | 124_label_122_label_121_label_120_label_123 |
| 125 | mixed - positive - negative - neutral positive - neutral | 24 | 125_mixed_positive_negative_neutral positive |
| 126 | date event - percent person - quantity - money - percent | 24 | 126_date event_percent person_quantity_money |
| 127 | fear joy - sadness surprise - surprise - joy - sadness | 24 | 127_fear joy_sadness surprise_surprise_joy |
| 128 | disgust - sadness surprise - joy love - surprise - joy | 24 | 128_disgust_sadness surprise_joy love_surprise |
| 129 | magnet - motor - hello - undefined - start | 24 | 129_magnet_motor_hello_undefined |
| 130 | loc loc - loc - pers - hi - en | 24 | 130_loc loc_loc_pers_hi |
| 131 | event - pers - fac - pro - loc org | 24 | 131_event_pers_fac_pro |
| 132 | disorder - body - patient - age - disease | 23 | 132_disorder_body_patient_age |
| 133 | happiness - fear - anger disgust - disgust - sadness | 23 | 133_happiness_fear_anger disgust_disgust |
| 134 | control - la - social - sin - civil | 23 | 134_control_la_social_sin |
| 135 | label_98 label_99 - label_97 label_98 label_99 - label_97 label_98 - label_95 label_96 - label_96 label_97 label_98 | 23 | 135_label_98 label_99_label_97 label_98 label_99_label_97 label_98_label_95 label_96 |
| 136 | greek - chinese - italian - japanese - dutch | 23 | 136_greek_chinese_italian_japanese |
| 137 | clean - - - - | 23 | 137_clean___ |
| 138 | protein - chemical - cell - - | 22 | 138_protein_chemical_cell_ |
| 139 | treatment - disease - location organization - organization person - organization | 22 | 139_treatment_disease_location organization_organization person |
| 140 | institution - tools - org - loc - organization | 22 | 140_institution_tools_org_loc |
| 141 | statement - question - - - | 22 | 141_statement_question__ |
| 142 | period - question - noun - number - | 21 | 142_period_question_noun_number |
| 143 | regular - - - - | 21 | 143_regular___ |
| 144 | rna - - - - | 21 | 144_rna___ |
| 145 | rs - - - - | 21 | 145_rs___ |
| 146 | address - id - job - email - country | 21 | 146_address_id_job_email |
| 147 | neg - neu - good - - | 21 | 147_neg_neu_good_ |
| 148 | label_122 label_123 - label_123 - label_122 - label_121 - label_120 | 20 | 148_label_122 label_123_label_123_label_122_label_121 |
| 149 | drink - tea - wine - coffee - soft | 20 | 149_drink_tea_wine_coffee |
| 150 | miscellaneous - organization - percent - money - percent person | 20 | 150_miscellaneous_organization_percent_money |
| 151 | description - invoice - zip - state - city | 20 | 151_description_invoice_zip_state |
| 152 | sports - tech - business - sport - | 20 | 152_sports_tech_business_sport |
| 153 | ok - vin - rl - ft - year | 20 | 153_ok_vin_rl_ft |
| 154 | healthy - - - - | 20 | 154_healthy___ |
| 155 | association - event - ticket - disaster - map | 20 | 155_association_event_ticket_disaster |
| 156 | 10 11 - 10 11 12 - 11 12 - 11 - 11 12 13 | 19 | 156_10 11_10 11 12_11 12_11 |
| 157 | noun num pron - num pron propn - num pron - pron propn punct - pron propn | 19 | 157_noun num pron_num pron propn_num pron_pron propn punct |
| 158 | cell - organ - organism - multi - tissue | 18 | 158_cell_organ_organism_multi |
| 159 | 02 - ent - express - act - delete | 18 | 159_02_ent_express_act |
| 160 | sym verb adj - verb adj adp - intj noun num - det intj noun - adj adp adv | 18 | 160_sym verb adj_verb adj adp_intj noun num_det intj noun |
| 161 | 12 - - - - | 18 | 161_12___ |
| 162 | org org - org - drug - - | 18 | 162_org org_org_drug_ |
| 163 | short - long - sl - ac - pad | 18 | 163_short_long_sl_ac |
| 164 | plastic - paper - glass - metal - sheet | 18 | 164_plastic_paper_glass_metal |
| 165 | ii - blank - iii - vi - et | 17 | 165_ii_blank_iii_vi |
| 166 | normal - virus - desert - smoke - pressure | 17 | 166_normal_virus_desert_smoke |
| 167 | skill - skills - - - | 17 | 167_skill_skills__ |
| 168 | protein - rna - cell - line - type | 17 | 168_protein_rna_cell_line |
| 169 | korean - russian - dutch - french - thai | 17 | 169_korean_russian_dutch_french |
| 170 | rainbow - rain - snow - color - green | 17 | 170_rainbow_rain_snow_color |
| 171 | company - role - institution - skill - loc org | 16 | 171_company_role_institution_skill |
| 172 | exp - pp - intj - punc - prep | 16 | 172_exp_pp_intj_punc |
| 173 | key - menu - - - | 16 | 173_key_menu__ |
| 174 | adult - young - child - male - female | 16 | 174_adult_young_child_male |
| 175 | normal - - - - | 16 | 175_normal___ |
| 176 | mask - bright - sharp - head - normal | 16 | 176_mask_bright_sharp_head |
| 177 | anger disgust fear - anger disgust - disgust fear - disgust - surprise anger | 16 | 177_anger disgust fear_anger disgust_disgust fear_disgust |
| 178 | - - - - | 16 | 178____ |
| 179 | objective - non - neutral - - | 16 | 179_objective_non_neutral_ |
| 180 | cr - sd - db - - | 16 | 180_cr_sd_db_ |
| 181 | - - - - | 16 | 181____ |
| 182 | label_29 label_3 label_30 - label_27 label_28 label_29 - label_26 label_27 label_28 - label_28 label_29 label_3 - label_29 label_3 | 16 | 182_label_29 label_3 label_30_label_27 label_28 label_29_label_26 label_27 label_28_label_28 label_29 label_3 |
| 183 | test - - - - | 15 | 183_test___ |
| 184 | good - bad - non - - | 15 | 184_good_bad_non_ |
| 185 | local - por - art - da - em | 15 | 185_local_por_art_da |
| 186 | label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 | 15 | 186_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98 |
| 187 | prod - loc - evt - misc - org org | 15 | 187_prod_loc_evt_misc |
| 188 | invoice - email - form - letter - report | 15 | 188_invoice_email_form_letter |
| 189 | end - head - cross - - | 15 | 189_end_head_cross_ |
| 190 | target - instrument - opinion - question - price | 15 | 190_target_instrument_opinion_question |
| 191 | unrelated - support - - - | 15 | 191_unrelated_support__ |
| 192 | ru - pl - bg - en - es | 14 | 192_ru_pl_bg_en |
| 193 | road - good - bike - - | 14 | 193_road_good_bike_ |
| 194 | human - organism - plants - - | 14 | 194_human_organism_plants_ |
| 195 | label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 | 14 | 195_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10 |
| 196 | replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ | 13 | 196_replace__append__replace_ replace__append_ append_ |
| 197 | brand - company - tm - color - item | 13 | 197_brand_company_tm_color |
| 198 | pro - neutral - russian - support - attack | 13 | 198_pro_neutral_russian_support |
| 199 | 18 19 20 - 19 20 - 23 - 17 18 19 - 21 | 13 | 199_18 19 20_19 20_23_17 18 19 |
| 200 | crime - pers - time - book - day | 13 | 200_crime_pers_time_book |
| 201 | neutral - positive - negative - positive negative - neutral positive | 13 | 201_neutral_positive_negative_positive negative |
| 202 | - - - - | 13 | 202____ |
| 203 | chemical - disease - bio - - | 13 | 203_chemical_disease_bio_ |
| 204 | angry - happy - sad - neutral - 60 | 12 | 204_angry_happy_sad_neutral |
| 205 | organisation - task - country - location - product | 12 | 205_organisation_task_country_location |
| 206 | iv - iii - vi - ii - unknown | 12 | 206_iv_iii_vi_ii |
| 207 | neutral - risk - - - | 12 | 207_neutral_risk__ |
| 208 | container - id - type - person - number | 12 | 208_container_id_type_person |
| 209 | target - - - - | 12 | 209_target___ |
| 210 | pop - metal - country - song - rock | 12 | 210_pop_metal_country_song |
| 211 | email - os - language - method - function | 12 | 211_email_os_language_method |
| 212 | contradiction - non - entailment - - | 12 | 212_contradiction_non_entailment_ |
| 213 | background - objective - method - result - | 12 | 213_background_objective_method_result |
| 214 | convertible - cab - type - series - martin | 12 | 214_convertible_cab_type_series |
| 215 | public - smoking - drinking - ambiguous - non | 12 | 215_public_smoking_drinking_ambiguous |
| 216 | rust - - - - | 12 | 216_rust___ |
| 217 | persian - mr - man - flying - ghost | 12 | 217_persian_mr_man_flying |
| 218 | quote - yes - middle - request - | 12 | 218_quote_yes_middle_request |
| 219 | text - mixed - - - | 12 | 219_text_mixed__ |
| 220 | punc - prep - digit - latin - conj | 12 | 220_punc_prep_digit_latin |
| 221 | panda - air - mr - ticket - little | 12 | 221_panda_air_mr_ticket |
| 222 | - - - - | 12 | 222____ |
| 223 | sym verb adj - intj noun num - verb adj adp - cconj det intj - aux cconj det | 12 | 223_sym verb adj_intj noun num_verb adj adp_cconj det intj |
| 224 | healthy - tomato - plant - pepper - spot | 11 | 224_healthy_tomato_plant_pepper |
| 225 | sony - lg - tv - galaxy - monitor | 11 | 225_sony_lg_tv_galaxy |
| 226 | new - city - mid - location - south | 11 | 226_new_city_mid_location |
| 227 | space - - - - | 11 | 227_space___ |
| 228 | cloud - racing - motorcycle - boy - bus | 11 | 228_cloud_racing_motorcycle_boy |
| 229 | punc - zero - pers - neg - reflex | 11 | 229_punc_zero_pers_neg |
| 230 | energy - arts - high - systems - computer | 11 | 230_energy_arts_high_systems |
| 231 | dis - ad - media - site - plant | 11 | 231_dis_ad_media_site |
| 232 | world - tech - business - sports - female | 11 | 232_world_tech_business_sports |
| 233 | sadness - anger - anger fear - joy - fear | 10 | 233_sadness_anger_anger fear_joy |
| 234 | neg - adj - sym - propn - num | 10 | 234_neg_adj_sym_propn |
| 235 | bulldog - cat - husky - pug - corgi | 9 | 235_bulldog_cat_husky_pug |
| 236 | - - - - | 8 | 236____ |
| 237 | origin - quote - actor - opinion - language | 7 | 237_origin_quote_actor_opinion |
| 238 | na - nb - nc - neu - ng | 7 | 238_na_nb_nc_neu |
| 239 | - - - - | 7 | 239____ |
| 240 | ci - aa - joy - im - ip | 7 | 240_ci_aa_joy_im |
| 241 | - - - - | 6 | 241____ |
| 242 | skill - email - address - grade - language | 6 | 242_skill_email_address_grade |
| 243 | sexual - threat - christian - hate - male | 6 | 243_sexual_threat_christian_hate |
| 244 | transmission - wind - tower - pole - | 6 | 244_transmission_wind_tower_pole |
| 245 | label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 6 | 245_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14 |

</details>
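
The Label column appears to be generated mechanically rather than written by hand: each label is the topic ID followed by the first four keywords, all joined with underscores. As far as we can tell, this is BERTopic's default naming scheme, but that is an inference from the table. The helper below is a hypothetical sketch (not a BERTopic function) that reproduces the pattern and explains odd-looking labels such as `33____`, a topic whose keywords are all empty strings. The keyword lists are transcribed from the table, with blank cells treated as empty strings.

```python
def default_topic_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Rebuild a label of the form '<id>_<kw1>_<kw2>_<kw3>_<kw4>'.

    Hypothetical helper that mirrors the naming pattern visible in the table;
    it is not part of BERTopic's API.
    """
    # Keep only the first n_words keywords. Empty strings are kept on purpose,
    # which is why topics with blank keywords end in runs of underscores.
    return f"{topic_id}_" + "_".join(keywords[:n_words])


# Keyword lists transcribed from the table above (blank cells -> "").
examples = {
    -1: ["pre", "roll", "heavy", "farm", "health"],
    6: ["contradiction", "entailment", "neutral", "", ""],
    11: ["terrier", "snake", "dog", "bear", "wolf"],
    33: ["", "", "", "", ""],
}

for topic_id, words in examples.items():
    print(default_topic_label(topic_id, words))
```

With a loaded model, `topic_model.get_topic(11)` should return (word, score) pairs, so `[w for w, _ in topic_model.get_topic(11)]` could be passed in. Treat that wiring as an assumption and compare the result against `get_topic_info()` before relying on it.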

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.29.2
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.11

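Because the card records the exact library versions used for training, comparing them against your own environment can be a useful first check when results differ. The following is a minimal sketch that uses only the standard library. The mapping from the card's names to PyPI distribution names (for example UMAP as `umap-learn`) is our assumption, and `compare_versions` is a hypothetical helper.

```python
import importlib.metadata
import platform

# Versions listed under "Framework versions", keyed by PyPI distribution name
# (assumed mapping, e.g. UMAP is published on PyPI as "umap-learn").
RECORDED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(recorded, lookup=importlib.metadata.version):
    """Return {package: (recorded, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in recorded.items():
        try:
            installed = lookup(package)
        except importlib.metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("Python:", platform.python_version(), "(card lists 3.10.11)")
    for package, (expected, installed) in compare_versions(RECORDED).items():
        print(f"{package}: card lists {expected}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load. It only flags where your setup differs from the one recorded here.
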
config.json
ADDED
@@ -0,0 +1,14 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}

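config.json stores the same hyperparameters that the README lists under "Training hyperparameters", with one representational difference. JSON has no tuple type, so `n_gram_range` is stored as the list `[1, 1]` rather than `(1, 1)`. Below is a minimal sketch for reading the file back into Python keyword arguments. The helper name `config_to_kwargs` is hypothetical.

```python
import json

# Contents of config.json from this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
"""


def config_to_kwargs(text):
    """Parse config.json into keyword arguments.

    JSON has no tuples, so n_gram_range is converted back from a list to the
    (min_n, max_n) tuple form shown in the README.
    """
    kwargs = json.loads(text)
    if kwargs.get("n_gram_range") is not None:
        kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"], kwargs["min_topic_size"], kwargs["language"])
```

The keys appear to match parameter names of the `BERTopic` constructor, so `BERTopic(**kwargs)` may recreate an unfitted model with the same settings. That is an assumption. It also would not restore the fitted topics, which appear to be stored separately in topics.json and topic_embeddings.safetensors.
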
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dfc68ba36ab2801761db7804d39995eb77b59dda172ae2398fe7b07821a22bf
size 758872

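topic_embeddings.safetensors is committed as a Git LFS pointer rather than as the binary itself. The pointer consists of three `key value` lines: the spec version, the object ID (a SHA-256 digest), and the size in bytes (758,872) of the real file. Below is a minimal sketch for parsing a pointer like this one and checking a downloaded copy against it. It assumes the standard pointer layout shown above. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers and are not part of Git LFS or the Hugging Face tooling.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6dfc68ba36ab2801761db7804d39995eb77b59dda172ae2398fe7b07821a22bf
size 758872
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into {"version", "algo", "oid", "size"}.

    Each non-empty line is "<key> <value>". The oid value has the form
    "<hash-algo>:<hex digest>", and size is the byte length of the real file.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, info) -> bool:
    """Check downloaded bytes against the pointer's size and SHA-256 digest."""
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["oid"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

Checking the size first is cheap and catches truncated downloads before the digest is computed.
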
topics.json
ADDED
The diff for this file is too large to render.