Commit 3637239
1 Parent(s): 1a0f1cd
Add BERTopic model
- README.md +319 -0
- config.json +14 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,319 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# label_model

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/label_model")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 252
* Number of training documents: 14986

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | date - city - pre - heavy - fur | 5 | -1_date_city_pre_heavy |
| 0 | label_1 label_2 - label_0 label_1 label_2 - label_0 label_1 - label_1 - label_2 | 1333 | 0_label_1 label_2_label_0 label_1 label_2_label_0 label_1_label_1 |
| 1 | label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 | 1043 | 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4 |
| 2 | negative positive - positive negative - negative - positive - target | 803 | 2_negative positive_positive negative_negative_positive |
| 3 | loc misc org - loc misc - misc org - misc - org loc | 651 | 3_loc misc org_loc misc_misc org_misc |
| 4 | neutral positive - neutral - positive negative - negative - positive | 479 | 4_neutral positive_neutral_positive negative_negative |
| 5 | label_0 - - - - | 357 | 5_label_0___ |
| 6 | contradiction - entailment - neutral - ambiguous - | 348 | 6_contradiction_entailment_neutral_ambiguous |
| 7 | label_0 - - - - | 334 | 7_label_0___ |
| 8 | 99 - - - - | 326 | 8_99___ |
| 9 | label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 | 300 | 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3 |
| 10 | entailment - true - child - related - non | 257 | 10_entailment_true_child_related |
| 11 | snake - dog - bear - wolf - sea | 245 | 11_snake_dog_bear_wolf |
| 12 | label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_6 label_7 label_8 | 241 | 12_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6 |
| 13 | loc misc org - loc misc - misc org - misc - org loc | 229 | 13_loc misc org_loc misc_misc org_misc |
| 14 | weather - transfer - alarm - text - time | 228 | 14_weather_transfer_alarm_text |
| 15 | label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 | 222 | 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2 |
| 16 | delete - different - bad - related - rel | 207 | 16_delete_different_bad_related |
| 17 | label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 | 172 | 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13 |
| 18 | - - - - | 166 | 18____ |
| 19 | loc org loc - loc org - org loc - org - loc | 142 | 19_loc org loc_loc org_org loc_org |
| 20 | label_6 label_60 label_61 - label_60 label_61 - label_62 label_63 - label_61 label_62 label_63 - label_61 label_62 | 126 | 20_label_6 label_60 label_61_label_60 label_61_label_62 label_63_label_61 label_62 label_63 |
| 21 | label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 | 117 | 21_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3 |
| 22 | test - second - - - | 106 | 22_test_second__ |
| 23 | forest - industrial - transport - low - bamboo | 104 | 23_forest_industrial_transport_low |
| 24 | answer - header - question - quantity - | 104 | 24_answer_header_question_quantity |
| 25 | healthy - leaf - rust - plant - spot | 103 | 25_healthy_leaf_rust_plant |
| 26 | left - right - stop - yes - unknown | 100 | 26_left_right_stop_yes |
| 27 | en - na - alpha - fan - lifestyle | 93 | 27_en_na_alpha_fan |
| 28 | label_13 label_14 label_15 - label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 | 92 | 28_label_13 label_14 label_15_label_14 label_15_label_15_label_12 label_13 label_14 |
| 29 | disease - bio - disorder - healthy - | 86 | 29_disease_bio_disorder_healthy |
| 30 | work - group - person product - product - location | 86 | 30_work_group_person product_product |
| 31 | fear joy - sadness surprise - anger fear - joy love - surprise | 82 | 31_fear joy_sadness surprise_anger fear_joy love |
| 32 | common - non - different - - | 78 | 32_common_non_different_ |
| 33 | dis - - - - | 76 | 33_dis___ |
| 34 | - - - - | 73 | 34____ |
| 35 | restaurant - pizza - place - salad - food | 69 | 35_restaurant_pizza_place_salad |
| 36 | cconj det intj - adj adp adv - det intj noun - det intj - noun num pron | 66 | 36_cconj det intj_adj adp adv_det intj noun_det intj |
| 37 | label_17 label_18 label_19 - label_18 label_19 label_2 - label_18 label_19 - label_19 label_2 - label_16 label_17 label_18 | 66 | 37_label_17 label_18 label_19_label_18 label_19 label_2_label_18 label_19_label_19 label_2 |
| 38 | ll - year - related - cause - delete | 65 | 38_ll_year_related_cause |
| 39 | anger fear - joy love - surprise - joy - love | 64 | 39_anger fear_joy love_surprise_joy |
| 40 | true - news - partial - - | 64 | 40_true_news_partial_ |
| 41 | - - - - | 63 | 41____ |
| 42 | label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 62 | 42_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_7 label_8 label_9 |
| 43 | pos - neg - - - | 62 | 43_pos_neg__ |
| 44 | loc org - org - loc - date - sex | 61 | 44_loc org_org_loc_date |
| 45 | label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 | 60 | 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2 |
| 46 | event - group - person product - product - location | 57 | 46_event_group_person product_product |
| 47 | bio - chemical - disease - effect - food | 57 | 47_bio_chemical_disease_effect |
| 48 | 234 - 19 20 21 - 20 21 22 - 22 23 24 - 23 24 | 57 | 48_234_19 20 21_20 21 22_22 23 24 |
| 49 | fear happy neutral - happy neutral - fear happy - sad - happy | 53 | 49_fear happy neutral_happy neutral_fear happy_sad |
| 50 | battery - volume - juice - chinese - korean | 53 | 50_battery_volume_juice_chinese |
| 51 | menu - price - num - - | 52 | 51_menu_price_num_ |
| 52 | poor - ok - good - bad - great | 52 | 52_poor_ok_good_bad |
| 53 | ll - cause - delete - unknown - | 51 | 53_ll_cause_delete_unknown |
| 54 | hospital - unknown - en - material - digital | 48 | 54_hospital_unknown_en_material |
| 55 | ll - cause - delete - unknown - | 48 | 55_ll_cause_delete_unknown |
| 56 | self - question - neutral - yes - statement | 48 | 56_self_question_neutral_yes |
| 57 | fat - loose - small - sugar - common | 47 | 57_fat_loose_small_sugar |
| 58 | true - - - - | 47 | 58_true___ |
| 59 | cream - drinks - seafood - fruit - ice cream | 46 | 59_cream_drinks_seafood_fruit |
| 60 | tr - ru - pers - pt - prod | 46 | 60_tr_ru_pers_pt |
| 61 | - - - - | 45 | 61____ |
| 62 | clothing - care - kitchen - personal - health | 44 | 62_clothing_care_kitchen_personal |
| 63 | business - news - tech - entertainment - sport | 43 | 63_business_news_tech_entertainment |
| 64 | non - partial - neutral - yes - ok | 43 | 64_non_partial_neutral_yes |
| 65 | organization person - location organization - organization - location - person | 43 | 65_organization person_location organization_organization_location |
| 66 | daisy - tulip - rose - - | 43 | 66_daisy_tulip_rose_ |
| 67 | joy - sadness - anger - angry - happy | 42 | 67_joy_sadness_anger_angry |
| 68 | samoyed - corgi - husky - pomeranian - golden | 41 | 68_samoyed_corgi_husky_pomeranian |
| 69 | music - instrument - engine - wind - animals | 41 | 69_music_instrument_engine_wind |
| 70 | hate - language - reporting - non - normal | 41 | 70_hate_language_reporting_non |
| 71 | label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 | 41 | 71_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24 |
| 72 | id - - - - | 40 | 72_id___ |
| 73 | animals - tech - dance - tiger - sport | 40 | 73_animals_tech_dance_tiger |
| 74 | org org - loc loc - org - misc - loc | 40 | 74_org org_loc loc_org_misc |
| 75 | star - positive - negative - negative positive - | 38 | 75_star_positive_negative_negative positive |
| 76 | bird - ship - frog - horse - truck | 37 | 76_bird_ship_frog_horse |
| 77 | cat - cats - dog - dogs - sleeping | 37 | 77_cat_cats_dog_dogs |
| 78 | family - sports - music - related - health | 37 | 78_family_sports_music_related |
| 79 | label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 | 37 | 79_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9 |
| 80 | room - service - transport - care - kitchen | 37 | 80_room_service_transport_care |
| 81 | positive - negative - neutral positive - neutral - positive negative | 37 | 81_positive_negative_neutral positive_neutral |
| 82 | test - play - train - non - live | 36 | 82_test_play_train_non |
| 83 | tim - evt - pro - gpe - org | 36 | 83_tim_evt_pro_gpe |
| 84 | cold - disease - pressure - drug - blood | 36 | 84_cold_disease_pressure_drug |
| 85 | non - early - late - - | 35 | 85_non_early_late_ |
| 86 | 21 - office - 20 - 17 - 16 | 34 | 86_21_office_20_17 |
| 87 | prep - nn - cc - pro - ex | 34 | 87_prep_nn_cc_pro |
| 88 | evidence - position - statement - lead - request | 33 | 88_evidence_position_statement_lead |
| 89 | adp - aux - sconj - cconj - det noun | 33 | 89_adp_aux_sconj_cconj |
| 90 | job - start - help - address - quantity | 33 | 90_job_start_help_address |
| 91 | gender - number - case - ind - person | 33 | 91_gender_number_case_ind |
| 92 | threat - hate - adult - target - male | 33 | 92_threat_hate_adult_target |
| 93 | institution - tools - organization - org - agent | 32 | 93_institution_tools_organization_org |
| 94 | - - - - | 32 | 94____ |
| 95 | email - age - patient - state - zip | 32 | 95_email_age_patient_state |
| 96 | mixed - positive - negative - neutral - neutral positive | 32 | 96_mixed_positive_negative_neutral |
| 97 | test - help - joke - contact - report | 32 | 97_test_help_joke_contact |
| 98 | address - balance - statement - request - second | 31 | 98_address_balance_statement_request |
| 99 | - - - - | 31 | 99____ |
| 100 | hate - non - neutral - - | 30 | 100_hate_non_neutral_ |
| 101 | - - - - | 30 | 101____ |
| 102 | unk - zero - seven - 10 - blank | 30 | 102_unk_zero_seven_10 |
| 103 | male - female - young - adult - skin | 30 | 103_male_female_young_adult |
| 104 | 94 - 59 60 - 49 50 - 81 - 97 | 29 | 104_94_59 60_49 50_81 |
| 105 | normal - cell - large - clean - lower | 29 | 105_normal_cell_large_clean |
| 106 | lincoln - jaguar - audio - source - general | 28 | 106_lincoln_jaguar_audio_source |
| 107 | title - section - header - list - item | 28 | 107_title_section_header_list |
| 108 | - - - - | 28 | 108____ |
| 109 | yes - - - - | 27 | 109_yes___ |
| 110 | - - - - | 26 | 110____ |
| 111 | contradiction - entailment - neutral - non - | 26 | 111_contradiction_entailment_neutral_non |
| 112 | instrument - org org - org org org - term - org | 26 | 112_instrument_org org_org org org_term |
| 113 | ft - cardinal - act - loc - loc loc | 25 | 113_ft_cardinal_act_loc |
| 114 | event - pro - pers - loc org - prod | 25 | 114_event_pro_pers_loc org |
| 115 | ben - ext - exp - root - loc | 25 | 115_ben_ext_exp_root |
| 116 | - - - - | 25 | 116____ |
| 117 | low - - - - | 25 | 117_low___ |
| 118 | ft - cardinal - act - loc - loc misc org | 25 | 118_ft_cardinal_act_loc |
| 119 | statement - question - evidence - experience - answer | 25 | 119_statement_question_evidence_experience |
| 120 | label_122 - label_121 - label_120 - label_123 - label_119 | 24 | 120_label_122_label_121_label_120_label_123 |
| 121 | clean - - - - | 24 | 121_clean___ |
| 122 | ru - tr - el - en - hi | 24 | 122_ru_tr_el_en |
| 123 | disgust - sadness surprise - joy love - surprise - joy | 24 | 123_disgust_sadness surprise_joy love_surprise |
| 124 | statement - info - check - news - non | 24 | 124_statement_info_check_news |
| 125 | motor - start - help - housing - yes | 24 | 125_motor_start_help_housing |
| 126 | greek - chinese - italian - japanese - dutch | 24 | 126_greek_chinese_italian_japanese |
| 127 | anger disgust - fear - disgust - sadness - anger | 23 | 127_anger disgust_fear_disgust_sadness |
| 128 | date event - percent person - quantity - money - percent | 23 | 128_date event_percent person_quantity_money |
| 129 | label_95 label_96 label_97 - label_97 label_98 label_99 - label_97 label_98 - label_94 label_95 label_96 - label_94 label_95 | 23 | 129_label_95 label_96 label_97_label_97 label_98 label_99_label_97 label_98_label_94 label_95 label_96 |
| 130 | period - question - noun - number - | 23 | 130_period_question_noun_number |
| 131 | neutral - - - - | 22 | 131_neutral___ |
| 132 | local - la - pad - data - personal | 22 | 132_local_la_pad_data |
| 133 | partial - - - - | 22 | 133_partial___ |
| 134 | human - art - machine - - | 22 | 134_human_art_machine_ |
| 135 | fear joy - sadness surprise - surprise - disgust fear - joy | 21 | 135_fear joy_sadness surprise_surprise_disgust fear |
| 136 | location organization - organization person - organization - price - disease | 21 | 136_location organization_organization person_organization_price |
| 137 | 14 15 16 - 12 13 14 - 13 14 15 - 11 12 13 - 10 11 12 | 21 | 137_14 15 16_12 13 14_13 14 15_11 12 13 |
| 138 | sports - tech - business - sport - | 21 | 138_sports_tech_business_sport |
| 139 | disorder - body - patient - age - disease | 20 | 139_disorder_body_patient_age |
| 140 | sad - dis - sur - joy - | 20 | 140_sad_dis_sur_joy |
| 141 | healthy - - - - | 20 | 141_healthy___ |
| 142 | drink - tea - wine - coffee - soft | 20 | 142_drink_tea_wine_coffee |
| 143 | protein - chemical - cell - - | 20 | 143_protein_chemical_cell_ |
| 144 | rna - - - - | 20 | 144_rna___ |
| 145 | normal - covid - - - | 20 | 145_normal_covid__ |
| 146 | ex - pt - - - | 20 | 146_ex_pt__ |
| 147 | ok - ft - year - int - rel | 20 | 147_ok_ft_year_int |
| 148 | header - currency - item - zip - state | 20 | 148_header_currency_item_zip |
| 149 | label_122 label_123 - label_123 - label_122 - label_121 - label_120 | 19 | 149_label_122 label_123_label_123_label_122_label_121 |
| 150 | anger disgust - anger disgust fear - disgust fear - disgust - sadness surprise | 19 | 150_anger disgust_anger disgust fear_disgust fear_disgust |
| 151 | na - nn - ft - dis - bio | 19 | 151_na_nn_ft_dis |
| 152 | angry - happy - sad - happy neutral - neutral | 19 | 152_angry_happy_sad_happy neutral |
| 153 | organization percent person - organization percent - miscellaneous - percent person - percent | 19 | 153_organization percent person_organization percent_miscellaneous_percent person |
| 154 | paper - metal - glass - tray - ticket | 19 | 154_paper_metal_glass_tray |
| 155 | mask - normal - sharp - head - green | 19 | 155_mask_normal_sharp_head |
| 156 | noun num pron - num pron propn - pron propn punct - num pron - adj adp adv | 18 | 156_noun num pron_num pron propn_pron propn punct_num pron |
| 157 | answer - - - - | 18 | 157_answer___ |
| 158 | review - id - job - email - state | 18 | 158_review_id_job_email |
| 159 | seven - queen - jack - king - war | 18 | 159_seven_queen_jack_king |
| 160 | neg - nan - good - - | 18 | 160_neg_nan_good_ |
| 161 | ii - blank - vi - et - lower | 18 | 161_ii_blank_vi_et |
| 162 | golden - husky - samoyed - pug - german | 17 | 162_golden_husky_samoyed_pug |
| 163 | arg - delete - act - neg - lead | 17 | 163_arg_delete_act_neg |
| 164 | exp - pp - intj - punc - prep | 17 | 164_exp_pp_intj_punc |
| 165 | email - form - letter - report - news | 17 | 165_email_form_letter_report |
| 166 | protein - rna - cell - line - type | 17 | 166_protein_rna_cell_line |
| 167 | en - hi - fur - - | 17 | 167_en_hi_fur_ |
| 168 | - - - - | 17 | 168____ |
| 169 | - - - - | 17 | 169____ |
| 170 | loc loc - loc - pers - evt - | 16 | 170_loc loc_loc_pers_evt |
| 171 | menu - - - - | 16 | 171_menu___ |
| 172 | normal - - - - | 16 | 172_normal___ |
| 173 | label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 | 16 | 173_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98 |
| 174 | cell - organ - organism - tissue - disease | 16 | 174_cell_organ_organism_tissue |
| 175 | target - instrument - opinion - price - product | 16 | 175_target_instrument_opinion_price |
| 176 | org org - org org org - loc loc - org - prs | 16 | 176_org org_org org org_loc loc_org |
| 177 | 10 11 - 10 11 12 - 11 12 - 12 - 11 | 16 | 177_10 11_10 11 12_11 12_12 |
| 178 | korean - russian - dutch - persian - french | 16 | 178_korean_russian_dutch_persian |
| 179 | label_4 label_40 label_41 - label_39 label_4 label_40 - label_38 label_39 label_4 - label_37 label_38 label_39 - label_40 label_41 | 16 | 179_label_4 label_40 label_41_label_39 label_4 label_40_label_38 label_39 label_4_label_37 label_38 label_39 |
| 180 | experience - location - loc misc org - loc misc - misc org | 15 | 180_experience_location_loc misc org_loc misc |
| 181 | normal - pressure - high - water - | 15 | 181_normal_pressure_high_water |
| 182 | company - institution - loc org - degree - org | 15 | 182_company_institution_loc org_degree |
| 183 | short - sl - long - - | 15 | 183_short_sl_long_ |
| 184 | good - bad - non - - | 15 | 184_good_bad_non_ |
| 185 | 149 - 151 - 191 - 199 - 231 | 15 | 185_149_151_191_199 |
| 186 | unknown - vi - ii - - | 15 | 186_unknown_vi_ii_ |
| 187 | end - head - cross - - | 15 | 187_end_head_cross_ |
| 188 | forest - street - road - tree - mountain | 15 | 188_forest_street_road_tree |
| 189 | label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 | 14 | 189_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10 |
| 190 | prod - loc - evt - org org - loc loc | 14 | 190_prod_loc_evt_org org |
| 191 | tech - business - sports - science - female | 14 | 191_tech_business_sports_science |
| 192 | adult - child - young - - | 14 | 192_adult_child_young_ |
| 193 | human - organism - plants - - | 14 | 193_human_organism_plants_ |
| 194 | hot dog - chicken - hot - food - dog | 14 | 194_hot dog_chicken_hot_food |
| 195 | rain - snow - - - | 14 | 195_rain_snow__ |
| 196 | objective - neutral - - - | 14 | 196_objective_neutral__ |
| 197 | pro - neutral - russian - attack - | 14 | 197_pro_neutral_russian_attack |
| 198 | normal - disorder - good - - | 14 | 198_normal_disorder_good_ |
| 199 | road - good - bike - - | 14 | 199_road_good_bike_ |
| 200 | - - - - | 14 | 200____ |
| 201 | science - energy - arts - nuclear - systems | 13 | 201_science_energy_arts_nuclear |
| 202 | - - - - | 13 | 202____ |
| 203 | event - ticket - ok - loose - non | 13 | 203_event_ticket_ok_loose |
| 204 | neutral - left - right - unknown - | 13 | 204_neutral_left_right_unknown |
| 205 | - - - - | 13 | 205____ |
| 206 | crime - pers - time - book - org | 13 | 206_crime_pers_time_book |
| 207 | seven - start - record - zero - open | 13 | 207_seven_start_record_zero |
| 208 | label_5 label_50 label_51 - label_50 label_51 label_52 - label_51 label_52 label_53 - label_51 label_52 - label_50 label_51 | 13 | 208_label_5 label_50 label_51_label_50 label_51 label_52_label_51 label_52 label_53_label_51 label_52 |
| 209 | label_29 label_3 label_30 - label_26 label_27 label_28 - label_27 label_28 label_29 - label_27 label_28 - label_28 label_29 label_3 | 13 | 209_label_29 label_3 label_30_label_26 label_27 label_28_label_27 label_28 label_29_label_27 label_28 |
| 210 | human - machine - - - | 13 | 210_human_machine__ |
| 211 | control - la - sin - social - ambient | 13 | 211_control_la_sin_social |
| 212 | anger fear - sadness - anger - fear - fear joy | 13 | 212_anger fear_sadness_anger_fear |
| 213 | panda - ticket - air - bamboo - el | 13 | 213_panda_ticket_air_bamboo |
| 214 | target - - - - | 13 | 214_target___ |
| 215 | id - container - type - person - number | 12 | 215_id_container_type_person |
| 216 | neutral - positive - negative - neutral positive - positive negative | 12 | 216_neutral_positive_negative_neutral positive |
| 217 | change - bad - movement - work - science | 12 | 217_change_bad_movement_work |
| 218 | rust - - - - | 12 | 218_rust___ |
| 219 | quantity - container - package - id - weight | 12 | 219_quantity_container_package_id |
| 220 | text - - - - | 12 | 220_text___ |
| 221 | background - objective - - - | 12 | 221_background_objective__ |
| 222 | middle - subject - yes - request - answer | 12 | 222_middle_subject_yes_request |
| 223 | - - - - | 12 | 223____ |
| 224 | public - ambiguous - non - person - | 12 | 224_public_ambiguous_non_person |
| 225 | healthy - plant - pepper - spot - leaf | 12 | 225_healthy_plant_pepper_spot |
| 226 | punc - prep - digit - latin - conj | 12 | 226_punc_prep_digit_latin |
| 227 | location money - language - percent person - actor - money | 12 | 227_location money_language_percent person_actor |
| 228 | - - - - | 11 | 228____ |
| 229 | punc - zero - pers - neg - reflex | 11 | 229_punc_zero_pers_neg |
| 230 | album - major - copper - coon - common | 11 | 230_album_major_copper_coon |
| 231 | metal - pop - country - dance - hip | 11 | 231_metal_pop_country_dance |
| 232 | energy - common - grass - persian - removal | 11 | 232_energy_common_grass_persian |
| 233 | man - double - bird - long - single | 11 | 233_man_double_bird_long |
| 234 | 17 - 16 - 18 - 13 - 15 | 11 | 234_17_16_18_13 |
| 235 | email - actor - threat - tools - attack | 11 | 235_email_actor_threat_tools |
| 236 | space - - - - | 11 | 236_space___ |
| 237 | type - country - jeep - van - lincoln | 11 | 237_type_country_jeep_van |
| 238 | general - - - - | 10 | 238_general___ |
| 239 | ru - mat - - - | 10 | 239_ru_mat__ |
| 240 | contradiction - non - entailment - neutral - | 10 | 240_contradiction_non_entailment_neutral |
| 241 | city - new - country - location - label_1 | 10 | 241_city_new_country_location |
| 242 | non - legal - sub - - | 9 | 242_non_legal_sub_ |
| 243 | tulip - cattle - motorcycle - road - color | 8 | 243_tulip_cattle_motorcycle_road |
| 244 | item - color - cc - model - | 8 | 244_item_color_cc_model |
| 245 | delivery - product - service - different - environment | 7 | 245_delivery_product_service_different |
| 246 | degree - tim - neg - pos - propn | 6 | 246_degree_tim_neg_pos |
| 247 | threat - hate - non - unknown - neutral | 6 | 247_threat_hate_non_unknown |
| 248 | label_33 label_34 - label_32 label_33 label_34 - label_32 label_33 - label_31 label_32 label_33 - label_31 label_32 | 6 | 248_label_33 label_34_label_32 label_33 label_34_label_32 label_33_label_31 label_32 label_33 |
| 249 | experience - location - - - | 6 | 249_experience_location__ |
| 250 | nat - gpe - geo - pro - tim | 5 | 250_nat_gpe_geo_pro |

</details>

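As a reading aid, the frequencies in the table can be expressed as shares of the 14986 training documents. The sketch below is illustrative only: it hard-codes a few rows copied from the table, and the helper name `share_of_corpus` is made up for this example rather than part of BERTopic. Note that BERTopic conventionally uses topic -1 for outlier documents.

```python
# Illustrative only: a few (topic_id, frequency) pairs copied from the table.
TRAINING_DOCS = 14986  # "Number of training documents" from the overview

sample_rows = [(-1, 5), (0, 1333), (1, 1043), (2, 803)]


def share_of_corpus(frequency: int, total: int = TRAINING_DOCS) -> float:
    """Return the percentage of training documents assigned to a topic."""
    return 100.0 * frequency / total


for topic_id, frequency in sample_rows:
    print(f"topic {topic_id:>3}: {share_of_corpus(frequency):.2f}% of documents")
```

The same calculation can be applied to the `Count` column of the DataFrame returned by `get_topic_info()` once the model is loaded.
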
## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.29.2
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.11
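To check a local environment against the framework versions listed above, a small standard-library sketch along these lines may help. The mapping from the listed names to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption of this example, and the helper names `installed_version` and `compare` are hypothetical.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list.
# Keys are assumed PyPI distribution names, not taken from the README.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare(pinned, lookup=installed_version):
    """Map each package to a (listed, installed, matches) tuple."""
    report = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in compare(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: listed {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report only shows where the environment differs from the one recorded in the README.
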
config.json
ADDED
@@ -0,0 +1,14 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
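The keys in `config.json` mirror the hyperparameters listed in the README, with one representational difference: JSON has no tuple type, so `n_gram_range` is stored as a list. A minimal sketch of reading the file for inspection, with the JSON inlined so it runs on its own (the function name `config_to_kwargs` is hypothetical):

```python
import json

# The config.json contents from this commit, inlined for a self-contained example.
CONFIG_TEXT = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true
}
"""


def config_to_kwargs(text):
    """Parse the JSON and convert n_gram_range from a list back to a tuple."""
    kwargs = json.loads(text)
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_TEXT)
for key, value in kwargs.items():
    print(f"{key} = {value!r}")
```

This is only needed for inspecting the settings by hand; loading the model through `BERTopic.load` does not require parsing the file yourself.
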
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a37ff8eb1f305336409db7b4776d9b5417dc6e510f420f25f3fbde37f502dc9d
size 774232
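The three lines added for `topic_embeddings.safetensors` are a Git LFS pointer rather than the tensor data itself: the repository stores this small stub, while the 774232-byte file is kept in LFS storage and identified by its SHA-256 digest. As a rough sketch, the pointer can be parsed like this (the function name `parse_lfs_pointer` and the output field names are made up for this example):

```python
# Pointer text copied from this commit, inlined so the example is self-contained.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a37ff8eb1f305336409db7b4776d9b5417dc6e510f420f25f3fbde37f502dc9d
size 774232
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
```

After downloading the real file, its size and `hashlib.sha256` digest can be compared against `size_bytes` and `digest` to confirm the download is intact.
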
topics.json
ADDED
The diff for this file is too large to render.