metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

label_model_merged

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/label_model_merged")

topic_model.get_topic_info()
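`get_topic_info()` returns a pandas DataFrame with one row per topic, including the outlier topic -1. The following is a short sketch for listing the largest clustered topics. It assumes the DataFrame's `Topic`, `Count` and `Name` columns. `largest_topics` and `load_topic_rows` are hypothetical helper names, not part of BERTopic. The filtering works on plain dicts, so you can try it without downloading the model.

```python
def largest_topics(rows, n=5):
    """Return the n largest topics, skipping BERTopic's outlier topic (-1).

    `rows` is a list of dicts with at least "Topic" and "Count" keys, e.g. the
    output of `topic_model.get_topic_info().to_dict("records")`.
    """
    clustered = [row for row in rows if row["Topic"] != -1]
    return sorted(clustered, key=lambda row: row["Count"], reverse=True)[:n]


def load_topic_rows(repo_id="davanstrien/label_model_merged"):
    """Load the model from the Hugging Face Hub and return its topic table as dicts."""
    from bertopic import BERTopic  # imported lazily; requires `pip install -U bertopic`

    topic_model = BERTopic.load(repo_id)
    return topic_model.get_topic_info().to_dict("records")
```

With BERTopic installed, `largest_topics(load_topic_rows(), n=3)` returns the three most frequent non-outlier topics.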

Topic overview

  • Number of topics: 247
  • Number of training documents: 14986
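Overview of all topics. Each row lists the topic ID, the top keywords (separated by " - "), the topic frequency (the number of documents assigned to the topic) and the label. Topic -1 is BERTopic's outlier topic, which collects documents that were not assigned to any cluster.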
Topic ID Topic Keywords Topic Frequency Label
-1 pre - roll - heavy - farm - health 5 -1_pre_roll_heavy_farm
0 label_1 label_2 - label_0 label_1 label_2 - label_1 - label_0 label_1 - label_2 1386 0_label_1 label_2_label_0 label_1 label_2_label_1_label_0 label_1
1 label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 1042 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4
2 negative positive - positive negative - negative - positive - target 803 2_negative positive_positive negative_negative_positive
3 loc misc org - misc org - loc misc - misc - org loc 652 3_loc misc org_misc org_loc misc_misc
4 neutral positive - neutral - positive negative - negative - positive 509 4_neutral positive_neutral_positive negative_negative
5 label_0 - country - city - label_1 - label_0 label_1 357 5_label_0_country_city_label_1
6 contradiction - entailment - neutral - - 351 6_contradiction_entailment_neutral_
7 label_0 - positive - - - 335 7_label_0_positive__
8 99 - - - - 327 8_99___
9 label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 302 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3
10 entailment - true - child - related - non 257 10_entailment_true_child_related
11 terrier - snake - dog - bear - wolf 245 11_terrier_snake_dog_bear
12 loc misc org - loc misc - misc org - misc - org loc 240 12_loc misc org_loc misc_misc org_misc
13 label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_7 231 13_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6
14 calendar - greeting - weather - transfer - calculator 229 14_calendar_greeting_weather_transfer
15 label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 226 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2
16 delete - unrelated - bad - related - rel 207 16_delete_unrelated_bad_related
17 label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 172 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13
18 loc org - org loc - org - loc - loc loc 166 18_loc org_org loc_org_loc
19 left - right - stop - yes - zero 130 19_left_right_stop_yes
20 label_6 label_60 label_61 - label_60 label_61 - label_60 label_61 label_62 - label_62 label_63 - label_59 label_6 label_60 123 20_label_6 label_60 label_61_label_60 label_61_label_60 label_61 label_62_label_62 label_63
21 unrelated - - - - 117 21_unrelated___
22 forest - industrial - river - transport - disaster 110 22_forest_industrial_river_transport
23 label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 107 23_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3
24 question - quantity - - - 106 24_question_quantity__
25 healthy - leaf - rust - plant - mildew 103 25_healthy_leaf_rust_plant
26 disease - blood - bio - healthy - sexual 100 26_disease_blood_bio_healthy
27 work - group - corporation - person product - product 92 27_work_group_corporation_person product
28 surprise anger - sadness surprise - fear joy - anger fear - joy love 80 28_surprise anger_sadness surprise_fear joy_anger fear
29 duplicate - common - non - - 78 29_duplicate_common_non_
30 steak - hamburger - restaurant - pizza - joint 76 30_steak_hamburger_restaurant_pizza
31 room - service - transport - product - forest 74 31_room_service_transport_product
32 dis - - - - 74 32_dis___
33 - - - - 73 33____
34 loc org - org - date - loc - set 70 34_loc org_org_date_loc
35 label_17 label_18 label_19 - label_18 label_19 - label_18 label_19 label_2 - label_19 label_2 - label_16 label_17 label_18 70 35_label_17 label_18 label_19_label_18 label_19_label_18 label_19 label_2_label_19 label_2
36 03 - 02 - second - - 65 36_03_02_second_
37 anger fear - joy love - surprise - joy - love 65 37_anger fear_joy love_surprise_joy
38 real - true - image - news - 64 38_real_true_image_news
39 - - - - 63 39____
40 pos - neg - neu - - 62 40_pos_neg_neu_
41 45 - 30 - 55 - 35 - 10 61 41_45_30_55_35
42 ge - wifi - na - alpha - fan 61 42_ge_wifi_na_alpha
43 label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 61 43_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_9 label_0 label_1
44 event - group - corporation - person product - product 61 44_event_group_corporation_person product
45 label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 60 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2
46 fear happy - sad - happy - disgust fear - angry 58 46_fear happy_sad_happy_disgust fear
47 battery - volume - chinese - juice - socks 58 47_battery_volume_chinese_juice
48 prep - nn - bio - cc - pro 56 48_prep_nn_bio_cc
49 good - poor - ok - great - bad 56 49_good_poor_ok_great
50 date - city - fur - day - ar 54 50_date_city_fur_day
51 15 - 18 19 20 - 19 20 - 17 18 19 - 18 19 54 51_15_18 19 20_19 20_17 18 19
52 menu - price - num - - 52 52_menu_price_num_
53 common - fat - loose - small - sugar 52 53_common_fat_loose_small
54 append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ 49 54_append__replace__append_ append__replace_ replace_
55 append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ 48 55_append__replace__append_ append__replace_ replace_
56 animals - flying - tech - dance - tiger 48 56_animals_flying_tech_dance
57 self - question - neutral - yes - greeting 47 57_self_question_neutral_yes
58 mt - cv - tr - tm - drug 47 58_mt_cv_tr_tm
59 organization person - location organization - organization - location - person 46 59_organization person_location organization_organization_location
60 - - - - 45 60____
61 joy - anger - sadness - sad - happy 44 61_joy_anger_sadness_sad
62 daisy - tulip - rose - - 43 62_daisy_tulip_rose_
63 positive - negative - neutral - neutral positive - positive negative 42 63_positive_negative_neutral_neutral positive
64 windows - pm - 21 - office - 20 42 64_windows_pm_21_office
65 label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 42 65_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14
66 position - statement - lead - request - study 42 66_position_statement_lead_request
67 business - news - entertainment - tech - sport 41 67_business_news_entertainment_tech
68 hate - speech - language - reporting - non 41 68_hate_speech_language_reporting
69 bd - nan - id - bg - 41 69_bd_nan_id_bg
70 cream - burger - carrot - ice cream - salad 41 70_cream_burger_carrot_ice cream
71 human - machine - ai - artificial - art 40 71_human_machine_ai_artificial
72 open - high - tie - abstract - button 40 72_open_high_tie_abstract
73 label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 40 73_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24
74 label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 39 74_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9
75 cat - dog - cats - dogs - drinking 39 75_cat_dog_cats_dogs
76 org org - loc loc - org - misc - loc 38 76_org org_loc loc_org_misc
77 airplane - deer - bird - ship - frog 38 77_airplane_deer_bird_ship
78 label_32 label_33 label_34 - label_33 label_34 - label_31 label_32 label_33 - label_32 label_33 - label_30 label_31 label_32 38 78_label_32 label_33 label_34_label_33 label_34_label_31 label_32 label_33_label_32 label_33
79 true - - - - 38 79_true___
80 family - sports - music - related - health 38 80_family_sports_music_related
81 star - positive - negative - amazon - negative positive 37 81_star_positive_negative_amazon
82 hospital - unknown - description - material - pad 37 82_hospital_unknown_description_material
83 threat - hate - reward - quality - content 36 83_threat_hate_reward_quality
84 music - speech - instrument - engine - wind 35 84_music_speech_instrument_engine
85 closure - annual - statement - issues - reward 35 85_closure_annual_statement_issues
86 adp - aux - sconj - pron - noun 35 86_adp_aux_sconj_pron
87 experience - location - skill - address - result 35 87_experience_location_skill_address
88 - - - - 34 88____
89 test - train - risk - non - high 34 89_test_train_risk_non
90 samoyed - corgi - husky - golden retriever - golden 34 90_samoyed_corgi_husky_golden retriever
91 unk - zero - 10 - 12 13 14 - 13 14 15 33 91_unk_zero_10_12 13 14
92 non - neutral - ok - lead - 33 92_non_neutral_ok_lead
93 normal - covid - virus - regular - disorder 33 93_normal_covid_virus_regular
94 test - help - app - risk - joke 32 94_test_help_app_risk
95 replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ 32 95_replace__append__replace_ replace__append_ append_
96 disease - issues - pressure - drug - blood 31 96_disease_issues_pressure_drug
97 women - casual - sexual - individual - use 31 97_women_casual_sexual_individual
98 address - balance - code - second - currency 30 98_address_balance_code_second
99 hate - non - neutral - - 30 99_hate_non_neutral_
100 normal - cell - large - clean - healthy 29 100_normal_cell_large_clean
101 neutral - se - - - 29 101_neutral_se__
102 male - female - hair - skin - men 29 102_male_female_hair_skin
103 title - page - section - abstract - table 28 103_title_page_section_abstract
104 number - gender - case - person - fin 28 104_number_gender_case_person
105 man - bird - flower - long - double 28 105_man_bird_flower_long
106 contradiction - entailment - neutral - - 28 106_contradiction_entailment_neutral_
107 non - - - - 27 107_non___
108 tim - fac - org - pro - loc 27 108_tim_fac_org_pro
109 lincoln - jaguar - visual - audio - sony 27 109_lincoln_jaguar_visual_audio
110 statement - info - check - ad - news 27 110_statement_info_check_ad
111 ben - ext - root - exp - loc 26 111_ben_ext_root_exp
112 yes - - - - 26 112_yes___
113 queen - jack - king - south - war 26 113_queen_jack_king_south
114 - - - - 26 114____
115 ft - cardinal - act - loc - loc loc 25 115_ft_cardinal_act_loc
116 bio - chemical - food - - 25 116_bio_chemical_food_
117 ft - cardinal - act - loc - loc misc org 25 117_ft_cardinal_act_loc
118 metric - task - - - 25 118_metric_task__
119 email - age - patient - zip - organization 25 119_email_age_patient_zip
120 ent - im - ru - mat - art 25 120_ent_im_ru_mat
121 ex - pt - galaxy - moon - 8888 24 121_ex_pt_galaxy_moon
122 - - - - 24 122____
123 neu - sad - dis - joy - 24 123_neu_sad_dis_joy
124 label_122 - label_121 - label_120 - label_123 - label_119 24 124_label_122_label_121_label_120_label_123
125 mixed - positive - negative - neutral positive - neutral 24 125_mixed_positive_negative_neutral positive
126 date event - percent person - quantity - money - percent 24 126_date event_percent person_quantity_money
127 fear joy - sadness surprise - surprise - joy - sadness 24 127_fear joy_sadness surprise_surprise_joy
128 disgust - sadness surprise - joy love - surprise - joy 24 128_disgust_sadness surprise_joy love_surprise
129 magnet - motor - hello - undefined - start 24 129_magnet_motor_hello_undefined
130 loc loc - loc - pers - hi - en 24 130_loc loc_loc_pers_hi
131 event - pers - fac - pro - loc org 24 131_event_pers_fac_pro
132 disorder - body - patient - age - disease 23 132_disorder_body_patient_age
133 happiness - fear - anger disgust - disgust - sadness 23 133_happiness_fear_anger disgust_disgust
134 control - la - social - sin - civil 23 134_control_la_social_sin
135 label_98 label_99 - label_97 label_98 label_99 - label_97 label_98 - label_95 label_96 - label_96 label_97 label_98 23 135_label_98 label_99_label_97 label_98 label_99_label_97 label_98_label_95 label_96
136 greek - chinese - italian - japanese - dutch 23 136_greek_chinese_italian_japanese
137 clean - - - - 23 137_clean___
138 protein - chemical - cell - - 22 138_protein_chemical_cell_
139 treatment - disease - location organization - organization person - organization 22 139_treatment_disease_location organization_organization person
140 institution - tools - org - loc - organization 22 140_institution_tools_org_loc
141 statement - question - - - 22 141_statement_question__
142 period - question - noun - number - 21 142_period_question_noun_number
143 regular - - - - 21 143_regular___
144 rna - - - - 21 144_rna___
145 rs - - - - 21 145_rs___
146 address - id - job - email - country 21 146_address_id_job_email
147 neg - neu - good - - 21 147_neg_neu_good_
148 label_122 label_123 - label_123 - label_122 - label_121 - label_120 20 148_label_122 label_123_label_123_label_122_label_121
149 drink - tea - wine - coffee - soft 20 149_drink_tea_wine_coffee
150 miscellaneous - organization - percent - money - percent person 20 150_miscellaneous_organization_percent_money
151 description - invoice - zip - state - city 20 151_description_invoice_zip_state
152 sports - tech - business - sport - 20 152_sports_tech_business_sport
153 ok - vin - rl - ft - year 20 153_ok_vin_rl_ft
154 healthy - - - - 20 154_healthy___
155 association - event - ticket - disaster - map 20 155_association_event_ticket_disaster
156 10 11 - 10 11 12 - 11 12 - 11 - 11 12 13 19 156_10 11_10 11 12_11 12_11
157 noun num pron - num pron propn - num pron - pron propn punct - pron propn 19 157_noun num pron_num pron propn_num pron_pron propn punct
158 cell - organ - organism - multi - tissue 18 158_cell_organ_organism_multi
159 02 - ent - express - act - delete 18 159_02_ent_express_act
160 sym verb adj - verb adj adp - intj noun num - det intj noun - adj adp adv 18 160_sym verb adj_verb adj adp_intj noun num_det intj noun
161 12 - - - - 18 161_12___
162 org org - org - drug - - 18 162_org org_org_drug_
163 short - long - sl - ac - pad 18 163_short_long_sl_ac
164 plastic - paper - glass - metal - sheet 18 164_plastic_paper_glass_metal
165 ii - blank - iii - vi - et 17 165_ii_blank_iii_vi
166 normal - virus - desert - smoke - pressure 17 166_normal_virus_desert_smoke
167 skill - skills - - - 17 167_skill_skills__
168 protein - rna - cell - line - type 17 168_protein_rna_cell_line
169 korean - russian - dutch - french - thai 17 169_korean_russian_dutch_french
170 rainbow - rain - snow - color - green 17 170_rainbow_rain_snow_color
171 company - role - institution - skill - loc org 16 171_company_role_institution_skill
172 exp - pp - intj - punc - prep 16 172_exp_pp_intj_punc
173 key - menu - - - 16 173_key_menu__
174 adult - young - child - male - female 16 174_adult_young_child_male
175 normal - - - - 16 175_normal___
176 mask - bright - sharp - head - normal 16 176_mask_bright_sharp_head
177 anger disgust fear - anger disgust - disgust fear - disgust - surprise anger 16 177_anger disgust fear_anger disgust_disgust fear_disgust
178 - - - - 16 178____
179 objective - non - neutral - - 16 179_objective_non_neutral_
180 cr - sd - db - - 16 180_cr_sd_db_
181 - - - - 16 181____
182 label_29 label_3 label_30 - label_27 label_28 label_29 - label_26 label_27 label_28 - label_28 label_29 label_3 - label_29 label_3 16 182_label_29 label_3 label_30_label_27 label_28 label_29_label_26 label_27 label_28_label_28 label_29 label_3
183 test - - - - 15 183_test___
184 good - bad - non - - 15 184_good_bad_non_
185 local - por - art - da - em 15 185_local_por_art_da
186 label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 15 186_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98
187 prod - loc - evt - misc - org org 15 187_prod_loc_evt_misc
188 invoice - email - form - letter - report 15 188_invoice_email_form_letter
189 end - head - cross - - 15 189_end_head_cross_
190 target - instrument - opinion - question - price 15 190_target_instrument_opinion_question
191 unrelated - support - - - 15 191_unrelated_support__
192 ru - pl - bg - en - es 14 192_ru_pl_bg_en
193 road - good - bike - - 14 193_road_good_bike_
194 human - organism - plants - - 14 194_human_organism_plants_
195 label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 14 195_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10
196 replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ 13 196_replace__append__replace_ replace__append_ append_
197 brand - company - tm - color - item 13 197_brand_company_tm_color
198 pro - neutral - russian - support - attack 13 198_pro_neutral_russian_support
199 18 19 20 - 19 20 - 23 - 17 18 19 - 21 13 199_18 19 20_19 20_23_17 18 19
200 crime - pers - time - book - day 13 200_crime_pers_time_book
201 neutral - positive - negative - positive negative - neutral positive 13 201_neutral_positive_negative_positive negative
202 - - - - 13 202____
203 chemical - disease - bio - - 13 203_chemical_disease_bio_
204 angry - happy - sad - neutral - 60 12 204_angry_happy_sad_neutral
205 organisation - task - country - location - product 12 205_organisation_task_country_location
206 iv - iii - vi - ii - unknown 12 206_iv_iii_vi_ii
207 neutral - risk - - - 12 207_neutral_risk__
208 container - id - type - person - number 12 208_container_id_type_person
209 target - - - - 12 209_target___
210 pop - metal - country - song - rock 12 210_pop_metal_country_song
211 email - os - language - method - function 12 211_email_os_language_method
212 contradiction - non - entailment - - 12 212_contradiction_non_entailment_
213 background - objective - method - result - 12 213_background_objective_method_result
214 convertible - cab - type - series - martin 12 214_convertible_cab_type_series
215 public - smoking - drinking - ambiguous - non 12 215_public_smoking_drinking_ambiguous
216 rust - - - - 12 216_rust___
217 persian - mr - man - flying - ghost 12 217_persian_mr_man_flying
218 quote - yes - middle - request - 12 218_quote_yes_middle_request
219 text - mixed - - - 12 219_text_mixed__
220 punc - prep - digit - latin - conj 12 220_punc_prep_digit_latin
221 panda - air - mr - ticket - little 12 221_panda_air_mr_ticket
222 - - - - 12 222____
223 sym verb adj - intj noun num - verb adj adp - cconj det intj - aux cconj det 12 223_sym verb adj_intj noun num_verb adj adp_cconj det intj
224 healthy - tomato - plant - pepper - spot 11 224_healthy_tomato_plant_pepper
225 sony - lg - tv - galaxy - monitor 11 225_sony_lg_tv_galaxy
226 new - city - mid - location - south 11 226_new_city_mid_location
227 space - - - - 11 227_space___
228 cloud - racing - motorcycle - boy - bus 11 228_cloud_racing_motorcycle_boy
229 punc - zero - pers - neg - reflex 11 229_punc_zero_pers_neg
230 energy - arts - high - systems - computer 11 230_energy_arts_high_systems
231 dis - ad - media - site - plant 11 231_dis_ad_media_site
232 world - tech - business - sports - female 11 232_world_tech_business_sports
233 sadness - anger - anger fear - joy - fear 10 233_sadness_anger_anger fear_joy
234 neg - adj - sym - propn - num 10 234_neg_adj_sym_propn
235 bulldog - cat - husky - pug - corgi 9 235_bulldog_cat_husky_pug
236 - - - - 8 236____
237 origin - quote - actor - opinion - language 7 237_origin_quote_actor_opinion
238 na - nb - nc - neu - ng 7 238_na_nb_nc_neu
239 - - - - 7 239____
240 ci - aa - joy - im - ip 7 240_ci_aa_joy_im
241 - - - - 6 241____
242 skill - email - address - grade - language 6 242_skill_email_address_grade
243 sexual - threat - christian - hate - male 6 243_sexual_threat_christian_hate
244 transmission - wind - tower - pole - 6 244_transmission_wind_tower_pole
245 label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 6 245_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14
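The Label column appears to follow a simple convention. Each label is the topic ID followed by the first four keywords, joined with underscores. For example, row 11 pairs terrier - snake - dog - bear - wolf with 11_terrier_snake_dog_bear. The sketch below rebuilds a label this way. The convention is inferred from the rows above and was not taken from BERTopic's source. Some keywords contain underscores or spaces themselves (append_, label_1, negative positive), and some are empty. As a result, a label cannot be split back into keywords reliably. When you need the words, use the keyword column or `topic_model.get_topic(topic_id)` instead.

```python
def topic_label(topic_id: int, keywords: list, n_words: int = 4) -> str:
    """Rebuild a label as '<topic_id>_' + the first `n_words` keywords joined by '_'.

    The convention is inferred from the topic overview table in this card.
    """
    return f"{topic_id}_" + "_".join(keywords[:n_words])


# The round trip is lossy: keywords such as "append_" already contain "_",
# so splitting the label on "_" does not recover the original keywords.
label = topic_label(
    54, ["append_", "replace_", "append_ append_", "replace_ replace_", "append_ append_ append_"]
)
naive_parts = label.split("_")[1:]
```

Empty keywords simply produce consecutive underscores. This explains why a row such as topic 33 is labelled 33____.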

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
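The listed values correspond to BERTopic constructor arguments of the same names. The following is a minimal sketch, assuming you want to reuse them for a new model. The card does not list the embedding, UMAP, HDBSCAN or vectorizer models used in training, so these values alone do not reproduce the original model. `language` is None, so the sketch assumes you supply your own embedding model. `build_topic_model` is a hypothetical helper, and the model name in the usage note is only an example, not the one used for this card.

```python
# Hyperparameters exactly as listed in this model card. The embedding, UMAP,
# HDBSCAN and vectorizer models used for training are NOT listed, so these
# values alone do not reproduce the original model.
TRAINING_PARAMS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model):
    """Create an unfitted BERTopic model with the card's hyperparameters.

    `embedding_model` is an assumption: the card does not say which model was
    used, so pass any sentence-transformers model name or instance.
    """
    from bertopic import BERTopic  # imported lazily; requires `pip install -U bertopic`

    return BERTopic(embedding_model=embedding_model, **TRAINING_PARAMS)
```

For example, `build_topic_model("all-MiniLM-L6-v2").fit_transform(docs)` would fit a new model on your own list of strings `docs`.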

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
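If loading the model fails or behaves differently, comparing your environment with the versions above is a reasonable first check. The following standard-library sketch reports every package whose installed version differs from the card. The mapping from the names above to PyPI distribution names (for example UMAP to `umap-learn` and Scikit-Learn to `scikit-learn`) is an assumption based on the usual package names. The check is an exact string comparison, so a mismatch does not necessarily mean incompatibility.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(expected, lookup=version):
    """Return {dist: (card_version, installed_version_or_None)} for each mismatch."""
    mismatches = {}
    for dist, want in expected.items():
        try:
            have = lookup(dist)
        except PackageNotFoundError:
            have = None  # package is not installed
        if have != want:
            mismatches[dist] = (want, have)
    return mismatches


if __name__ == "__main__":
    python_version = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {python_version} (card: 3.10.11)")
    for dist, (want, have) in compare_versions(CARD_VERSIONS).items():
        print(f"{dist}: card {want}, installed {have or 'not installed'}")
```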