Wikipedia-example-topic-model
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("TopicNavi/Wikipedia-example-topic-model")
topic_model.get_topic_info()
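Beyond `get_topic_info()`, a loaded model can assign new documents to topics. The sketch below is illustrative only: the helper names `top_words_for_documents` and `load_published_model` are hypothetical, and it assumes the embedding model referenced by the saved configuration can be loaded in your environment. It relies only on BERTopic's `transform` and `get_topic` methods.

```python
def top_words_for_documents(topic_model, docs, n_words=5):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to behave like a fitted BERTopic instance:
    `transform(docs)` returns (topics, probabilities) and `get_topic(id)`
    returns a list of (word, score) pairs.
    """
    topics, _ = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic may return a falsy value for an unknown topic id.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), [w for w, _ in words[:n_words]]))
    return results


def load_published_model():
    """Load the model from the Hugging Face Hub (requires BERTopic and network access)."""
    # Imported lazily so the helper above stays usable without BERTopic installed.
    from bertopic import BERTopic
    return BERTopic.load("TopicNavi/Wikipedia-example-topic-model")
```

A call such as `top_words_for_documents(load_published_model(), ["A short article about a rock band."])` would return one `(document, topic_id, keywords)` tuple per input document.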
Topic overview
- Number of topics: 227
- Number of training documents: 25000
Overview of all topics (in BERTopic, topic -1 holds outlier documents):
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | of - the - to - and - in | 10 | -1_of_the_to_and |
0 | actor - he - award - his - born | 7457 | 0_actor_he_award_his |
1 | film - directed - stars - written - by | 1494 | 1_film_directed_stars_written |
2 | actress - she - her - award - born | 1487 | 2_actress_she_her_award |
3 | series - premiered - created - season - television | 1339 | 3_series_premiered_created_season |
4 | band - rock - guitarist - formed - lead | 740 | 4_band_rock_guitarist_formed |
5 | species - are - genus - breed - dog | 501 | 5_species_are_genus_breed |
6 | indian - hindi - filmfare - cinema - tamil | 428 | 6_indian_hindi_filmfare_cinema |
7 | footballer - club - professional - plays - midfielder | 395 | 7_footballer_club_professional_plays |
8 | king - queen - prince - duke - throne | 372 | 8_king_queen_prince_duke |
9 | symptoms - disease - may - disorder - pain | 310 | 9_symptoms_disease_may_disorder |
10 | war - battle - fought - empire - german | 299 | 10_war_battle_fought_empire |
11 | sexual - sex - or - gender - activity | 284 | 11_sexual_sex_or_gender |
12 | singer - songwriter - album - music - albums | 268 | 12_singer_songwriter_album_music |
13 | language - spoken - languages - ethnic - speakers | 262 | 13_language_spoken_languages_ethnic |
14 | company - multinational - headquartered - corporation - technology | 204 | 14_company_multinational_headquartered_corporation |
15 | species - plant - genus - fruit - plants | 203 | 15_species_plant_genus_fruit |
16 | poet - philosopher - his - writer - novelist | 197 | 16_poet_philosopher_his_writer |
17 | aircraft - boeing - fighter - air - designed | 185 | 17_aircraft_boeing_fighter_air |
18 | game - xbox - playstation - developed - windows | 183 | 18_game_xbox_playstation_developed |
19 | city - capital - population - area - largest | 175 | 19_city_capital_population_area |
20 | manga - anime - aired - adaptation - japanese | 164 | 20_manga_anime_aired_adaptation |
21 | hindilanguage - indian - stars - film - produced | 156 | 21_hindilanguage_indian_stars_film |
22 | bible - jesus - god - hebrew - testament | 156 | 22_bible_jesus_god_hebrew |
23 | mathematics - probability - function - distribution - numbers | 151 | 23_mathematics_probability_function_distribution |
24 | nba - basketball - player - association - allstar | 150 | 24_nba_basketball_player_association |
25 | killer - convicted - serial - murders - murder | 148 | 25_killer_convicted_serial_murders |
26 | rapper - album - records - released - professionally | 141 | 26_rapper_album_records_released |
27 | wrestling - wwe - wrestler - ring - professional | 140 | 27_wrestling_wwe_wrestler_ring |
28 | forces - armed - military - force - air | 133 | 28_forces_armed_military_force |
29 | toyota - car - honda - manufactured - model | 124 | 29_toyota_car_honda_manufactured |
30 | nfl - football - quarterback - college - played | 116 | 30_nfl_football_quarterback_college |
31 | greek - mythology - goddess - ancient - roman | 115 | 31_greek_mythology_goddess_ancient |
32 | disney - walt - entertainment - studios - company | 110 | 32_disney_walt_entertainment_studios |
33 | team - compete - division - conference - league | 106 | 33_team_compete_division_conference |
34 | medication - treat - used - mouth - taken | 105 | 34_medication_treat_used_mouth |
35 | political - social - economic - democracy - government | 100 | 35_political_social_economic_democracy |
36 | football - club - league - bundesliga - professional | 96 | 36_football_club_league_bundesliga |
37 | dish - sauce - cheese - meat - vegetables | 92 | 37_dish_sauce_cheese_meat |
38 | element - chemical - atomic - symbol - metal | 92 | 38_element_chemical_atomic_symbol |
39 | mind - psychology - or - that - philosophical | 91 | 39_mind_psychology_or_that |
40 | novel - published - author - story - book | 89 | 40_novel_published_author_story |
41 | rifle - cartridge - pistol - gun - sig | 85 | 41_rifle_cartridge_pistol_gun |
42 | cup - fifa - tournament - world - teams | 84 | 42_cup_fifa_tournament_world |
43 | cofounder - ceo - entrepreneur - investor - facebook | 82 | 43_cofounder_ceo_entrepreneur_investor |
44 | computer - programming - data - software - language | 82 | 44_computer_programming_data_software |
45 | marvel - comics - comic - character - books | 81 | 45_marvel_comics_comic_character |
46 | ufc - mixed - martial - fighting - champion | 77 | 46_ufc_mixed_martial_fighting |
47 | korean - south - kim - roles - my | 76 | 47_korean_south_kim_roles |
48 | korean - south - entertainment - group - girl | 75 | 48_korean_south_entertainment_group |
49 | president - served - vice - states - bush | 74 | 49_president_served_vice_states |
50 | mafia - crime - cartel - organized - drug | 73 | 50_mafia_crime_cartel_organized |
51 | islands - island - australia - ocean - pacific | 73 | 51_islands_island_australia_ocean |
52 | state - india - pradesh - capital - region | 72 | 52_state_india_pradesh_capital |
53 | politician - president - served - minister - since | 69 | 53_politician_president_served_minister |
54 | city - county - populous - metropolitan - population | 69 | 54_city_county_populous_metropolitan |
55 | africa - country - republic - officially - west | 69 | 55_africa_country_republic_officially |
56 | university - research - college - private - universities | 66 | 56_university_research_college_private |
57 | ceremony - presented - awards - academy - ampas | 65 | 57_ceremony_presented_awards_academy |
58 | tennis - open - titles - singles - atp | 64 | 58_tennis_open_titles_singles |
59 | korean - kim - kst - aired - south | 64 | 59_korean_kim_kst_aired |
60 | music - rock - genre - pop - punk | 63 | 60_music_rock_genre_pop |
61 | caribbean - islands - island - country - antilles | 61 | 61_caribbean_islands_island_country |
62 | politician - senator - republican - democratic - party | 60 | 62_politician_senator_republican_democratic |
63 | electric - electromagnetic - radiation - energy - magnetic | 57 | 63_electric_electromagnetic_radiation_energy |
64 | wars - star - jedi - skywalker - trilogy | 55 | 64_wars_star_jedi_skywalker |
65 | planet - solar - sun - earth - jupiter | 54 | 65_planet_solar_sun_earth |
66 | class - ship - navy - ships - submarines | 54 | 66_class_ship_navy_ships |
67 | president - sabha - house - government - chief | 53 | 67_president_sabha_house_government |
68 | alphabet - letter - alphabets - languages - english | 49 | 68_alphabet_letter_alphabets_languages |
69 | football - team - represents - mens - governing | 48 | 69_football_team_represents_mens |
70 | club - football - stadium - league - tier | 48 | 70_club_football_stadium_league |
71 | empire - ancient - egypt - bc - civilization | 45 | 71_empire_ancient_egypt_bc |
72 | manufacturer - automobile - automotive - stellantis - company | 45 | 72_manufacturer_automobile_automotive_stellantis |
73 | flag - flags - national - tricolour - anthem | 44 | 73_flag_flags_national_tricolour |
74 | church - religious - christianity - religion - movement | 43 | 74_church_religious_christianity_religion |
75 | minister - prime - conservative - mp - served | 42 | 75_minister_prime_conservative_mp |
76 | wine - drink - sugar - alcoholic - cocktail | 41 | 76_wine_drink_sugar_alcoholic |
77 | hindu - hinduism - shiva - vishnu - goddess | 41 | 77_hindu_hinduism_shiva_vishnu |
78 | batman - dc - comics - gotham - superhero | 41 | 78_batman_dc_comics_gotham |
79 | formula - racing - driver - prix - championship | 41 | 79_formula_racing_driver_prix |
80 | airline - airlines - airport - carrier - destinations | 41 | 80_airline_airlines_airport_carrier |
81 | compound - acid - organic - chemical - formula | 40 | 81_compound_acid_organic_chemical |
82 | nazi - german - hitler - adolf - germany | 40 | 82_nazi_german_hitler_adolf |
83 | bond - james - eon - spy - mi6 | 39 | 83_bond_james_eon_spy |
84 | belief - god - religious - existence - atheism | 39 | 84_belief_god_religious_existence |
85 | energy - constant - force - heat - unit | 39 | 85_energy_constant_force_heat |
86 | minister - prime - indian - pakistan - india | 39 | 86_minister_prime_indian_pakistan |
87 | roman - emperor - bc - augustus - caesar | 38 | 87_roman_emperor_bc_augustus |
88 | asia - gulf - sea - east - oman | 38 | 88_asia_gulf_sea_east |
89 | boxer - heavyweight - title - wba - ibf | 37 | 89_boxer_heavyweight_title_wba |
90 | county - england - city - ceremonial - london | 36 | 90_county_england_city_ceremonial |
91 | data - learning - algorithm - machine - neural | 36 | 91_data_learning_algorithm_machine |
92 | day - holiday - celebrated - thanksgiving - celebration | 35 | 92_day_holiday_celebrated_thanksgiving |
93 | saul - breaking - bad - call - better | 34 | 93_saul_breaking_bad_call |
94 | punishment - death - execution - homicide - suicide | 34 | 94_punishment_death_execution_homicide |
95 | degree - education - secondary - bachelor - bachelors | 34 | 95_degree_education_secondary_bachelor |
96 | console - nintendo - playstation - game - consoles | 34 | 96_console_nintendo_playstation_game |
97 | iphone - apple - ipad - pro - inc | 34 | 97_iphone_apple_ipad_pro |
98 | vitamin - organisms - bacteria - animals - plants | 33 | 98_vitamin_organisms_bacteria_animals |
99 | cells - blood - system - gland - organ | 33 | 99_cells_blood_system_gland |
100 | trek - star - kirk - starship - uss | 33 | 100_trek_star_kirk_starship |
101 | jews - nazi - camps - camp - extermination | 33 | 101_jews_nazi_camps_camp |
102 | space - moon - apollo - nasa - shuttle | 33 | 102_space_moon_apollo_nasa |
103 | roman - empire - rome - western - byzantine | 32 | 103_roman_empire_rome_western |
104 | marvel - studios - mcu - thor - superhero | 32 | 104_marvel_studios_mcu_thor |
105 | organisms - biology - genetic - genes - species | 32 | 105_organisms_biology_genetic_genes |
106 | fashion - gucci - designer - luxury - chanel | 32 | 106_fashion_gucci_designer_luxury |
107 | baseball - mlb - league - major - runs | 32 | 107_baseball_mlb_league_major |
108 | island - islands - ireland - isles - northern | 31 | 108_island_islands_ireland_isles |
109 | creature - folklore - legendary - depicted - or | 31 | 109_creature_folklore_legendary_depicted |
110 | empire - mughal - maratha - subcontinent - dynasty | 31 | 110_empire_mughal_maratha_subcontinent |
111 | social - racial - race - racism - white | 31 | 111_social_racial_race_racism |
112 | election - presidential - incumbent - tuesday - republican | 30 | 112_election_presidential_incumbent_tuesday |
113 | building - tallest - street - manhattan - york | 29 | 113_building_tallest_street_manhattan |
114 | bowl - super - champion - football - conference | 29 | 114_bowl_super_champion_football |
115 | election - elections - elect - held - general | 29 | 115_election_elections_elect_held |
116 | soviet - union - stalin - communist - russian | 29 | 116_soviet_union_stalin_communist |
117 | stock - exchange - securities - investment - companies | 29 | 117_stock_exchange_securities_investment |
118 | bmw - mercedesbenz - generation - sedan - marketed | 29 | 118_bmw_mercedesbenz_generation_sedan |
119 | currency - dollar - currencies - monetary - bank | 29 | 119_currency_dollar_currencies_monetary |
120 | dynasty - emperor - china - qin - chinese | 28 | 120_dynasty_emperor_china_qin |
121 | internet - protocol - ip - networks - network | 28 | 121_internet_protocol_ip_networks |
122 | tropical - cyclones - cyclone - hurricane - hemisphere | 28 | 122_tropical_cyclones_cyclone_hurricane |
123 | anthropomorphic - cartoon - character - peanuts - bugs | 28 | 123_anthropomorphic_cartoon_character_peanuts |
124 | elections - election - senate - elect - governor | 28 | 124_elections_election_senate_elect |
125 | windows - operating - microsoft - macos - server | 28 | 125_windows_operating_microsoft_macos |
126 | san - county - california - los - angeles | 27 | 126_san_county_california_los |
127 | potter - harry - hogwarts - rowling - rowlings | 27 | 127_potter_harry_hogwarts_rowling |
128 | tank - soviet - tanks - t72 - armoured | 27 | 128_tank_soviet_tanks_t72 |
129 | website - youtube - pornographic - videos - websites | 26 | 129_website_youtube_pornographic_videos |
130 | missile - missiles - surfacetoair - ballistic - system | 26 | 130_missile_missiles_surfacetoair_ballistic |
131 | formula - championship - fia - racing - drivers | 26 | 131_formula_championship_fia_racing |
132 | mario - game - nintendo - super - games | 26 | 132_mario_game_nintendo_super |
133 | composer - composers - symphony - music - pianist | 26 | 133_composer_composers_symphony_music |
134 | music - theatre - musical - art - or | 26 | 134_music_theatre_musical_art |
135 | party - political - democratic - liberal - labour | 25 | 135_party_political_democratic_liberal |
136 | province - canada - provinces - territories - city | 25 | 136_province_canada_provinces_territories |
137 | airport - busiest - international - passenger - traffic | 25 | 137_airport_busiest_international_passenger |
138 | china - shanghai - province - guangzhou - populous | 24 | 138_china_shanghai_province_guangzhou |
139 | flight - airport - airlines - accident - crashed | 24 | 139_flight_airport_airlines_accident |
140 | expedition - spanish - america - explorer - americas | 24 | 140_expedition_spanish_america_explorer |
141 | economy - gdp - capita - ppp - countries | 24 | 141_economy_gdp_capita_ppp |
142 | ball - sport - players - teams - team | 24 | 142_ball_sport_players_teams |
143 | thrones - fire - ice - hbo - fantasy | 23 | 143_thrones_fire_ice_hbo |
144 | uefa - champions - league - cup - organised | 23 | 144_uefa_champions_league_cup |
145 | terminator - transformers - fiction - science - action | 23 | 145_terminator_transformers_fiction_science |
146 | time - calendar - zone - year - daylight | 23 | 146_time_calendar_zone_year |
147 | caliphate - muhammad - ibn - islam - islamic | 22 | 147_caliphate_muhammad_ibn_islam |
148 | holmes - sherlock - dracula - conan - watson | 22 | 148_holmes_sherlock_dracula_conan |
149 | games - multisport - olympic - olympics - winter | 21 | 149_games_multisport_olympic_olympics |
150 | web - google - search - pages - users | 21 | 150_web_google_search_pages |
151 | google - chat - messaging - users - torrent | 21 | 151_google_chat_messaging_users |
152 | renaissance - italian - leonardo - michelangelo - vinci | 21 | 152_renaissance_italian_leonardo_michelangelo |
153 | amendment - court - constitution - rights - abortion | 21 | 153_amendment_court_constitution_rights |
154 | marvel - continuity - comics - mcu - cinematic | 21 | 154_marvel_continuity_comics_mcu |
155 | draft - players - nba - lottery - eligible | 20 | 155_draft_players_nba_lottery |
156 | kennedy - clinton - president - jacqueline - lewinsky | 20 | 156_kennedy_clinton_president_jacqueline |
157 | shooting - school - injured - killed - mass | 20 | 157_shooting_school_injured_killed |
158 | greys - anatomy - abc - medical - rhimes | 20 | 158_greys_anatomy_abc_medical |
159 | kardashian - kardashians - jenner - keeping - kourtney | 19 | 159_kardashian_kardashians_jenner_keeping |
160 | godfather - corleone - coppola - vito - pacino | 19 | 160_godfather_corleone_coppola_vito |
161 | script - alphabet - chinese - writing - write | 19 | 161_script_alphabet_chinese_writing |
162 | beatles - album - parlophone - studio - songs | 19 | 162_beatles_album_parlophone_studio |
163 | martial - boxing - combat - aikido - wrestling | 18 | 163_martial_boxing_combat_aikido |
164 | york - island - new - borough - county | 18 | 164_york_island_new_borough |
165 | court - supreme - justice - associate - jurist | 18 | 165_court_supreme_justice_associate |
166 | hamlet - shakespeare - shakespeares - tragedy - william | 18 | 166_hamlet_shakespeare_shakespeares_tragedy |
167 | hong - kong - martial - yen - chow | 18 | 167_hong_kong_martial_yen |
168 | rocky - stallone - rambo - sylvester - balboa | 18 | 168_rocky_stallone_rambo_sylvester |
169 | nobel - prize - physics - prizes - physicist | 18 | 169_nobel_prize_physics_prizes |
170 | thrones - hbo - game - 20112019 - fantasy | 18 | 170_thrones_hbo_game_20112019 |
171 | cricket - cricketer - indian - captain - righthanded | 17 | 171_cricket_cricketer_indian_captain |
172 | art - architecture - movement - style - baroque | 17 | 172_art_architecture_movement_style |
173 | nuclear - bomb - weapons - weapon - thermonuclear | 17 | 173_nuclear_bomb_weapons_weapon |
174 | amphetamine - enhancer - stimulant - drug - adhd | 16 | 174_amphetamine_enhancer_stimulant_drug |
175 | walking - dead - kirkman - adlard - amc | 16 | 175_walking_dead_kirkman_adlard |
176 | snuff - genre - comedy - laughter - films | 16 | 176_snuff_genre_comedy_laughter |
177 | superman - dc - aquaman - dceu - warner | 16 | 177_superman_dc_aquaman_dceu |
178 | health - care - medical - medicine - hospitals | 16 | 178_health_care_medical_medicine |
179 | color - colors - rgb - red - blue | 16 | 179_color_colors_rgb_red |
180 | smiley - bokeh - clothing - meme - face | 16 | 180_smiley_bokeh_clothing_meme |
181 | metallica - metal - band - ulrich - thrash | 15 | 181_metallica_metal_band_ulrich |
182 | economic - prices - inflation - price - crisis | 15 | 182_economic_prices_inflation_price |
183 | doctor - incarnation - thirteenth - bbc - specials | 15 | 183_doctor_incarnation_thirteenth_bbc |
184 | rings - tolkiens - tolkien - hobbit - lord | 15 | 184_rings_tolkiens_tolkien_hobbit |
185 | pope - church - vatican - catholic - roncalli | 15 | 185_pope_church_vatican_catholic |
186 | rockefeller - miss - oil - rothschild - family | 15 | 186_rockefeller_miss_oil_rothschild |
187 | seinfeld - comedian - sitcom - kramer - jerry | 14 | 187_seinfeld_comedian_sitcom_kramer |
188 | ottoman - sultan - selim - empire - erturul | 14 | 188_ottoman_sultan_selim_empire |
189 | chinese - china - ccp - mao - communist | 14 | 189_chinese_china_ccp_mao |
190 | philosopher - philosophy - greek - treatise - mathematician | 14 | 190_philosopher_philosophy_greek_treatise |
191 | mark - punctuation - exclamation - bracket - marks | 14 | 191_mark_punctuation_exclamation_bracket |
192 | event - wrestlemania - wwe - payperview - livestreaming | 14 | 192_event_wrestlemania_wwe_payperview |
193 | norse - mythology - old - loki - odin | 14 | 193_norse_mythology_old_loki |
194 | dre - hop - hip - wutang - group | 14 | 194_dre_hop_hip_wutang |
195 | newspaper - daily - guardian - times - news | 14 | 195_newspaper_daily_guardian_times |
196 | theft - kratos - auto - rockstar - god | 13 | 196_theft_kratos_auto_rockstar |
197 | drag - rupauls - race - vh1 - season | 13 | 197_drag_rupauls_race_vh1 |
198 | polyethylene - polymers - silk - plastics - synthetic | 13 | 198_polyethylene_polymers_silk_plastics |
199 | strings - instrument - instruments - guitar - electronic | 13 | 199_strings_instrument_instruments_guitar |
200 | population - census - rate - growth - increase | 13 | 200_population_census_rate_growth |
201 | resolution - hdtv - display - hd - pixels | 13 | 201_resolution_hdtv_display_hd |
202 | athletic - hockey - ncaa - conference - university | 13 | 202_athletic_hockey_ncaa_conference |
203 | nervous - spinal - brain - nerves - cord | 13 | 203_nervous_spinal_brain_nerves |
204 | peppers - chili - hot - rock - red | 13 | 204_peppers_chili_hot_rock |
205 | accounting - tax - financial - nonprofit - entity | 12 | 205_accounting_tax_financial_nonprofit |
206 | swift - album - studio - taylor - singersongwriter | 12 | 206_swift_album_studio_taylor |
207 | sheldon - bang - theory - big - parsons | 12 | 207_sheldon_bang_theory_big |
208 | conjuring - wan - annabelle - lorraine - dauberman | 12 | 208_conjuring_wan_annabelle_lorraine |
209 | karate - kid - miyagi - macchio - kai | 12 | 209_karate_kid_miyagi_macchio |
210 | earthquake - eruption - tsunami - fault - occurred | 12 | 210_earthquake_eruption_tsunami_fault |
211 | guard - guards - ball - positions - midfielders | 12 | 211_guard_guards_ball_positions |
212 | geologic - planets - earth - how - earths | 12 | 212_geologic_planets_earth_how |
213 | card - game - cards - chess - baccarat | 12 | 213_card_game_cards_chess |
214 | zodiac - sign - astrological - transits - spans | 12 | 214_zodiac_sign_astrological_transits |
215 | gandhi - singh - godse - india - bhindranwale | 12 | 215_gandhi_singh_godse_india |
216 | cannabis - cigarette - thc - tobacco - cocaine | 11 | 216_cannabis_cigarette_thc_tobacco |
217 | xmen - wolverine - installment - jackman - superhero | 11 | 217_xmen_wolverine_installment_jackman |
218 | caucasus - azerbaijan - baku - sea - caspian | 11 | 218_caucasus_azerbaijan_baku_sea |
219 | draft - nfl - meeting - select - eligible | 11 | 219_draft_nfl_meeting_select |
220 | nobility - royalty - rank - knighthood - dukes | 11 | 220_nobility_royalty_rank_knighthood |
221 | saudi - arabia - saud - abdulaziz - bin | 11 | 221_saudi_arabia_saud_abdulaziz |
222 | jolyne - jotaro - her - school - stand | 11 | 222_jolyne_jotaro_her_school |
223 | prefecture - kon - mifune - ueno - hachik | 11 | 223_prefecture_kon_mifune_ueno |
224 | guru - granth - baba - gobind - das | 10 | 224_guru_granth_baba_gobind |
225 | un - nations - intergovernmental - organisation - organization | 10 | 225_un_nations_intergovernmental_organisation |
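Each row of the table follows the same `ID | keywords | frequency | label |` layout, so it can be parsed with a few lines of standard-library Python if you want to work with the overview outside BERTopic. This is a minimal sketch; the function name `parse_topic_row` is hypothetical, and the two sample rows are copied from the table above.

```python
def parse_topic_row(row):
    """Split one table row into topic id, keyword list, frequency, and label."""
    # Drop the trailing pipe, then split the four cells.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return {
        "topic_id": int(topic_id),
        "keywords": keywords.split(" - "),
        "frequency": int(frequency),
        "label": label,
    }


rows = [
    "-1 | of - the - to - and - in | 10 | -1_of_the_to_and |",
    "0 | actor - he - award - his - born | 7457 | 0_actor_he_award_his |",
]
parsed = [parse_topic_row(r) for r in rows]

# Share of the 25000 training documents assigned to topic 0 (about 0.298).
topic_0_share = parsed[1]["frequency"] / 25000
```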
Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
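The values above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings could be instantiated. It is not the original training script: the embedding, UMAP, HDBSCAN, and vectorizer sub-models are not listed in this card and would fall back to BERTopic's defaults, and the names `HYPERPARAMETERS` and `build_topic_model` are hypothetical.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an unfitted BERTopic model with the listed hyperparameters."""
    # Imported lazily so the dictionary can be inspected without BERTopic installed.
    from bertopic import BERTopic
    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on your own list of documents would then train a new model; the resulting topics will differ from the table above unless the data and sub-models match.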
Framework versions
- Numpy: 1.26.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.6
- Pandas: 2.2.2
- Scikit-Learn: 1.4.2
- Sentence-transformers: 2.7.0
- Transformers: 4.40.2
- Numba: 0.59.1
- Plotly: 5.22.0
- Python: 3.11.9
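If loading the model fails or behaves unexpectedly, it can help to compare your installed package versions with the ones listed above. The following sketch uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are filled in here as assumptions about how the packages were installed, and `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.2.2",
    "scikit-learn": "1.4.2",
    "sentence-transformers": "2.7.0",
    "transformers": "4.40.2",
    "numba": "0.59.1",
    "plotly": "5.22.0",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will not load; the listed versions only record the environment in which it was saved.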