
cnn_dailymail_6789_50000_25000_v1_test

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that extracts easily interpretable topics from large collections of documents.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

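from bertopic import BERTopic

# Load the trained model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_50000_25000_v1_test")

# Show one row per topic, including its size and label
topic_model.get_topic_info()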
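
A loaded model can also assign topics to unseen text. The following is a minimal sketch, assuming the standard BERTopic `transform` and `get_topic` methods and that the embedding model saved with this card loads in your environment. The helper name `describe_assignments` and its `top_n` parameter are illustrative and not part of the model card.

```python
def describe_assignments(topic_model, docs, top_n=5):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to be a loaded BERTopic model, for example the
    result of BERTopic.load("KingKazma/cnn_dailymail_6789_50000_25000_v1_test").
    """
    # transform() returns one topic ID per document, plus probabilities
    topics, _probs = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (word, score) pairs, or False for an unknown ID
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _score in words[:top_n]]
        results.append({"doc": doc, "topic": int(topic_id), "keywords": keywords})
    return results
```

Calling `describe_assignments(topic_model, ["some news article text"])` returns one dictionary per document. A topic ID of -1 means BERTopic treated the document as an outlier.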

Topic overview

  • Number of topics: 108
  • Number of training documents: 11490
Overview of all topics:
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - one - year - also - police | 5 | -1_said_one_year_also |
| 0 | league - season - player - game - goal | 4981 | 0_league_season_player_game |
| 1 | isis - syria - islamic - group - militant | 2424 | 1_isis_syria_islamic_group |
| 2 | property - hotel - room - house - home | 194 | 2_property_hotel_room_house |
| 3 | fight - mayweather - pacquiao - floyd - manny | 188 | 3_fight_mayweather_pacquiao_floyd |
| 4 | labour - miliband - snp - mr - leader | 156 | 4_labour_miliband_snp_mr |
| 5 | driver - car - road - vehicle - driving | 133 | 5_driver_car_road_vehicle |
| 6 | baby - hospital - cancer - birth - mother | 133 | 6_baby_hospital_cancer_birth |
| 7 | school - student - teacher - pupil - class | 125 | 7_school_student_teacher_pupil |
| 8 | flight - plane - passenger - airport - pilot | 114 | 8_flight_plane_passenger_airport |
| 9 | masters - woods - augusta - spieth - mcilroy | 112 | 9_masters_woods_augusta_spieth |
| 10 | fashion - dress - model - style - designer | 107 | 10_fashion_dress_model_style |
| 11 | chocolate - food - egg - sugar - restaurant | 98 | 11_chocolate_food_egg_sugar |
| 12 | clinton - hillary - clintons - president - campaign | 89 | 12_clinton_hillary_clintons_president |
| 13 | police - murder - mr - miss - body | 89 | 13_police_murder_mr_miss |
| 14 | lion - animal - elephant - zoo - wildlife | 83 | 14_lion_animal_elephant_zoo |
| 15 | weight - food - eating - diet - size | 82 | 15_weight_food_eating_diet |
| 16 | djokovic - murray - open - miami - berdych | 75 | 16_djokovic_murray_open_miami |
| 17 | dog - cat - animal - owner - pet | 75 | 17_dog_cat_animal_owner |
| 18 | police - vault - gang - thief - raid | 74 | 18_police_vault_gang_thief |
| 19 | planet - solar - earth - surface - moon | 65 | 19_planet_solar_earth_surface |
| 20 | gray - police - baltimore - officer - grays | 64 | 20_gray_police_baltimore_officer |
| 21 | nepal - earthquake - kathmandu - everest - avalanche | 58 | 21_nepal_earthquake_kathmandu_everest |
| 22 | fire - blaze - bradford - firefighter - flame | 55 | 22_fire_blaze_bradford_firefighter |
| 23 | hamilton - rosberg - race - mercedes - prix | 52 | 23_hamilton_rosberg_race_mercedes |
| 24 | prince - royal - queen - duchess - princess | 52 | 24_prince_royal_queen_duchess |
| 25 | tax - labour - economy - mr - cameron | 51 | 25_tax_labour_economy_mr |
| 26 | shot - police - shooting - brady - gun | 48 | 26_shot_police_shooting_brady |
| 27 | anzac - gallipoli - war - australian - waterloo | 47 | 27_anzac_gallipoli_war_australian |
| 28 | chan - sukumaran - execution - bali - indonesian | 45 | 28_chan_sukumaran_execution_bali |
| 29 | migrant - boat - libya - mediterranean - italian | 44 | 29_migrant_boat_libya_mediterranean |
| 30 | china - chinese - chinas - kun - organ | 43 | 30_china_chinese_chinas_kun |
| 31 | iran - nuclear - deal - agreement - irans | 43 | 31_iran_nuclear_deal_agreement |
| 32 | neanderthals - cave - human - specie - bone | 43 | 32_neanderthals_cave_human_specie |
| 33 | shark - fish - whale - seal - water | 41 | 33_shark_fish_whale_seal |
| 34 | mccoy - jockey - race - ride - sandown | 41 | 34_mccoy_jockey_race_ride |
| 35 | yemen - saudi - houthi - houthis - rebel | 39 | 35_yemen_saudi_houthi_houthis |
| 36 | ship - vessel - crew - boat - titanic | 39 | 36_ship_vessel_crew_boat |
| 37 | nfl - manziel - game - quarterback - patriots | 39 | 37_nfl_manziel_game_quarterback |
| 38 | bruce - jenner - bobbi - bobby - kris | 38 | 38_bruce_jenner_bobbi_bobby |
| 39 | money - fraud - bank - account - court | 38 | 39_money_fraud_bank_account |
| 40 | wars - star - film - movie - trailer | 37 | 40_wars_star_film_movie |
| 41 | hernandez - lloyd - hernandezs - odin - murder | 32 | 41_hernandez_lloyd_hernandezs_odin |
| 42 | law - religious - marriage - indiana - samesex | 32 | 42_law_religious_marriage_indiana |
| 43 | child - langlais - death - murder - dellinger | 31 | 43_child_langlais_death_murder |
| 44 | tsarnaev - boston - dzhokhar - tamerlan - death | 31 | 44_tsarnaev_boston_dzhokhar_tamerlan |
| 45 | marathon - running - race - runner - run | 31 | 45_marathon_running_race_runner |
| 46 | clarkson - gear - bbc - top - hammond | 29 | 46_clarkson_gear_bbc_top |
| 47 | water - weather - temperature - drought - climate | 29 | 47_water_weather_temperature_drought |
| 48 | point - nba - scored - playoff - rebound | 29 | 48_point_nba_scored_playoff |
| 49 | marijuana - cannabis - drug - hemp - smoking | 28 | 49_marijuana_cannabis_drug_hemp |
| 50 | slager - scott - officer - charleston - walter | 28 | 50_slager_scott_officer_charleston |
| 51 | died - family - mother - inquest - child | 28 | 51_died_family_mother_inquest |
| 52 | groening - camp - auschwitz - nazi - jews | 27 | 52_groening_camp_auschwitz_nazi |
| 53 | alshabaab - garissa - kenya - kenyan - attack | 26 | 53_alshabaab_garissa_kenya_kenyan |
| 54 | artist - paint - painted - colouring - art | 26 | 54_artist_paint_painted_colouring |
| 55 | crucible - osullivan - frame - doherty - world | 24 | 55_crucible_osullivan_frame_doherty |
| 56 | janner - lord - saunders - public - abuse | 24 | 56_janner_lord_saunders_public |
| 57 | apple - watch - iphone - samsung - battery | 23 | 57_apple_watch_iphone_samsung |
| 58 | korea - korean - kim - north - seoul | 23 | 58_korea_korean_kim_north |
| 59 | tornado - storm - cloud - lightning - wind | 21 | 59_tornado_storm_cloud_lightning |
| 60 | housing - tenant - property - buy - association | 20 | 60_housing_tenant_property_buy |
| 61 | hughes - capitol - gyrocopter - secret - lawn | 20 | 61_hughes_capitol_gyrocopter_secret |
| 62 | vaccine - vaccination - cough - whooping - autism | 19 | 62_vaccine_vaccination_cough_whooping |
| 63 | putin - russian - russia - ukraine - moscow | 19 | 63_putin_russian_russia_ukraine |
| 64 | boko - haram - nigeria - nigerian - buhari | 19 | 64_boko_haram_nigeria_nigerian |
| 65 | south - johannesburg - africa - african - violence | 19 | 65_south_johannesburg_africa_african |
| 66 | bates - harris - tulsa - deputy - taser | 19 | 66_bates_harris_tulsa_deputy |
| 67 | aldi - tesco - cent - per - price | 19 | 67_aldi_tesco_cent_per |
| 68 | bolt - phelps - ennishill - olympic - kipsiro | 19 | 68_bolt_phelps_ennishill_olympic |
| 69 | cuba - castro - obama - cuban - president | 18 | 69_cuba_castro_obama_cuban |
| 70 | murray - dunblane - sears - wedding - andy | 18 | 70_murray_dunblane_sears_wedding |
| 71 | mchenry - weinstein - battilana - britt - towing | 18 | 71_mchenry_weinstein_battilana_britt |
| 72 | nhs - gp - gps - ae - patient | 18 | 72_nhs_gp_gps_ae |
| 73 | cancer - breast - prostate - gene - cell | 18 | 73_cancer_breast_prostate_gene |
| 74 | emoji - app - user - facebook - use | 17 | 74_emoji_app_user_facebook |
| 75 | melbourne - police - anzac - australian - australia | 17 | 75_melbourne_police_anzac_australian |
| 76 | song - songs - no - album - chart | 17 | 76_song_songs_no_album |
| 77 | sydney - storm - weather - flooding - hail | 16 | 77_sydney_storm_weather_flooding |
| 78 | car - audi - motor - bentley - vehicle | 15 | 78_car_audi_motor_bentley |
| 79 | rocket - space - spacex - launch - booster | 15 | 79_rocket_space_spacex_launch |
| 80 | underground - land - cave - garnet - built | 14 | 80_underground_land_cave_garnet |
| 81 | genocide - armenians - armenian - pope - ottoman | 14 | 81_genocide_armenians_armenian_pope |
| 82 | hair - jamelia - labium - rita - cheryl | 14 | 82_hair_jamelia_labium_rita |
| 83 | stephanie - scott - scotts - stanford - leeton | 13 | 83_stephanie_scott_scotts_stanford |
| 84 | funeral - nelms - work - job - grandparent | 13 | 84_funeral_nelms_work_job |
| 85 | alcohol - wine - drinking - oak - drink | 13 | 85_alcohol_wine_drinking_oak |
| 86 | nuclear - reactor - radiation - plant - fukushima | 12 | 86_nuclear_reactor_radiation_plant |
| 87 | luke - search - bushland - missing - eildon | 12 | 87_luke_search_bushland_missing |
| 88 | snowden - nsa - agency - oliver - information | 12 | 88_snowden_nsa_agency_oliver |
| 89 | brandt - dr - kimmy - franff - fredric | 10 | 89_brandt_dr_kimmy_franff |
| 90 | tidal - music - radio - streaming - service | 10 | 90_tidal_music_radio_streaming |
| 91 | population - immigrant - cent - per - immigration | 10 | 91_population_immigrant_cent_per |
| 92 | brain - acetaminophen - meditation - cortisol - study | 9 | 92_brain_acetaminophen_meditation_cortisol |
| 93 | god - church - dollar - catholic - schuller | 8 | 93_god_church_dollar_catholic |
| 94 | phone - user - google - device - app | 8 | 94_phone_user_google_device |
| 95 | cocaine - cutter - custom - seized - tsa | 8 | 95_cocaine_cutter_custom_seized |
| 96 | pusok - deputy - officer - pusoks - mcmahon | 7 | 96_pusok_deputy_officer_pusoks |
| 97 | stover - kost - rape - convicted - offender | 7 | 97_stover_kost_rape_convicted |
| 98 | nauru - sexual - sex - genetic - convicted | 7 | 98_nauru_sexual_sex_genetic |
| 99 | tsa - security - roberts - airport - employee | 7 | 99_tsa_security_roberts_airport |
| 100 | eaves - beach - martistee - mckeithen - spring | 7 | 100_eaves_beach_martistee_mckeithen |
| 101 | oclee - michelle - philippa - barrientos - mcwhirter | 6 | 101_oclee_michelle_philippa_barrientos |
| 102 | redman - wisconsin - basketball - badgers - wildcats | 6 | 102_redman_wisconsin_basketball_badgers |
| 103 | gransbury - biderman - funking - website - joke | 6 | 103_gransbury_biderman_funking_website |
| 104 | richards - ariana - beverly - kim - hills | 6 | 104_richards_ariana_beverly_kim |
| 105 | affleck - gates - renner - avengers - afflecks | 5 | 105_affleck_gates_renner_avengers |
| 106 | skin - sun - protoporphyrin - cream - sunlight | 5 | 106_skin_sun_protoporphyrin_cream |
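
The labels in the last column appear to follow the pattern `<topic ID>_<first four keywords>`. Assuming that pattern holds for every row, a small helper can split a label back into its parts. The name `parse_label` is hypothetical and not part of BERTopic.

```python
def parse_label(label):
    """Split a label such as '-1_said_one_year_also' into (topic_id, keywords)."""
    # Split only at the first underscore so the keyword part stays intact
    topic_id, _, keyword_part = label.partition("_")
    return int(topic_id), keyword_part.split("_")


print(parse_label("0_league_season_player_game"))
print(parse_label("-1_said_one_year_also"))
```

Topic -1 is the ID BERTopic reserves for outlier documents, so code that works with the parsed IDs may want to handle it separately.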

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
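
The settings above map onto keyword arguments of the `BERTopic` constructor. The sketch below shows one way to create an untrained model with the same values. It assumes BERTopic's default embedding, UMAP and HDBSCAN sub-models, because the card does not list them, so fitting it is not guaranteed to reproduce the topics above. The names `HYPERPARAMETERS` and `build_topic_model` are illustrative.

```python
# Hyperparameters copied from the list above
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic instance with the settings above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Training would then be a call such as `build_topic_model().fit_transform(docs)`, where `docs` is your own list of strings.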

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.12
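
Loading a saved model can be sensitive to library versions, so it may help to compare your environment with the list above. This is a minimal sketch using only the standard library. The PyPI distribution names in `EXPECTED` (for example `umap-learn` for UMAP) are assumptions, and `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on the card, keyed by assumed PyPI distribution name
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where your environment differs from the one recorded on the card.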