---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_123_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 57
* Number of training documents: 3000


The table below gives an overview of all topics.

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people |
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home |
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |

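
Judging from the rows of the topic table, each label appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below illustrates that pattern with two hypothetical helpers, `make_label` and `parse_label`. They are not part of the BERTopic API and only mirror what the table shows.

```python
from typing import List, Tuple


def make_label(topic_id: int, keywords: str, n_words: int = 4) -> str:
    """Build a label such as '0_league_player_cup_goal' from a table row.

    `keywords` is the 'Topic Keywords' cell, e.g. 'league - player - cup - goal - game'.
    The choice of four words is inferred from the table, not from BERTopic itself.
    """
    words = [word.strip() for word in keywords.split(" - ")]
    return "_".join([str(topic_id)] + words[:n_words])


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label back into its topic ID and keyword list."""
    head, *words = label.split("_")
    return int(head), words


if __name__ == "__main__":
    print(make_label(0, "league - player - cup - goal - game"))
    print(parse_label("-1_said_one_police_people"))
```

Topic `-1` keeps its minus sign because the split happens on underscores only, so `parse_label` returns a negative ID for that row.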

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
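
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the framework versions listed above. The following is a minimal sketch using only the standard library. The helper name `compare_versions` is hypothetical, and the PyPI distribution names (for example `umap-learn` for UMAP) are an assumption about how each framework was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list. The distribution names
# used as keys are assumed PyPI names, not something the model card states.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(expected):
    """Return {name: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(EXPECTED_VERSIONS).items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when something behaves unexpectedly.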