tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

cnn_dailymail_123_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

topic_model.get_topic_info()
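
Beyond listing the topics, a loaded model can assign topics to unseen text. The helper below is a minimal sketch of that step, not part of the original card: `describe_documents` is a hypothetical name, and it relies only on BERTopic's `transform` and `get_topic` methods. It assumes the model has already been loaded as shown above.

```python
from typing import List, Tuple


def describe_documents(topic_model, docs: List[str], top_n: int = 5) -> List[Tuple[int, List[str]]]:
    """Return (topic_id, top keywords) for each input document.

    `topic_model` is expected to be a loaded BERTopic instance.
    """
    # transform() returns the predicted topic per document plus probabilities.
    topics, _ = topic_model.transform(docs)
    results = []
    for topic_id in topics:
        topic_id = int(topic_id)
        # get_topic() yields (word, score) pairs; fall back to an empty list
        # if the topic id is unknown to the model.
        words = topic_model.get_topic(topic_id) or []
        results.append((topic_id, [word for word, _ in words[:top_n]]))
    return results
```

For example, `describe_documents(topic_model, ["The striker scored twice in the cup final."])` returns one `(topic_id, keywords)` pair per document. A topic ID of -1 marks a document the model treats as an outlier.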

Topic overview

  • Number of topics: 57
  • Number of training documents: 3000
Overview of all topics:
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people |
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home |
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |
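
The frequencies above are raw document counts out of the 3000 training documents, so the topic sizes are easier to compare as shares. The snippet below is a small illustrative sketch using the first five rows of the table. `topic_shares` is a hypothetical helper, not a BERTopic function.

```python
from typing import Dict

# Document counts copied from the first five rows of the topic table.
COUNTS = {-1: 10, 0: 1070, 1: 320, 2: 142, 3: 113}
TOTAL_DOCS = 3000  # number of training documents stated in the overview


def topic_shares(counts: Dict[int, int], total: int) -> Dict[int, float]:
    """Convert per-topic document counts to percentages of the corpus."""
    return {topic: 100.0 * n / total for topic, n in counts.items()}


shares = topic_shares(COUNTS, TOTAL_DOCS)
for topic, pct in shares.items():
    print(f"topic {topic:>2}: {pct:.1f}% of documents")
```

Topic 0 (football) covers roughly a third of the corpus, while only 10 documents fall into the outlier topic -1.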

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
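
Each of these settings is a keyword argument of the `BERTopic` constructor. As a rough sketch, a model with the same top-level settings could be set up as below. This is not the author's training script: the card does not list the embedding model or the UMAP and HDBSCAN settings, so the library defaults are assumed, and `build_model` is a hypothetical helper name.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the settings above.

    Assumption: embedding, UMAP, and HDBSCAN sub-models are left at the
    library defaults because the card does not specify them.
    """
    from bertopic import BERTopic  # imported lazily so the dict is usable without bertopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of documents would then train a new model. Results will generally differ from the topics listed above unless the same data, sub-models, and random state are used.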

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
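
When a loaded model behaves unexpectedly, comparing the local environment against the versions above is a reasonable first check. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are the usual ones but are an assumption here, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (version on the card, installed version or None)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (expected, installed)
    return report
```

Iterating over `compare_versions(PINNED).items()` and printing rows where the two versions differ gives a quick list of mismatches.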