cnn_dailymail_22457_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

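from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/cnn_dailymail_22457_3000_1500_train")

# Return a DataFrame with one row per topic (ID, size, label, keywords).
topic_model.get_topic_info()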
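
Beyond inspecting the topic table, a loaded model can assign topics to unseen text. The following is a minimal sketch, not part of the original card: the helper name `assign_topics` is hypothetical, and it assumes the saved model carries a usable reference to its embedding model (if it does not, an embedding model can be passed to `BERTopic.load` through its `embedding_model` argument).

```python
def assign_topics(docs, repo_id="KingKazma/cnn_dailymail_22457_3000_1500_train"):
    """Return (document, topic_id) pairs for a list of unseen documents.

    Hypothetical helper for illustration; it is not defined by BERTopic.
    """
    # Imported lazily so the module can be imported without bertopic installed.
    from bertopic import BERTopic

    # Assumption: the Hub repository contains everything needed for inference.
    topic_model = BERTopic.load(repo_id)

    # transform() returns the predicted topic IDs and their probabilities.
    topics, _probabilities = topic_model.transform(docs)
    return list(zip(docs, (int(topic) for topic in topics)))
```

Calling `assign_topics(["Some news article text ..."])` yields one pair per document. A topic ID of -1 means the document was treated as an outlier rather than assigned to a topic.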

Topic overview

  • Number of topics: 49 (48 topics plus the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - year - people - police | 10 | -1_said_one_year_people |
| 0 | league - player - club - game - cup | 1050 | 0_league_player_club_game |
| 1 | said - syria - government - iraq - islamic | 317 | 1_said_syria_government_iraq |
| 2 | obama - president - house - state - republican | 140 | 2_obama_president_house_state |
| 3 | cancer - hospital - baby - treatment - child | 122 | 3_cancer_hospital_baby_treatment |
| 4 | google - apple - tablet - car - device | 84 | 4_google_apple_tablet_car |
| 5 | fashion - dress - hair - look - woman | 78 | 5_fashion_dress_hair_look |
| 6 | police - officer - shooting - said - shot | 66 | 6_police_officer_shooting_said |
| 7 | film - movie - show - actor - comedy | 65 | 7_film_movie_show_actor |
| 8 | murder - death - said - home - police | 55 | 8_murder_death_said_home |
| 9 | mr - labour - minister - mp - blair | 52 | 9_mr_labour_minister_mp |
| 10 | storm - water - weather - ice - rain | 51 | 10_storm_water_weather_ice |
| 11 | shark - bear - turtle - crocodile - bird | 50 | 11_shark_bear_turtle_crocodile |
| 12 | flight - plane - passenger - airport - pilot | 49 | 12_flight_plane_passenger_airport |
| 13 | house - property - home - per - room | 49 | 13_house_property_home_per |
| 14 | drug - police - court - stealing - robbery | 40 | 14_drug_police_court_stealing |
| 15 | police - murder - mr - court - clavell | 36 | 15_police_murder_mr_court |
| 16 | games - gold - olympic - race - sport | 34 | 16_games_gold_olympic_race |
| 17 | student - school - teacher - said - cardosa | 34 | 17_student_school_teacher_said |
| 18 | country - minister - energy - cent - greece | 32 | 18_country_minister_energy_cent |
| 19 | golf - mcilroy - course - round - ryder | 31 | 19_golf_mcilroy_course_round |
| 20 | police - harris - abuse - allegation - officer | 30 | 20_police_harris_abuse_allegation |
| 21 | ebola - virus - africa - health - liberia | 29 | 21_ebola_virus_africa_health |
| 22 | chinese - china - cable - bo - beijing | 28 | 22_chinese_china_cable_bo |
| 23 | federer - tennis - murray - wimbledon - match | 28 | 23_federer_tennis_murray_wimbledon |
| 24 | dog - animal - dogs - owner - simmons | 26 | 24_dog_animal_dogs_owner |
| 25 | cent - per - woman - men - pickens | 23 | 25_cent_per_woman_men |
| 26 | ship - boat - rescue - water - sea | 23 | 26_ship_boat_rescue_water |
| 27 | hamilton - race - rosberg - mercedes - formula | 22 | 27_hamilton_race_rosberg_mercedes |
| 28 | galaxy - planet - universe - earth - telescope | 22 | 28_galaxy_planet_universe_earth |
| 29 | russian - russia - putin - ukraine - moscow | 22 | 29_russian_russia_putin_ukraine |
| 30 | pakistan - pakistani - karachi - taliban - anwar | 22 | 30_pakistan_pakistani_karachi_taliban |
| 31 | korea - north - korean - south - kim | 21 | 31_korea_north_korean_south |
| 32 | car - driver - train - accident - cope | 21 | 32_car_driver_train_accident |
| 33 | food - fruit - taste - cake - cream | 20 | 33_food_fruit_taste_cake |
| 34 | painting - art - auction - artist - gallery | 20 | 34_painting_art_auction_artist |
| 35 | base - drone - soldier - afghan - us | 19 | 35_base_drone_soldier_afghan |
| 36 | weight - fat - eating - healthy - size | 18 | 36_weight_fat_eating_healthy |
| 37 | mafia - wine - money - fraud - court | 18 | 37_mafia_wine_money_fraud |
| 38 | aguilar - bravo - brewer - rambold - court | 18 | 38_aguilar_bravo_brewer_rambold |
| 39 | missing - search - found - family - disappeared | 17 | 39_missing_search_found_family |
| 40 | juarez - quezada - mexico - mexican - cartel | 15 | 40_juarez_quezada_mexico_mexican |
| 41 | knicks - lin - chicago - blackhawks - game | 15 | 41_knicks_lin_chicago_blackhawks |
| 42 | duchess - prince - kate - royal - william | 15 | 42_duchess_prince_kate_royal |
| 43 | price - supermarket - asda - shop - food | 14 | 43_price_supermarket_asda_shop |
| 44 | school - child - pupil - teacher - xxx | 14 | 44_school_child_pupil_teacher |
| 45 | nhs - patient - ae - hospital - staff | 13 | 45_nhs_patient_ae_hospital |
| 46 | zsa - francesca - rhodes - vongtau - gabor | 12 | 46_zsa_francesca_rhodes_vongtau |
| 47 | medal - war - bomb - graf - vc | 10 | 47_medal_war_bomb_graf |
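
The frequencies in the table add up to the 3000 training documents, so each one can be read as a share of the corpus. The sketch below illustrates this with a few rows copied from the table. The helper names `parse_label` and `topic_share` are hypothetical, and the label format `<id>_<word>_<word>_...` is inferred from the rows above rather than taken from BERTopic documentation.

```python
TOTAL_DOCUMENTS = 3000  # "Number of training documents" from the overview

# (label, frequency) pairs copied from a few rows of the topic table.
SAMPLE_ROWS = [
    ("-1_said_one_year_people", 10),
    ("0_league_player_club_game", 1050),
    ("1_said_syria_government_iraq", 317),
    ("47_medal_war_bomb_graf", 10),
]


def parse_label(label):
    """Split a label such as '0_league_player_club_game' into (id, keywords)."""
    topic_id, keywords = label.split("_", 1)
    return int(topic_id), keywords.split("_")


def topic_share(frequency, total=TOTAL_DOCUMENTS):
    """Return the percentage of training documents assigned to a topic."""
    return 100 * frequency / total


for label, frequency in SAMPLE_ROWS:
    topic_id, keywords = parse_label(label)
    print(f"topic {topic_id:>2}: {topic_share(frequency):5.2f}%  {', '.join(keywords)}")
```

Read this way, topic 0 (football) covers 35% of the training documents, while only 10 documents were left in the outlier topic -1.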

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
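
For reference, the values above map directly onto `BERTopic` constructor arguments. The following sketch shows how an untrained model with the same settings could be created. It is not the author's training script: the card does not list the embedding model, the UMAP or HDBSCAN settings, or the exact training documents, so fitting this model should not be expected to reproduce the topics above. The names `HYPERPARAMETERS` and `build_untrained_model` are hypothetical.

```python
# Constructor arguments copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a BERTopic instance with the hyperparameters listed in this card."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

A model built this way would then be trained with `fit_transform` on a list of documents.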

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6