tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

cnn_dailymail_55555_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

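from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/cnn_dailymail_55555_3000_1500_train")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()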
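
Beyond inspecting the topics, a loaded model can label new text. The sketch below is one way to do that with `transform`, which returns a list of topic IDs and a probabilities object. The helper names `assign_topics` and `demo` and the sample sentence are illustrative assumptions, not part of this model card, and the sketch assumes the embedding model used at training time can be resolved when the model is loaded.

```python
from typing import Any, List, Sequence, Tuple


def assign_topics(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted BERTopic model.

    `topic_model` is expected to expose BERTopic's `transform` method,
    which returns (topics, probabilities). Topic -1 marks outliers.
    """
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Hypothetical end-to-end run; downloads the model from the Hub."""
    from bertopic import BERTopic  # imported lazily so the helper stays lightweight

    model = BERTopic.load("KingKazma/cnn_dailymail_55555_3000_1500_train")
    sample = ["NASA plans a new rover mission to study the surface of Mars."]  # made-up text
    for doc, topic in assign_topics(model, sample):
        print(topic, doc)
```

Calling `demo()` requires network access and the `bertopic` package; `assign_topics` itself only relies on the object having a `transform` method.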

Topic overview

  • Number of topics: 61
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - year - people - mr | 10 | -1_said_one_year_people |
| 0 | league - game - player - cup - goal | 961 | 0_league_game_player_cup |
| 1 | police - death - said - murder - family | 313 | 1_police_death_said_murder |
| 2 | obama - republican - senate - president - republicans | 182 | 2_obama_republican_senate_president |
| 3 | fashion - hair - look - makeup - brand | 91 | 3_fashion_hair_look_makeup |
| 4 | dog - animal - cat - bird - pet | 69 | 4_dog_animal_cat_bird |
| 5 | syria - isis - syrian - iraq - fighter | 54 | 5_syria_isis_syrian_iraq |
| 6 | mexico - said - cuba - president - cartel | 53 | 6_mexico_said_cuba_president |
| 7 | police - court - cash - jailed - said | 53 | 7_police_court_cash_jailed |
| 8 | space - nasa - mars - planet - earth | 51 | 8_space_nasa_mars_planet |
| 9 | property - house - price - room - london | 48 | 9_property_house_price_room |
| 10 | patient - hospital - nhs - doctor - cancer | 48 | 10_patient_hospital_nhs_doctor |
| 11 | tax - bank - minister - mr - pay | 46 | 11_tax_bank_minister_mr |
| 12 | car - fire - crash - bus - train | 45 | 12_car_fire_crash_bus |
| 13 | milk - food - raw - restaurant - chocolate | 44 | 13_milk_food_raw_restaurant |
| 14 | gold - olympic - horse - race - medal | 36 | 14_gold_olympic_horse_race |
| 15 | album - song - joel - music - show | 35 | 15_album_song_joel_music |
| 16 | show - film - movie - award - les | 35 | 16_show_film_movie_award |
| 17 | baby - born - hospital - birth - pregnancy | 34 | 17_baby_born_hospital_birth |
| 18 | prince - queen - royal - william - duchess | 31 | 18_prince_queen_royal_william |
| 19 | chinese - china - bo - beijing - chen | 30 | 19_chinese_china_bo_beijing |
| 20 | labour - mr - party - ukip - miliband | 30 | 20_labour_mr_party_ukip |
| 21 | school - student - teacher - book - fraternity | 29 | 21_school_student_teacher_book |
| 22 | somalia - dala - african - alshabaab - mali | 28 | 22_somalia_dala_african_alshabaab |
| 23 | ukraine - russian - russia - putin - moscow | 26 | 23_ukraine_russian_russia_putin |
| 24 | woods - golf - golfer - hole - round | 26 | 24_woods_golf_golfer_hole |
| 25 | sterling - nba - clippers - donald - said | 26 | 25_sterling_nba_clippers_donald |
| 26 | found - scientist - stonehenge - researcher - frog | 26 | 26_found_scientist_stonehenge_researcher |
| 27 | apple - iphone - apples - phone - device | 24 | 27_apple_iphone_apples_phone |
| 28 | formula - race - schumacher - prix - ecclestone | 23 | 28_formula_race_schumacher_prix |
| 29 | ebola - virus - outbreak - health - vaccine | 22 | 29_ebola_virus_outbreak_health |
| 30 | church - pope - priest - francis - vatican | 21 | 30_church_pope_priest_francis |
| 31 | sharapova - open - wimbledon - tennis - slam | 21 | 31_sharapova_open_wimbledon_tennis |
| 32 | pakistani - pakistan - taliban - musharraf - afghanistan | 21 | 32_pakistani_pakistan_taliban_musharraf |
| 33 | storm - weather - tornado - water - rain | 21 | 33_storm_weather_tornado_water |
| 34 | north - korea - korean - kim - south | 21 | 34_north_korea_korean_kim |
| 35 | war - medal - soldier - army - afghanistan | 21 | 35_war_medal_soldier_army |
| 36 | marijuana - cigarette - alcohol - drug - smoking | 20 | 36_marijuana_cigarette_alcohol_drug |
| 37 | internet - google - user - facebook - online | 19 | 37_internet_google_user_facebook |
| 38 | plane - flight - crash - passenger - airport | 19 | 38_plane_flight_crash_passenger |
| 39 | weight - diet - fat - stone - food | 18 | 39_weight_diet_fat_stone |
| 40 | israeli - israel - gaza - hamas - palestinian | 17 | 40_israeli_israel_gaza_hamas |
| 41 | beach - art - resort - festival - painting | 17 | 41_beach_art_resort_festival |
| 42 | petraeus - cia - broadwell - justice - fbi | 17 | 42_petraeus_cia_broadwell_justice |
| 43 | garner - wilson - officer - police - black | 16 | 43_garner_wilson_officer_police |
| 44 | ship - cruise - ships - crew - pirate | 16 | 44_ship_cruise_ships_crew |
| 45 | nfl - patriots - rice - seahawks - chris | 15 | 45_nfl_patriots_rice_seahawks |
| 46 | dolphin - sea - creature - cuttlefish - fisherman | 14 | 46_dolphin_sea_creature_cuttlefish |
| 47 | weather - rain - winter - temperature - warm | 14 | 47_weather_rain_winter_temperature |
| 48 | mandela - african - africa - south - mandelas | 14 | 48_mandela_african_africa_south |
| 49 | disney - snow - million - wars - movie | 14 | 49_disney_snow_million_wars |
| 50 | price - bag - plastic - cent - energy | 13 | 50_price_bag_plastic_cent |
| 51 | spartan - cliff - parachute - matthew - obstacle | 12 | 51_spartan_cliff_parachute_matthew |
| 52 | zoo - panda - cub - giraffe - park | 12 | 52_zoo_panda_cub_giraffe |
| 53 | iran - iranian - irans - ahmadinejad - nuclear | 12 | 53_iran_iranian_irans_ahmadinejad |
| 54 | bin - laden - us - qaeda - al | 12 | 54_bin_laden_us_qaeda |
| 55 | crocodile - snake - python - bascoules - alligator | 12 | 55_crocodile_snake_python_bascoules |
| 56 | woman - ivf - men - dna - fertility | 11 | 56_woman_ivf_men_dna |
| 57 | driver - driving - police - meracle - text | 11 | 57_driver_driving_police_meracle |
| 58 | mitchell - mr - evans - mp - gate | 10 | 58_mitchell_mr_evans_mp |
| 59 | france - police - mosque - salah - donetsk | 10 | 59_france_police_mosque_salah |
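
Each label in the overview follows the pattern `<topic id>_<keyword>_<keyword>_...`, and in BERTopic the ID -1 is reserved for outlier documents that were not assigned to any topic. As a minimal sketch, the label strings could be split back into their parts like this; `TopicLabel` and `parse_label` are hypothetical helper names introduced here for illustration only.

```python
from typing import List, NamedTuple


class TopicLabel(NamedTuple):
    topic_id: int
    keywords: List[str]
    is_outlier: bool


def parse_label(label: str) -> TopicLabel:
    """Split a label such as '8_space_nasa_mars_planet' into ID and keywords."""
    head, *words = label.split("_")
    topic_id = int(head)  # int() also handles the leading minus sign of "-1"
    return TopicLabel(topic_id, words, is_outlier=(topic_id == -1))


if __name__ == "__main__":
    for raw in ("-1_said_one_year_people", "8_space_nasa_mars_planet"):
        print(parse_label(raw))
```

Filtering on `is_outlier` is a simple way to exclude the -1 row before ranking or plotting topics by frequency.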

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
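
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The following is a sketch of how a model with the same settings might be instantiated; `HYPERPARAMS` and `build_model` are names chosen here for illustration. The embedding model, UMAP and HDBSCAN settings, and any random seed are not listed in this card, so fitting such a model on the same data should not be expected to reproduce these exact topics.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list in this card.
HYPERPARAMS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model() -> Any:
    """Create an unfitted BERTopic instance with the listed settings."""
    from bertopic import BERTopic  # lazy import; requires `pip install -U bertopic`

    return BERTopic(**HYPERPARAMS)
```

A model built this way would still need `fit_transform(docs)` on a training corpus before it exposes any topics.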

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
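
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below uses only the standard library; the PyPI distribution names (for example `umap-learn` for UMAP) and the helper name `compare_versions` are assumptions made for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from the "Framework versions" list, keyed by assumed PyPI distribution name.
LISTED_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(
    listed: Dict[str, str] = LISTED_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Return {package: (listed, installed or None, exact match?)}."""
    report = {}
    for package, wanted in listed.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, same) in compare_versions().items():
        print(f"{package}: listed {wanted}, installed {installed}, match={same}")
```

An exact match is not necessarily required for the model to load; the report is only a starting point when debugging incompatibilities.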