---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_55555_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model. 
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets. 

## Usage 

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_55555_3000_1500_train")

topic_model.get_topic_info()
```
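Once the model is loaded, individual topics can be inspected with `get_topic`, which returns `(word, score)` pairs for a topic ID. The helper below is a minimal sketch; `top_keywords` is a hypothetical name introduced here for illustration, not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return the first n keywords stored for a topic."""
    # get_topic returns a list of (word, score) pairs,
    # or False when the topic ID does not exist in the model.
    words = topic_model.get_topic(topic_id)
    if not words:
        return []
    return [word for word, _score in words[:n]]
```

Calling `top_keywords(topic_model, 0)` on the loaded model should give the keywords listed for topic 0 in the topic overview.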

## Topic overview

* Number of topics: 61 (topics 0 to 59, plus the outlier topic -1)
* Number of training documents: 3000

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - year - people - mr | 10 | -1_said_one_year_people | 
| 0 | league - game - player - cup - goal | 961 | 0_league_game_player_cup | 
| 1 | police - death - said - murder - family | 313 | 1_police_death_said_murder | 
| 2 | obama - republican - senate - president - republicans | 182 | 2_obama_republican_senate_president | 
| 3 | fashion - hair - look - makeup - brand | 91 | 3_fashion_hair_look_makeup | 
| 4 | dog - animal - cat - bird - pet | 69 | 4_dog_animal_cat_bird | 
| 5 | syria - isis - syrian - iraq - fighter | 54 | 5_syria_isis_syrian_iraq | 
| 6 | mexico - said - cuba - president - cartel | 53 | 6_mexico_said_cuba_president | 
| 7 | police - court - cash - jailed - said | 53 | 7_police_court_cash_jailed | 
| 8 | space - nasa - mars - planet - earth | 51 | 8_space_nasa_mars_planet | 
| 9 | property - house - price - room - london | 48 | 9_property_house_price_room | 
| 10 | patient - hospital - nhs - doctor - cancer | 48 | 10_patient_hospital_nhs_doctor | 
| 11 | tax - bank - minister - mr - pay | 46 | 11_tax_bank_minister_mr | 
| 12 | car - fire - crash - bus - train | 45 | 12_car_fire_crash_bus | 
| 13 | milk - food - raw - restaurant - chocolate | 44 | 13_milk_food_raw_restaurant | 
| 14 | gold - olympic - horse - race - medal | 36 | 14_gold_olympic_horse_race | 
| 15 | album - song - joel - music - show | 35 | 15_album_song_joel_music | 
| 16 | show - film - movie - award - les | 35 | 16_show_film_movie_award | 
| 17 | baby - born - hospital - birth - pregnancy | 34 | 17_baby_born_hospital_birth | 
| 18 | prince - queen - royal - william - duchess | 31 | 18_prince_queen_royal_william | 
| 19 | chinese - china - bo - beijing - chen | 30 | 19_chinese_china_bo_beijing | 
| 20 | labour - mr - party - ukip - miliband | 30 | 20_labour_mr_party_ukip | 
| 21 | school - student - teacher - book - fraternity | 29 | 21_school_student_teacher_book | 
| 22 | somalia - dala - african - alshabaab - mali | 28 | 22_somalia_dala_african_alshabaab | 
| 23 | ukraine - russian - russia - putin - moscow | 26 | 23_ukraine_russian_russia_putin | 
| 24 | woods - golf - golfer - hole - round | 26 | 24_woods_golf_golfer_hole | 
| 25 | sterling - nba - clippers - donald - said | 26 | 25_sterling_nba_clippers_donald | 
| 26 | found - scientist - stonehenge - researcher - frog | 26 | 26_found_scientist_stonehenge_researcher | 
| 27 | apple - iphone - apples - phone - device | 24 | 27_apple_iphone_apples_phone | 
| 28 | formula - race - schumacher - prix - ecclestone | 23 | 28_formula_race_schumacher_prix | 
| 29 | ebola - virus - outbreak - health - vaccine | 22 | 29_ebola_virus_outbreak_health | 
| 30 | church - pope - priest - francis - vatican | 21 | 30_church_pope_priest_francis | 
| 31 | sharapova - open - wimbledon - tennis - slam | 21 | 31_sharapova_open_wimbledon_tennis | 
| 32 | pakistani - pakistan - taliban - musharraf - afghanistan | 21 | 32_pakistani_pakistan_taliban_musharraf | 
| 33 | storm - weather - tornado - water - rain | 21 | 33_storm_weather_tornado_water | 
| 34 | north - korea - korean - kim - south | 21 | 34_north_korea_korean_kim | 
| 35 | war - medal - soldier - army - afghanistan | 21 | 35_war_medal_soldier_army | 
| 36 | marijuana - cigarette - alcohol - drug - smoking | 20 | 36_marijuana_cigarette_alcohol_drug | 
| 37 | internet - google - user - facebook - online | 19 | 37_internet_google_user_facebook | 
| 38 | plane - flight - crash - passenger - airport | 19 | 38_plane_flight_crash_passenger | 
| 39 | weight - diet - fat - stone - food | 18 | 39_weight_diet_fat_stone | 
| 40 | israeli - israel - gaza - hamas - palestinian | 17 | 40_israeli_israel_gaza_hamas | 
| 41 | beach - art - resort - festival - painting | 17 | 41_beach_art_resort_festival | 
| 42 | petraeus - cia - broadwell - justice - fbi | 17 | 42_petraeus_cia_broadwell_justice | 
| 43 | garner - wilson - officer - police - black | 16 | 43_garner_wilson_officer_police | 
| 44 | ship - cruise - ships - crew - pirate | 16 | 44_ship_cruise_ships_crew | 
| 45 | nfl - patriots - rice - seahawks - chris | 15 | 45_nfl_patriots_rice_seahawks | 
| 46 | dolphin - sea - creature - cuttlefish - fisherman | 14 | 46_dolphin_sea_creature_cuttlefish | 
| 47 | weather - rain - winter - temperature - warm | 14 | 47_weather_rain_winter_temperature | 
| 48 | mandela - african - africa - south - mandelas | 14 | 48_mandela_african_africa_south | 
| 49 | disney - snow - million - wars - movie | 14 | 49_disney_snow_million_wars | 
| 50 | price - bag - plastic - cent - energy | 13 | 50_price_bag_plastic_cent | 
| 51 | spartan - cliff - parachute - matthew - obstacle | 12 | 51_spartan_cliff_parachute_matthew | 
| 52 | zoo - panda - cub - giraffe - park | 12 | 52_zoo_panda_cub_giraffe | 
| 53 | iran - iranian - irans - ahmadinejad - nuclear | 12 | 53_iran_iranian_irans_ahmadinejad | 
| 54 | bin - laden - us - qaeda - al | 12 | 54_bin_laden_us_qaeda | 
| 55 | crocodile - snake - python - bascoules - alligator | 12 | 55_crocodile_snake_python_bascoules | 
| 56 | woman - ivf - men - dna - fertility | 11 | 56_woman_ivf_men_dna | 
| 57 | driver - driving - police - meracle - text | 11 | 57_driver_driving_police_meracle | 
| 58 | mitchell - mr - evans - mp - gate | 10 | 58_mitchell_mr_evans_mp | 
| 59 | france - police - mosque - salah - donetsk | 10 | 59_france_police_mosque_salah |
  
</details>
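Each label in the table is the topic ID followed by the topic's first four keywords, joined with underscores, and the frequencies add up to the 3000 training documents. The sketch below splits a label back into its parts and converts a frequency into a share of the corpus. It uses only a few rows copied from the table; `parse_label` and `share` are hypothetical helper names, and the split assumes that no keyword contains an underscore.

```python
TRAINING_DOCS = 3000  # number of training documents from the topic overview

# A few (label, frequency) rows copied from the table
rows = [
    ("-1_said_one_year_people", 10),
    ("0_league_game_player_cup", 961),
    ("1_police_death_said_murder", 313),
    ("2_obama_republican_senate_president", 182),
]

def parse_label(label):
    """Split 'id_word1_word2_...' into (topic_id, [keywords])."""
    # Assumes keywords are single tokens without underscores
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords

def share(frequency, total=TRAINING_DOCS):
    """Percentage of training documents assigned to a topic, to one decimal."""
    return round(100 * frequency / total, 1)

# Map topic ID to its percentage of the training documents
summary = {parse_label(label)[0]: share(freq) for label, freq in rows}
```

For these rows, topic 0 covers roughly a third of the training documents, while the outlier topic -1 holds only 10 of them.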

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
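The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, they can be collected in a dictionary and passed through when building a fresh, untrained model. This card does not list the embedding model or the UMAP and HDBSCAN settings, so a model built this way is not guaranteed to reproduce the topics shown here; `build_untrained_model` is a hypothetical helper name.

```python
# Values copied from the hyperparameter list above
hyperparameters = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}

def build_untrained_model():
    """Create a BERTopic instance with the listed settings (no training)."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic
    return BERTopic(**hyperparameters)
```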

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
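
Differences between these versions and the local environment can affect loading. The following sketch compares the listed versions against whatever is installed, using the standard library's `importlib.metadata`. The distribution names (for example `umap-learn` for UMAP) are the PyPI names as assumed here, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}

def compare_versions(expected=EXPECTED):
    """Return {package: (wanted, installed, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (wanted, installed, installed == wanted)
    return report
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when debugging.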