---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_123_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model. 
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets. 

## Usage 

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

topic_model.get_topic_info()
```
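
Once loaded, the model can also assign topics to new text. The sketch below is a minimal illustration, not part of the original card: `label_documents` is a hypothetical helper name, and it assumes the saved model can resolve its embedding model (if it cannot, `BERTopic.load` accepts an `embedding_model` argument). It relies only on `transform` and `get_topic` from the BERTopic API.

```python
def label_documents(topic_model, docs, n_words=5):
    """Assign each document a topic ID and that topic's top keywords.

    `topic_model` is expected to be a loaded BERTopic model; the helper name
    and the `n_words` parameter are illustrative assumptions.
    """
    # transform() returns (topic IDs, probabilities); only the IDs are used here.
    topics, _ = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() yields (word, score) pairs, or False for an unknown ID.
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _ in words[:n_words]]
        results.append((doc, int(topic_id), keywords))
    return results


# Hypothetical usage (requires bertopic and access to the Hugging Face Hub):
#   from bertopic import BERTopic
#   topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")
#   for doc, topic_id, keywords in label_documents(topic_model, ["Some news text"]):
#       print(topic_id, keywords)
```

A topic ID of -1 marks documents that BERTopic treats as outliers rather than members of a topic.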

## Topic overview

* Number of topics: 57
* Number of training documents: 3000

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people | 
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal | 
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder | 
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year | 
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house | 
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport | 
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby | 
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look | 
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party | 
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat | 
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa | 
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app | 
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast | 
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said | 
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion | 
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold | 
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs | 
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic | 
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow | 
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online | 
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat | 
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song | 
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park | 
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot | 
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit | 
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor | 
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine | 
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting | 
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro | 
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean | 
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa | 
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver | 
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene | 
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home | 
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us | 
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves | 
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age | 
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband | 
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram | 
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu | 
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub | 
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory | 
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess | 
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel | 
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ | 
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak | 
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos | 
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us | 
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store | 
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza | 
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy | 
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site | 
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby | 
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction | 
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan | 
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people | 
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |
  
</details>
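
In the table above, each label appears to be the topic ID followed by the first four keywords, joined by underscores, and the topic frequencies add up to the 3000 training documents. The sketch below shows one way to split a label back into its parts and to express a frequency as a share of the training set. The function names are illustrative, and the label format is inferred from the table rather than from a documented guarantee.

```python
def parse_label(label):
    """Split a label such as '4_plane_flight_passenger_airport' into (4, [...])."""
    # Split only on the first underscore so the leading '-1' of the
    # outlier topic survives intact.
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share(frequency, total=3000):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


topic_id, keywords = parse_label("0_league_player_cup_goal")
print(topic_id, keywords, f"{share(1070):.1%}")
```

By this measure topic 0 covers a little over a third of the training documents, while the outlier topic -1 holds only 10 of them.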

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
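
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below collects them in a dictionary and builds an untrained model with the same settings. `TRAINING_KWARGS` and `build_untrained_model` are illustrative names. This card does not list the embedding model or the UMAP and HDBSCAN settings, so fitting a model built this way is not expected to reproduce the topics exactly.

```python
# Values copied from the "Training hyperparameters" list in this card.
TRAINING_KWARGS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Return an unfitted BERTopic model configured with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**TRAINING_KWARGS)
```

Calling `fit_transform` on the returned model with a list of documents would train it. With `min_topic_size` set to 10, clusters smaller than 10 documents are not kept as topics, which matches the smallest frequencies in the table.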

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
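
Loading a saved model in an environment whose library versions differ from those above can sometimes cause problems. As a rough check, the sketch below compares installed package versions with the listed ones, using only the standard library. The `version_mismatches` helper is hypothetical, the keys are the usual PyPI distribution names for the libraries listed above, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list in this card.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected=EXPECTED):
    """Map each differing package to (expected, installed); None if missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches().items():
    print(f"{name}: expected {wanted}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the output is only a starting point when debugging.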