---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---
# cnn_dailymail_123_3000_1500_train
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
## Usage
To use this model, please install BERTopic:
```
pip install -U bertopic
```
You can use the model as follows:
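```python
from bertopic import BERTopic

# Download the pretrained model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

# Returns a DataFrame with one row per topic (ID, document count, label).
topic_model.get_topic_info()
```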
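Once loaded, the model can also assign topics to unseen text. The following is a minimal sketch, not part of the original card: `predict_topics` and `format_topic` are hypothetical helper names, and the sketch assumes the embedding model can be resolved when the model is loaded (otherwise `BERTopic.load` accepts an `embedding_model` argument).

```python
from typing import List, Tuple


def format_topic(words: List[Tuple[str, float]], top_k: int = 5) -> str:
    """Join the first top_k keywords of one topic into a readable string.

    `words` has the shape returned by `topic_model.get_topic(topic_id)`:
    a list of (keyword, weight) pairs.
    """
    return " - ".join(word for word, _ in words[:top_k])


def predict_topics(
    docs: List[str],
    repo_id: str = "KingKazma/cnn_dailymail_123_3000_1500_train",
):
    """Assign each document to one of the topics of the pretrained model."""
    # Imported lazily so that format_topic can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # transform() returns one topic ID per document plus probabilities.
    topics, probs = topic_model.transform(docs)
    for doc, topic_id in zip(docs, topics):
        keywords = format_topic(topic_model.get_topic(int(topic_id)))
        print(f"{topic_id}: {keywords} | {doc[:60]}")
    return topics, probs
```

Calling `predict_topics(["The striker scored twice in the cup final."])` prints the assigned topic ID with its keywords; a topic ID of -1 means the document was treated as an outlier.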
## Topic overview
* Number of topics: 57 (including the outlier topic -1)
* Number of training documents: 3000
<details>
<summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people |
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home |
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |
</details>
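To help read the table, the sketch below rebuilds a label from a topic ID and its keywords and converts a frequency into a share of the 3000 training documents. The label pattern (ID followed by the first four keywords, joined by underscores) is inferred from the rows above; `make_label` and `share` are hypothetical helpers, and only three rows are copied in for illustration.

```python
TOTAL_DOCS = 3000  # number of training documents stated in this card

# A few rows copied from the table: topic ID -> (keywords, frequency).
ROWS = {
    -1: (["said", "one", "police", "people", "year"], 10),
    0: (["league", "player", "cup", "goal", "game"], 1070),
    1: (["police", "said", "home", "murder", "found"], 320),
}


def make_label(topic_id: int, keywords: list) -> str:
    """Rebuild the Label column: topic ID plus the first four keywords."""
    return "_".join([str(topic_id)] + keywords[:4])


def share(frequency: int, total: int = TOTAL_DOCS) -> float:
    """Percentage of training documents assigned to a topic."""
    return 100 * frequency / total


for topic_id, (keywords, frequency) in ROWS.items():
    print(f"{make_label(topic_id, keywords)}: {share(frequency):.1f}% of documents")
```

The frequencies in the table sum to 3000, so these shares cover the full training set, with topic -1 holding the documents left unassigned as outliers.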
## Training hyperparameters
* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
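For reference, the values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below only shows how a model with the same settings could be configured; `build_model` is a hypothetical helper, and the embedding, UMAP, and HDBSCAN components used in training are not listed in this card, so they are left at the library defaults here.

```python
# Hyperparameters copied from the list above.
TRAINING_KWARGS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the settings listed above."""
    # Imported lazily; requires `pip install -U bertopic`.
    from bertopic import BERTopic

    return BERTopic(**TRAINING_KWARGS)
```

Fitting such a model with `build_model().fit_transform(docs)` on the same documents is not guaranteed to reproduce the topics above, because UMAP is stochastic unless a random state is fixed.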
## Framework versions
* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
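
Version differences can affect loading and results, so it may help to compare the local environment against the versions above. This is a minimal sketch using only the standard library; the PyPI distribution names in `EXPECTED` (for example `umap-learn` for UMAP) are assumptions added here, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected: dict) -> dict:
    """Return {package: (expected, installed)} for every differing package.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, installed {installed}")
```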