---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# xsum_123_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 47
* Number of training documents: 3000


Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
| 0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul |
| 1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would |
| 2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic |
| 3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient |
| 4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale |
| 5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro |
| 6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road |
| 7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis |
| 8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel |
| 9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools |
| 10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird |
| 11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series |
| 12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police |
| 13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican |
| 14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service |
| 15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government |
| 16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south |
| 17 | album - band - song - music - best | 25 | 17_album_band_song_music |
| 18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death |
| 19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train |
| 20 | road - police - death - seen - found | 23 | 20_road_police_death_seen |
| 21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat |
| 22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr |
| 23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan |
| 24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android |
| 25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building |
| 26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council |
| 27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race |
| 28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival |
| 29 | site - council - street - project - plan | 15 | 29_site_council_street_project |
| 30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian |
| 31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv |
| 32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber |
| 33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet |
| 34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods |
| 35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident |
| 36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables |
| 37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people |
| 38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist |
| 39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter |
| 40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable |
| 41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana |
| 42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn |
| 43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf |
| 44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser |
| 45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant |

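
In the rows of the topic table, each label looks like the topic ID followed by the topic's first four keywords, joined with underscores, and the frequencies add up to the 3000 training documents. The sketch below illustrates both observations on a few rows copied from the table. The pattern is inferred from the table itself rather than taken from the BERTopic documentation, and the helper names (`build_label`, `parse_label`, `share_percent`) are hypothetical, not part of the BERTopic API.

```python
# A few rows copied from the topic table: (topic ID, keywords, frequency, label).
ROWS = [
    (-1, "said - mr - police - people - would", 5, "-1_said_mr_police_people"),
    (0, "win - game - half - foul - league", 1132, "0_win_game_half_foul"),
    (1, "eu - labour - party - would - uk", 591, "1_eu_labour_party_would"),
    (45, "boko - haram - shekau - militant - monguno", 5, "45_boko_haram_shekau_militant"),
]

TOTAL_DOCS = 3000  # "Number of training documents" from the topic overview


def build_label(topic_id, keywords, n_words=4):
    """Join the topic ID and the first n_words keywords with underscores."""
    words = [w.strip() for w in keywords.split(" - ")]
    return "_".join([str(topic_id)] + words[:n_words])


def parse_label(label):
    """Split a label back into (topic ID, list of keywords)."""
    head, *words = label.split("_")
    return int(head), words


def share_percent(frequency, total=TOTAL_DOCS):
    """Percentage of training documents assigned to a topic."""
    return 100.0 * frequency / total


if __name__ == "__main__":
    for topic_id, keywords, frequency, label in ROWS:
        # The rebuilt label should match the one listed in the table.
        assert build_label(topic_id, keywords) == label
        print(f"{label}: {share_percent(frequency):.1f}% of documents")
```

With this reading, topics 0 and 1 together cover a little over half of the training documents (1132 + 591 of 3000), so the remaining 45 rows describe comparatively small clusters.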
## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.57.1
* Plotly: 5.13.1
* Python: 3.10.12
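
Loading a saved model can be sensitive to library versions, so it may help to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The PyPI distribution names in `EXPECTED` (for example `umap-learn` for UMAP) are an assumption, since the list gives display names only, and `find_mismatches` is a hypothetical helper rather than anything BERTopic provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by assumed PyPI names.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_mismatches(expected):
    """Return {name: (listed, installed)} for packages whose installed
    version differs from the listed one; installed is None if missing."""
    mismatches = {}
    for name, listed in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != listed:
            mismatches[name] = (listed, installed)
    return mismatches


if __name__ == "__main__":
    for name, (listed, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: card lists {listed}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only flags where the environment differs from the one recorded at training time.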