xsum_108_50000_25000_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_108_50000_25000_validation")

topic_model.get_topic_info()
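
Beyond `get_topic_info()`, two other calls are often useful: `get_topic(topic_id)` for the weighted keywords of a single topic, and `transform(docs)` for assigning topics to unseen text. The following is a minimal sketch rather than part of the original card. The helper names `format_keywords`, `describe_topic`, and `assign_topics` are hypothetical, and the sketch assumes the embedding model stored with the repository can be loaded in your environment.

```python
REPO_ID = "KingKazma/xsum_108_50000_25000_validation"


def format_keywords(word_scores, top_n=5):
    """Join the first top_n keywords in the 'a - b - c' style used in this card."""
    return " - ".join(word for word, _score in word_scores[:top_n])


def describe_topic(topic_id, repo_id=REPO_ID):
    """Load the model and return the formatted keywords of one topic."""
    from bertopic import BERTopic  # imported lazily so the helper above works without it

    topic_model = BERTopic.load(repo_id)
    # get_topic returns (word, score) pairs, or False for an unknown topic id
    word_scores = topic_model.get_topic(topic_id)
    if not word_scores:
        raise KeyError(f"unknown topic id: {topic_id}")
    return format_keywords(word_scores)


def assign_topics(docs, repo_id=REPO_ID):
    """Return one predicted topic id per document (-1 marks an outlier)."""
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    topics, _probabilities = topic_model.transform(docs)
    return list(topics)
```

For example, `describe_topic(0)` should return a keyword string for the largest topic, and `assign_topics(["Some news article text"])` should return a one-element list of topic ids.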

Topic overview

  • Number of topics: 80
  • Number of training documents: 11332
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - people - would - also | 5 | -1_said_mr_people_would
0 | win - game - league - club - player | 4532 | 0_win_game_league_club
1 | police - court - said - mr - officer | 2119 | 1_police_court_said_mr
2 | world - sport - olympic - champion - gold | 1037 | 2_world_sport_olympic_champion
3 | bank - sale - growth - price - rate | 604 | 3_bank_sale_growth_price
4 | data - user - company - firm - mobile | 225 | 4_data_user_company_firm
5 | film - star - album - show - best | 204 | 5_film_star_album_show
6 | school - education - pupil - teacher - student | 183 | 6_school_education_pupil_teacher
7 | nhs - care - patient - hospital - health | 107 | 7_nhs_care_patient_hospital
8 | boko - haram - president - un - mr | 103 | 8_boko_haram_president_un
9 | labour - party - corbyn - ukip - mps | 94 | 9_labour_party_corbyn_ukip
10 | bird - specie - tree - ash - animal | 82 | 10_bird_specie_tree_ash
11 | northern - ireland - sinn - fin - dup | 77 | 11_northern_ireland_sinn_fin
12 | trump - mr - clinton - republican - president | 73 | 12_trump_mr_clinton_republican
13 | fire - blaze - smoke - scene - firefighter | 73 | 13_fire_blaze_smoke_scene
14 | water - flood - flooding - river - rain | 67 | 14_water_flood_flooding_river
15 | art - museum - artist - gallery - auction | 63 | 15_art_museum_artist_gallery
16 | transport - road - route - traffic - bridge | 58 | 16_transport_road_route_traffic
17 | ebola - virus - outbreak - vaccine - health | 54 | 17_ebola_virus_outbreak_vaccine
18 | syrian - syria - rebel - aleppo - iraq | 53 | 18_syrian_syria_rebel_aleppo
19 | race - hamilton - f1 - mercedes - rosberg | 51 | 19_race_hamilton_f1_mercedes
20 | welsh - wales - plaid - assembly - labour | 51 | 20_welsh_wales_plaid_assembly
21 | council - site - building - planning - centre | 49 | 21_council_site_building_planning
22 | rmt - rail - train - strike - service | 47 | 22_rmt_rail_train_strike
23 | ice - space - satellite - scientist - earth | 47 | 23_ice_space_satellite_scientist
24 | flight - plane - airport - aircraft - pilot | 45 | 24_flight_plane_airport_aircraft
25 | korea - north - china - korean - us | 43 | 25_korea_north_china_korean
26 | eu - uk - brexit - migration - immigration | 43 | 26_eu_uk_brexit_migration
27 | taliban - afghan - afghanistan - kabul - attack | 43 | 27_taliban_afghan_afghanistan_kabul
28 | hong - kong - china - chinese - liu | 42 | 28_hong_kong_china_chinese
29 | energy - wind - turbine - farm - electricity | 42 | 29_energy_wind_turbine_farm
30 | maduro - farc - president - venezuela - opposition | 42 | 30_maduro_farc_president_venezuela
31 | scottish - scotland - referendum - snp - independence | 41 | 31_scottish_scotland_referendum_snp
32 | cancer - risk - woman - study - diabetes | 41 | 32_cancer_risk_woman_study
33 | war - battle - royal - service - regiment | 40 | 33_war_battle_royal_service
34 | yn - ar - ei - yr - bod | 40 | 34_yn_ar_ei_yr
35 | space - mars - iss - astronaut - updated | 39 | 35_space_mars_iss_astronaut
36 | india - indias - delhi - hindu - indian | 39 | 36_india_indias_delhi_hindu
37 | russia - russian - ukraine - putin - president | 38 | 37_russia_russian_ukraine_putin
38 | tax - budget - chancellor - government - osborne | 37 | 38_tax_budget_chancellor_government
39 | coastguard - boat - lifeboat - rnli - rescue | 32 | 39_coastguard_boat_lifeboat_rnli
40 | elephant - ivory - zoo - animal - rhino | 32 | 40_elephant_ivory_zoo_animal
41 | pension - pay - wage - income - pot | 32 | 41_pension_pay_wage_income
42 | abortion - marriage - woman - samesex - gay | 31 | 42_abortion_marriage_woman_samesex
43 | paris - french - attack - france - jewish | 29 | 43_paris_french_attack_france
44 | dog - pet - animal - hare - police | 26 | 44_dog_pet_animal_hare
45 | unsupported - updated - playback - device - media | 25 | 45_unsupported_updated_playback_device
46 | pollution - waste - air - bag - bin | 24 | 46_pollution_waste_air_bag
47 | climate - carbon - emission - coal - gas | 24 | 47_climate_carbon_emission_coal
48 | mortgage - credit - lender - price - property | 24 | 48_mortgage_credit_lender_price
49 | ahmed - court - terrorism - police - naseer | 24 | 49_ahmed_court_terrorism_police
50 | train - driver - raib - incident - rail | 23 | 50_train_driver_raib_incident
51 | hoard - coin - museum - display - found | 23 | 51_hoard_coin_museum_display
52 | eu - trade - uk - market - brexit | 23 | 52_eu_trade_uk_market
53 | greece - greek - eurozone - debt - bailout | 22 | 53_greece_greek_eurozone_debt
54 | whale - shark - water - dolphin - fish | 21 | 54_whale_shark_water_dolphin
55 | steel - tata - industry - plant - uk | 17 | 55_steel_tata_industry_plant
56 | prince - duchess - duke - royal - princess | 15 | 56_prince_duchess_duke_royal
57 | camp - nazi - locsin - germany - extradition | 15 | 57_camp_nazi_locsin_germany
58 | migrant - refugee - border - turkey - greek | 15 | 58_migrant_refugee_border_turkey
59 | calais - migrant - camp - jungle - refugee | 14 | 59_calais_migrant_camp_jungle
60 | gun - violence - police - shooting - baltimore | 14 | 60_gun_violence_police_shooting
61 | rousseff - impeachment - senate - temer - petrobras | 13 | 61_rousseff_impeachment_senate_temer
62 | macron - le - germany - pen - macrons | 13 | 62_macron_le_germany_pen
63 | turkey - erdogan - turkish - turkeys - hdp | 13 | 63_turkey_erdogan_turkish_turkeys
64 | lodge - belfast - ballysillan - parade - paper | 12 | 64_lodge_belfast_ballysillan_parade
65 | castle - hull - staffin - crofters - house | 11 | 65_castle_hull_staffin_crofters
66 | food - cocoa - fairtrade - advertising - sale | 10 | 66_food_cocoa_fairtrade_advertising
67 | israel - hamas - israeli - gaza - palestinians | 10 | 67_israel_hamas_israeli_gaza
68 | bank - account - rbs - overdraft - note | 10 | 68_bank_account_rbs_overdraft
69 | runway - airport - heathrow - airports - flight | 8 | 69_runway_airport_heathrow_airports
70 | rescue - mountain - avalanche - climbing - nepal | 7 | 70_rescue_mountain_avalanche_climbing
71 | council - privatisation - deal - government - carmarthenshires | 7 | 71_council_privatisation_deal_government
72 | ruddy - inla - information - police - family | 6 | 72_ruddy_inla_information_police
73 | drug - prescribed - psychoactive - drugs - lyrica | 6 | 73_drug_prescribed_psychoactive_drugs
74 | kitty - book - author - publisher - prize | 6 | 74_kitty_book_author_publisher
75 | search - cabin - airways - plane - amsa | 6 | 75_search_cabin_airways_plane
76 | sterkel - alert - examined - detonated - rifle | 6 | 76_sterkel_alert_examined_detonated
77 | ambulance - service - aberglaslyn - called - inverclyde | 5 | 77_ambulance_service_aberglaslyn_called
78 | research - 3d - science - prof - vision | 5 | 78_research_3d_science_prof
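
The frequencies in the table are easier to compare as shares of the 11332 training documents. The sketch below is an illustration, not part of the original card: `topic_share` and `parse_label` are hypothetical helper names, and the rows are copied by hand from the first lines of the table.

```python
TRAINING_DOCS = 11332  # "Number of training documents" from this card

# (label, frequency) pairs copied from the first rows of the topic table
TOP_ROWS = [
    ("0_win_game_league_club", 4532),
    ("1_police_court_said_mr", 2119),
    ("2_world_sport_olympic_champion", 1037),
    ("3_bank_sale_growth_price", 604),
]


def topic_share(frequency, total=TRAINING_DOCS):
    """Return the percentage of training documents assigned to a topic."""
    return round(100 * frequency / total, 1)


def parse_label(label):
    """Split a label such as '0_win_game_league_club' into (topic_id, keywords)."""
    topic_id, _sep, rest = label.partition("_")
    return int(topic_id), rest.split("_")


if __name__ == "__main__":
    for label, frequency in TOP_ROWS:
        topic_id, keywords = parse_label(label)
        print(topic_id, ", ".join(keywords), f"{topic_share(frequency)}%")
```

Read this way, the table shows a heavy skew: the four largest topics (sport, crime and courts, Olympic sport, and economics) together cover roughly 73% of the training documents.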

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
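
Each entry in this list corresponds to a keyword argument of the `BERTopic` constructor. As a rough sketch, an untrained model with the same settings could be built as shown below; `build_untrained_model` is a hypothetical helper name. The card does not state the embedding model, the UMAP and HDBSCAN settings, or the preprocessing that was applied, so this alone should not be expected to reproduce the topics listed above.

```python
# Values copied from the "Training hyperparameters" list in this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model(params=None):
    """Create an unfitted BERTopic instance with the hyperparameters listed above."""
    from bertopic import BERTopic  # imported lazily; requires `pip install bertopic`

    # Sub-models (embedding, UMAP, HDBSCAN) are left at BERTopic's defaults,
    # which is an assumption: the card does not say which ones were used.
    return BERTopic(**(params or HYPERPARAMETERS))
```

Calling `build_untrained_model().fit_transform(docs)` on a list of documents would then train a new model with these settings.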

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
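
Loading a saved topic model can be sensitive to library versions, so it may help to compare the local environment against the list above before debugging a failed load. The following sketch uses only the standard library. `find_version_mismatches` is a hypothetical helper, and the PyPI distribution names (for example `umap-learn` for UMAP) are assumptions that the card does not state.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions are those listed in this card; the distribution names are assumed.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_version_mismatches(expected=None):
    """Map each differing package to (expected, installed); installed is None if absent."""
    expected = EXPECTED_VERSIONS if expected is None else expected
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

A mismatch is not necessarily a problem; the output is only a starting point when a load or `transform` call fails.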