
xsum_123_3000_1500_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_validation")

topic_model.get_topic_info()
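Once the model is loaded, individual topics can be inspected by ID. The sketch below assumes `get_topic` returns a list of `(word, score)` pairs, or a falsy value for an unknown topic ID; `top_keywords` is a hypothetical helper, not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic, highest-scoring first.

    Assumes topic_model.get_topic(topic_id) yields (word, score) pairs,
    or a falsy value when the topic ID does not exist.
    """
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return []
    return [word for word, _score in pairs[:n]]


def describe_topics(topic_model, topic_ids):
    """Map each requested topic ID to its hyphen-joined keywords."""
    return {tid: " - ".join(top_keywords(topic_model, tid)) for tid in topic_ids}
```

With the model loaded as above, `describe_topics(topic_model, [0, 1, 2])` would return one keyword string per topic, in the same style as the Topic Keywords column in the overview.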

Topic overview

  • Number of topics: 27
  • Number of training documents: 1500
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - would - also - year | 5 | -1_said_mr_would_also |
| 0 | police - said - court - mr - found | 654 | 0_police_said_court_mr |
| 1 | said - council - site - would - development | 101 | 1_said_council_site_would |
| 2 | attack - killed - taliban - syria - government | 68 | 2_attack_killed_taliban_syria |
| 3 | gold - world - race - sport - olympic | 57 | 3_gold_world_race_sport |
| 4 | price - bank - rate - share - company | 53 | 4_price_bank_rate_share |
| 5 | party - vote - labour - mr - ukip | 52 | 5_party_vote_labour_mr |
| 6 | league - season - player - premier - club | 49 | 6_league_season_player_premier |
| 7 | cricket - england - wicket - game - test | 45 | 7_cricket_england_wicket_game |
| 8 | crash - car - road - accident - said | 42 | 8_crash_car_road_accident |
| 9 | wales - game - rugby - davies - hes | 40 | 9_wales_game_rugby_davies |
| 10 | patient - health - hospital - service - ambulance | 34 | 10_patient_health_hospital_service |
| 11 | foul - corner - half - box - kick | 32 | 11_foul_corner_half_box |
| 12 | trump - clinton - mr - mrs - us | 30 | 12_trump_clinton_mr_mrs |
| 13 | president - mr - africa - south - mugabe | 30 | 13_president_mr_africa_south |
| 14 | animal - dog - bird - said - rspca | 28 | 14_animal_dog_bird_said |
| 15 | school - education - teacher - child - pupil | 26 | 15_school_education_teacher_child |
| 16 | world - round - number - murray - court | 25 | 16_world_round_number_murray |
| 17 | northern - ireland - party - dup - sinn | 23 | 17_northern_ireland_party_dup |
| 18 | album - song - like - music - band | 17 | 18_album_song_like_music |
| 19 | fire - building - police - blaze - service | 16 | 19_fire_building_police_blaze |
| 20 | fossil - brontosaurus - dinosaur - found - animal | 16 | 20_fossil_brontosaurus_dinosaur_found |
| 21 | film - star - artist - novel - photograph | 16 | 21_film_star_artist_novel |
| 22 | wage - income - living - tax - uk | 11 | 22_wage_income_living_tax |
| 23 | gang - guerrero - prison - state - police | 11 | 23_gang_guerrero_prison_state |
| 24 | albion - brighton - hove - burton - wigan | 11 | 24_albion_brighton_hove_burton |
| 25 | 3d - space - kelly - cmdr - flight | 8 | 25_3d_space_kelly_cmdr |
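The labels in the overview appear to follow a simple pattern: the topic ID joined to the first four keywords with underscores. The sketch below reproduces that pattern and computes each topic's share of the 1500 training documents, using three rows copied from the overview. `make_label` and `share` are hypothetical helpers written for illustration, and the four-word cutoff is inferred from the listed labels rather than taken from BERTopic itself.

```python
# Three rows copied from the topic overview: (topic_id, keywords, frequency).
ROWS = [
    (-1, ["said", "mr", "would", "also", "year"], 5),
    (0, ["police", "said", "court", "mr", "found"], 654),
    (1, ["said", "council", "site", "would", "development"], 101),
]
TOTAL_DOCS = 1500  # "Number of training documents" from the overview


def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


def share(frequency, total=TOTAL_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for topic_id, keywords, frequency in ROWS:
    print(f"{make_label(topic_id, keywords):<30} {share(frequency):.1%}")
```

Topic 0 covers 654 of the 1500 documents (43.6%), so the remaining topics are small by comparison. Topic -1 is the outlier bucket that holds documents not assigned to any cluster.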

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
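For reference, the hyperparameters above map onto keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how a model with the same settings might be instantiated. It does not reproduce this exact model: the embedding, UMAP, and HDBSCAN components are not listed on this card and would fall back to library defaults. `build_topic_model` is a hypothetical helper name.

```python
# Hyperparameters as listed on this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model(overrides=None):
    """Create an untrained BERTopic instance with the listed settings.

    The import is deferred so the settings can be inspected without
    bertopic installed. Sub-models (embedding, UMAP, HDBSCAN) are left
    at the library defaults, which is an assumption.
    """
    from bertopic import BERTopic

    params = {**HYPERPARAMETERS, **(overrides or {})}
    return BERTopic(**params)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a new model; the resulting topics will generally differ from those listed here, since clustering depends on the data and on random initialization.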

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
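Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the versions above. The sketch below uses the standard library's `importlib.metadata`. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions about how the packages were installed, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED):
    """Return {name: (expected, installed)} for every package that differs.

    installed is None when the package is not present at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result from `version_mismatches()` means the environment matches the card exactly. A non-empty result does not necessarily mean loading will fail, only that the environment differs from the one used for training.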