---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# xsum_123_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large collections of documents.

## Usage 

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

```python
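from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")

# Return a DataFrame with one row per topic (ID, document count, and name).
topic_model.get_topic_info()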
```
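Beyond inspecting the fitted topics, a loaded model can also assign topics to unseen text. The sketch below relies on `BERTopic.transform`, which returns a pair of topic IDs and probabilities. The helper names `assign_topics` and `load_and_assign` are illustrative and not part of this repository, and loading may additionally require a compatible embedding model to be available.

```python
def assign_topics(topic_model, docs):
    """Pair each document with the topic ID the model predicts for it.

    `topic_model` is expected to behave like a fitted BERTopic instance,
    i.e. `transform(docs)` returns `(topics, probabilities)`.
    """
    topics, _probabilities = topic_model.transform(docs)
    # Topic -1 is BERTopic's outlier topic (no confident assignment).
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def load_and_assign(docs):
    """Load this model from the Hub and assign topics (needs `bertopic` installed)."""
    # Imported lazily so that `assign_topics` itself has no dependencies.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")
    return assign_topics(topic_model, docs)
```

Calling `load_and_assign` with a list of strings should yield `(document, topic ID)` pairs, and each ID can be looked up in the topic overview.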

## Topic overview

* Number of topics: 47 (topics 0 to 45, plus the outlier topic -1)
* Number of training documents: 3000

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------| 
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people | 
| 0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul | 
| 1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would | 
| 2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic | 
| 3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient | 
| 4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale | 
| 5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro | 
| 6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road | 
| 7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis | 
| 8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel | 
| 9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools | 
| 10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird | 
| 11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series | 
| 12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police | 
| 13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican | 
| 14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service | 
| 15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government | 
| 16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south | 
| 17 | album - band - song - music - best | 25 | 17_album_band_song_music | 
| 18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death | 
| 19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train | 
| 20 | road - police - death - seen - found | 23 | 20_road_police_death_seen | 
| 21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat | 
| 22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr | 
| 23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan | 
| 24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android | 
| 25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building | 
| 26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council | 
| 27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race | 
| 28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival | 
| 29 | site - council - street - project - plan | 15 | 29_site_council_street_project | 
| 30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian | 
| 31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv | 
| 32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber | 
| 33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet | 
| 34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods | 
| 35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident | 
| 36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables | 
| 37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people | 
| 38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist | 
| 39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter | 
| 40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable | 
| 41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana | 
| 42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn | 
| 43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf | 
| 44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser | 
| 45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant |
  
</details>
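The table can be sanity-checked with a few lines of plain Python. The sketch below copies the frequency column by hand and assumes, based only on the rows shown, that each label is the topic ID followed by the first four keywords joined with underscores. The names `FREQUENCIES`, `make_label`, and `share` are illustrative.

```python
# Frequencies copied from the table above, ordered from topic -1 to topic 45.
FREQUENCIES = [
    5, 1132, 591, 149, 104, 84, 71, 58, 46, 45, 41, 40, 38, 38, 31, 31,
    29, 29, 25, 24, 23, 23, 23, 22, 22, 20, 19, 18, 18, 16, 15, 15,
    14, 14, 14, 13, 13, 13, 11, 11, 10, 9, 9, 7, 6, 6, 5,
]


def make_label(topic_id, keywords, n_words=4):
    """Rebuild a label: the topic ID followed by its first `n_words` keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def share(topic_id):
    """Percentage of training documents assigned to `topic_id`."""
    # Index 0 holds topic -1, so shift the ID by one.
    return 100 * FREQUENCIES[topic_id + 1] / sum(FREQUENCIES)


print(len(FREQUENCIES), sum(FREQUENCIES))  # 47 3000
print(make_label(0, ["win", "game", "half", "foul", "league"]))  # 0_win_game_half_foul
print(round(share(0), 1), round(share(1), 1))  # 37.7 19.7
```

The frequencies add up to the 3000 training documents, and topics 0 and 1 together account for roughly 57% of them, so the model is dominated by sport and UK politics.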

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False
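These settings map directly onto keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how an untrained model with the same configuration could be created. The names `HYPERPARAMETERS` and `build_untrained_model` are illustrative, and because the training documents, embedding model, and any random seeds are not listed here, fitting such a model would not necessarily reproduce the topics above.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a fresh BERTopic instance with the settings listed above."""
    # Imported lazily so the dictionary can be inspected without `bertopic`.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

The returned model still has to be fitted, for example with `fit_transform` on a list of documents.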

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.57.1
* Plotly: 5.13.1
* Python: 3.10.12
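
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below uses only the standard library. The PyPI distribution names used as keys (for example `umap-learn` for UMAP) and the helper names `installed_version` and `find_mismatches` are assumptions made for illustration.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty result from `find_mismatches()` means the environment matches the listed versions. A mismatch does not necessarily prevent the model from loading.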