Add BERTopic model
- README.md +240 -0
- config.json +16 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,240 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# ArXiv

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

# Load the model from the Hugging Face Hub
topic_model = BERTopic.load("OSN2/ArXiv")

# Summarize every topic (ID, document count, name)
topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 171
* Number of training documents: 12693

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | the - and - to - of - in | 15 | -1_the_and_to_of |
| 0 | recipe - food - recipes - pizza - salad | 3814 | 0_recipe_food_recipes_pizza |
| 1 | trump - election - law - the - that | 849 | 1_trump_election_law_the |
| 2 | anaysa - fashion - pants - swimwear - sneakers | 393 | 2_anaysa_fashion_pants_swimwear |
| 3 | arsenal - liverpool - rugby - match - haaland | 382 | 3_arsenal_liverpool_rugby_match |
| 4 | weather - bengal - storm - west - snow | 271 | 4_weather_bengal_storm_west |
| 5 | crypto - bitcoin - cryptocurrency - gaming - trading | 172 | 5_crypto_bitcoin_cryptocurrency_gaming |
| 6 | her - she - was - on - related | 143 | 6_her_she_was_on |
| 7 | 420m - dog - animal - animals - dogs | 138 | 7_420m_dog_animal_animals |
| 8 | god - lord - prayer - jesus - church | 127 | 8_god_lord_prayer_jesus |
| 9 | cars - sale - used - under - for | 119 | 9_cars_sale_used_under |
| 10 | pro - vivo - v23 - phone - google | 117 | 10_pro_vivo_v23_phone |
| 11 | news - iptv - tv - interview - latest | 110 | 11_news_iptv_tv_interview |
| 12 | art - museum - artists - artist - of | 108 | 12_art_museum_artists_artist |
| 13 | my - nephews - nieces - poetry - love | 107 | 13_my_nephews_nieces_poetry |
| 14 | film - review - his - as - but | 102 | 14_film_review_his_as |
| 15 | bike - helmet - bikes - mountain - pilots | 98 | 15_bike_helmet_bikes_mountain |
| 16 | hair - bite - steel - care - haircut | 97 | 16_hair_bite_steel_care |
| 17 | police - rhonda - mcdowell - was - said | 90 | 17_police_rhonda_mcdowell_was |
| 18 | property - room - bedrooms - bedroom - home | 86 | 18_property_room_bedrooms_bedroom |
| 19 | ukraine - russia - russian - putin - news | 86 | 19_ukraine_russia_russian_putin |
| 20 | business - jobs - income - data - part | 86 | 20_business_jobs_income_data |
| 21 | vaccinated - vaccine - covid - va - unvaccinated | 84 | 21_vaccinated_vaccine_covid_va |
| 22 | music - band - students - orchestra - tickets | 83 | 22_music_band_students_orchestra |
| 23 | workout - abs - workouts - fitness - exercise | 83 | 23_workout_abs_workouts_fitness |
| 24 | school - teachers - dmc - 804 - children | 83 | 24_school_teachers_dmc_804 |
| 25 | women - robotics - bali - spanish - lutheran | 82 | 25_women_robotics_bali_spanish |
| 26 | lima - tourism - parks - urban - our | 79 | 26_lima_tourism_parks_urban |
| 27 | godzilla - movies - movie - spider - marvel | 77 | 27_godzilla_movies_movie_spider |
| 28 | fishing - backpacks - fish - packs - swimming | 74 | 28_fishing_backpacks_fish_packs |
| 29 | yoga - stretching - kru - nidra - oct | 74 | 29_yoga_stretching_kru_nidra |
| 30 | researchers - species - of - the - university | 72 | 30_researchers_species_of_the |
| 31 | wholesale - market - saree - delhi - software | 71 | 31_wholesale_market_saree_delhi |
| 32 | skin - acne - cream - blackheads - whitening | 70 | 32_skin_acne_cream_blackheads |
| 33 | rodents - pets - pest - dogs - animals | 70 | 33_rodents_pets_pest_dogs |
| 34 | books - book - salinger - fiction - literary | 67 | 34_books_book_salinger_fiction |
| 35 | class - pst - exams - preparation - test | 66 | 35_class_pst_exams_preparation |
| 36 | 5g - airlines - bsnl - flight - network | 64 | 36_5g_airlines_bsnl_flight |
| 37 | treetops - dementia - children - people - barbara | 62 | 37_treetops_dementia_children_people |
| 38 | lottery - thai - thailand - lotto - win | 62 | 38_lottery_thai_thailand_lotto |
| 39 | wedding - weddings - survival - gift - day | 61 | 39_wedding_weddings_survival_gift |
| 40 | quantum - solar - energy - material - light | 61 | 40_quantum_solar_energy_material |
| 41 | beauty - makeup - products - sephora - skin | 60 | 41_beauty_makeup_products_sephora |
| 42 | games - xbox - game - solitaire - free | 60 | 42_games_xbox_game_solitaire |
| 43 | insurance - insurers - insurer - company - aig | 59 | 43_insurance_insurers_insurer_company |
| 44 | green - saf - haiti - industry - solar | 58 | 44_green_saf_haiti_industry |
| 45 | diet - meat - foods - plant - body | 55 | 45_diet_meat_foods_plant |
| 46 | edinburgh - tour - royal - travel - castle | 55 | 46_edinburgh_tour_royal_travel |
| 47 | horses - horse - friesian - goëngamieden - post | 54 | 47_horses_horse_friesian_goëngamieden |
| 48 | your - you - mental - health - anal | 51 | 48_your_you_mental_health |
| 49 | weight - obesity - loss - lose - fat | 51 | 49_weight_obesity_loss_lose |
| 50 | estate - real - property - home - you | 50 | 50_estate_real_property_home |
| 51 | camping - surfing - guess - landmark - lego | 50 | 51_camping_surfing_guess_landmark |
| 52 | dorm - sex - birthday - my - joy | 50 | 52_dorm_sex_birthday_my |
| 53 | covid - 19 - vaccinated - vaccine - cases | 50 | 53_covid_19_vaccinated_vaccine |
| 54 | spain - morocco - gas - energy - industry | 49 | 54_spain_morocco_gas_energy |
| 55 | gardening - garden - grow - plants - fertilizer | 49 | 55_gardening_garden_grow_plants |
| 56 | tenant - transport - apartments - department - condos | 49 | 56_tenant_transport_apartments_department |
| 57 | cricket - england - engw - indw - vs | 48 | 57_cricket_england_engw_indw |
| 58 | trump - election - party - votes - former | 48 | 58_trump_election_party_votes |
| 59 | tesla - marine - electric - musk - ev | 47 | 59_tesla_marine_electric_musk |
| 60 | surf - surfing - ski - swimming - lessons | 47 | 60_surf_surfing_ski_swimming |
| 61 | disabled - disability - thailand - scholarship - scholarships | 47 | 61_disabled_disability_thailand_scholarship |
| 62 | programming - udemy - svelte - language - courses | 44 | 62_programming_udemy_svelte_language |
| 63 | diy - ideas - desk - wood - woodworking | 43 | 63_diy_ideas_desk_wood |
| 64 | wrestling - pearson - tiga - wwe - nfl | 43 | 64_wrestling_pearson_tiga_wwe |
| 65 | smart - gadgets - appliances - home - kitchen | 42 | 65_smart_gadgets_appliances_home |
| 66 | experiments - fu - kung - xxxtentacion - copyright | 40 | 66_experiments_fu_kung_xxxtentacion |
| 67 | job - small - businesses - hiring - business | 40 | 67_job_small_businesses_hiring |
| 68 | hiv - health - care - hospital - hospice | 40 | 68_hiv_health_care_hospital |
| 69 | he - was - it - empire - movie | 38 | 69_he_was_it_empire |
| 70 | beat - type - ringtone - lofi - beats | 37 | 70_beat_type_ringtone_lofi |
| 71 | castellvi - marines - marine - corps - county | 37 | 71_castellvi_marines_marine_corps |
| 72 | casino - xbox - game - games - poker | 37 | 72_casino_xbox_game_games |
| 73 | bellanaijaweddings - bride - handmadepaper - weddingplanner - makeup | 36 | 73_bellanaijaweddings_bride_handmadepaper_weddingplanner |
| 74 | music - jsem - bushcraft - se - festival | 36 | 74_music_jsem_bushcraft_se |
| 75 | gemini - tarot - horoscope - september - pisces | 35 | 75_gemini_tarot_horoscope_september |
| 76 | career - husni - magazines - magazine - employees | 35 | 76_career_husni_magazines_magazine |
| 77 | his - film - movie - review - but | 34 | 77_his_film_movie_review |
| 78 | gps - aircraft - trucks - vehicles - electric | 34 | 78_gps_aircraft_trucks_vehicles |
| 79 | raya - merch - magazines - cards - kongamidyearshoppingfestival | 34 | 79_raya_merch_magazines_cards |
| 80 | baby - she - birth - says - women | 34 | 80_baby_she_birth_says |
| 81 | covid - 19 - uk - health - interventions | 33 | 81_covid_19_uk_health |
| 82 | climate - gore - dm - eastman - change | 33 | 82_climate_gore_dm_eastman |
| 83 | buhari - anambra - apc - anyim - chief | 32 | 83_buhari_anambra_apc_anyim |
| 84 | orchestra - hotel - janice - chicago - symphony | 31 | 84_orchestra_hotel_janice_chicago |
| 85 | ramen - pierre - soulz - magic - westfieldcarousel | 31 | 85_ramen_pierre_soulz_magic |
| 86 | interior - design - home - decorate - bedroom | 30 | 86_interior_design_home_decorate |
| 87 | hindi - movie - explained - hollywood - lankybox | 30 | 87_hindi_movie_explained_hollywood |
| 88 | xbox - playstation - game - card - console | 30 | 88_xbox_playstation_game_card |
| 89 | insurance - car - policy - feener - policyworld | 30 | 89_insurance_car_policy_feener |
| 90 | share - nepal - stock - market - analysis | 29 | 90_share_nepal_stock_market |
| 91 | marketing - content - strategy - cart - your | 28 | 91_marketing_content_strategy_cart |
| 92 | songs - kids - song - rhymes - hindi | 28 | 92_songs_kids_song_rhymes |
| 93 | tax - cd - money - itr - 401 | 27 | 93_tax_cd_money_itr |
| 94 | inflation - housing - prices - chorley - hydrow | 27 | 94_inflation_housing_prices_chorley |
| 95 | venkat - spectre - spending - attacks - intel | 26 | 95_venkat_spectre_spending_attacks |
| 96 | band - grammys - recording - musical - doo | 26 | 96_band_grammys_recording_musical |
| 97 | drawing - draw - art - mandala - painting | 26 | 97_drawing_draw_art_mandala |
| 98 | shop - insurance - design - restaurant - food | 26 | 98_shop_insurance_design_restaurant |
| 99 | kamran - feride - iqiyi - drama - selim | 26 | 99_kamran_feride_iqiyi_drama |
| 100 | poetry - prize - mondaymotivation - publication - apologize | 26 | 100_poetry_prize_mondaymotivation_publication |
| 101 | jobs - tcs - part - job - work | 25 | 101_jobs_tcs_part_job |
| 102 | card - credit - rewards - cash - tracking | 25 | 102_card_credit_rewards_cash |
| 103 | vlog - vlogs - dexerto - video - blog | 25 | 103_vlog_vlogs_dexerto_video |
| 104 | brother - 5½ - burge - poetry - thank | 25 | 104_brother_5½_burge_poetry |
| 105 | anime - manga - disney - animes - recap | 25 | 105_anime_manga_disney_animes |
| 106 | fox - news - msnbc - biden - business | 25 | 106_fox_news_msnbc_biden |
| 107 | thoreau - wildness - maldives - malé - wildlife | 24 | 107_thoreau_wildness_maldives_malé |
| 108 | condo - minutes - rent - condominium - เช | 24 | 108_condo_minutes_rent_condominium |
| 109 | freshworks - sales - requirements - job - development | 24 | 109_freshworks_sales_requirements_job |
| 110 | insurance - management - property - company - loans | 24 | 110_insurance_management_property_company |
| 111 | aew - wrestling - highlights - esports - impact | 23 | 111_aew_wrestling_highlights_esports |
| 112 | ctv - cbc - _x000d_ - news - bridge | 23 | 112_ctv_cbc__x000d__news |
| 113 | ukrainian - music - lyatoshynsky - solos - concert | 23 | 113_ukrainian_music_lyatoshynsky_solos |
| 114 | abc - ladzinski - campaign - carlton - news | 23 | 114_abc_ladzinski_campaign_carlton |
| 115 | gaming - pc - headset - byte - cosmic | 23 | 115_gaming_pc_headset_byte |
| 116 | climate - environmental - noaa - literacy - education | 23 | 116_climate_environmental_noaa_literacy |
| 117 | game - players - sonic - its - the | 22 | 117_game_players_sonic_its |
| 118 | olympic - olympics - chen - biles - medal | 22 | 118_olympic_olympics_chen_biles |
| 119 | loans - loan - student - paying - naira | 22 | 119_loans_loan_student_paying |
| 120 | nail - art - nails - compilation - acrylic | 22 | 120_nail_art_nails_compilation |
| 121 | peppa - pig - wolfoo - nguyen - favorite | 21 | 121_peppa_pig_wolfoo_nguyen |
| 122 | jazz - music - blues - heat - waves | 21 | 122_jazz_music_blues_heat |
| 123 | rónán - march - composer - lyricist - tickets | 21 | 123_rónán_march_composer_lyricist |
| 124 | olympic - beijing - olympics - china - athletes | 21 | 124_olympic_beijing_olympics_china |
| 125 | smoking - breakover - smokers - heart - hind | 21 | 125_smoking_breakover_smokers_heart |
| 126 | pets - animals - pet - panda - dog | 21 | 126_pets_animals_pet_panda |
| 127 | cycling - gcn - bike - feroce - wheels | 21 | 127_cycling_gcn_bike_feroce |
| 128 | musique - proposée - libre - par - la | 21 | 128_musique_proposée_libre_par |
| 129 | male - girlfriend - roseanne - unagi - twohill | 20 | 129_male_girlfriend_roseanne_unagi |
| 130 | gymnastics - moana - always - drugs - week | 20 | 130_gymnastics_moana_always_drugs |
| 131 | musk - gambling - twitter - elon - deduction | 20 | 131_musk_gambling_twitter_elon |
| 132 | lichfield - google - sat - stoke - mon | 20 | 132_lichfield_google_sat_stoke |
| 133 | reasonable - greenhouse - accommodation - robots - ai | 20 | 133_reasonable_greenhouse_accommodation_robots |
| 134 | icebox - maxo - theme - kream - koo | 19 | 134_icebox_maxo_theme_kream |
| 135 | whio - ong - ang - canal - birds | 19 | 135_whio_ong_ang_canal |
| 136 | codyfight - tattooing - brothers - marriage - extreme | 19 | 136_codyfight_tattooing_brothers_marriage |
| 137 | nuro - gm - vehicle - vehicles - electric | 19 | 137_nuro_gm_vehicle_vehicles |
| 138 | kcs - railroads - cn - rail - stb | 19 | 138_kcs_railroads_cn_rail |
| 139 | strengths - music - grief - leisure - life | 19 | 139_strengths_music_grief_leisure |
| 140 | drones - drone - uae - missile - dhabi | 19 | 140_drones_drone_uae_missile |
| 141 | massage - dubai - jumeirah - japanese - oil | 18 | 141_massage_dubai_jumeirah_japanese |
| 142 | bowl - super - bengals - bet - rams | 18 | 142_bowl_super_bengals_bet |
| 143 | pension - 9news - pensions - pay - tax | 18 | 143_pension_9news_pensions_pay |
| 144 | dog - toy - pet - supplies - toys | 18 | 144_dog_toy_pet_supplies |
| 145 | english - travellers - students - course - syllabus | 18 | 145_english_travellers_students_course |
| 146 | mentoring - cbs - mentor - mentors - teachers | 18 | 146_mentoring_cbs_mentor_mentors |
| 147 | picnic - park - blankets - basket - acompañantes | 18 | 147_picnic_park_blankets_basket |
| 148 | orig - 99 - amazon - prime - dollar | 18 | 148_orig_99_amazon_prime |
| 149 | primary - english - genetics - wilanów - education | 18 | 149_primary_english_genetics_wilanów |
| 150 | hardin - film - he - she - oscar | 17 | 150_hardin_film_he_she |
| 151 | laptop - gaming - alienware - laptops - hp | 17 | 151_laptop_gaming_alienware_laptops |
| 152 | ufc - tmz - owens - onlyfans - tonight | 17 | 152_ufc_tmz_owens_onlyfans |
| 153 | basketball - vs - varsity - darien - canaan | 17 | 153_basketball_vs_varsity_darien |
| 154 | workers - hanford - state - law - doe | 17 | 154_workers_hanford_state_law |
| 155 | cdl - freight - broker - logistics - eldt | 17 | 155_cdl_freight_broker_logistics |
| 156 | builders - connell - brenton - firm - wage | 17 | 156_builders_connell_brenton_firm |
| 157 | bookstore - easter - my - menger - eastershelfie | 16 | 157_bookstore_easter_my_menger |
| 158 | prince - royal - duke - charles - queen | 16 | 158_prince_royal_duke_charles |
| 159 | ดตามเราได - จำก - มหาชน - voicetv - oppday | 16 | 159_ดตามเราได_จำก_มหาชน_voicetv |
| 160 | nba - trades - stream - espn - live | 16 | 160_nba_trades_stream_espn |
| 161 | school - students - science - brandon - twig | 16 | 161_school_students_science_brandon |
| 162 | morning - sleep - your - kaplan - routine | 16 | 162_morning_sleep_your_kaplan |
| 163 | kat - author - desires - louise - charmaine | 16 | 163_kat_author_desires_louise |
| 164 | movie - recapped - uche - academia - dizzyeight | 16 | 164_movie_recapped_uche_academia |
| 165 | awka - religion - suspects - anambra - echeng | 15 | 165_awka_religion_suspects_anambra |
| 166 | wrc - f1 - rally - championship - formula1 | 15 | 166_wrc_f1_rally_championship |
| 167 | hillstream - algae - scape - goby - aquarium | 15 | 167_hillstream_algae_scape_goby |
| 168 | skin - filler - touche - éclat - dermal | 15 | 168_skin_filler_touche_éclat |
| 169 | pets - cats - hopkins - cat - niblo | 15 | 169_pets_cats_hopkins_cat |

</details>
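
Judging from the rows in the overview table, each Label appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below illustrates that pattern on rows copied from the table. `parse_row` and `make_label` are hypothetical helpers written for this illustration, not BERTopic functions.

```python
def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and the first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def parse_row(row: str) -> dict:
    """Split one Markdown row of the overview table into its four fields."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return {
        "topic_id": int(topic_id),
        "keywords": keywords.split(" - "),  # keywords are separated by " - "
        "frequency": int(frequency),
        "label": label,
    }


# Row copied from the table; the rebuilt label is compared with the stored one.
row = "| 0 | recipe - food - recipes - pizza - salad | 3814 | 0_recipe_food_recipes_pizza |"
record = parse_row(row)
print(make_label(record["topic_id"], record["keywords"]) == record["label"])
```

Topic -1 follows the same pattern (`-1_the_and_to_of`); in BERTopic this ID is reserved for outlier documents that were not assigned to any cluster.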

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.5
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.36.0
* Numba: 0.58.1
* Plotly: 5.15.0
* Python: 3.10.12
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
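
This file stores the same hyperparameters that the README lists under "Training hyperparameters". As a rough sketch, it can be read back with the standard library. JSON has no tuple type, so `n_gram_range` arrives as a list and is converted to a tuple here to match the `(1, 1)` shown in the README. The inlined `CONFIG_TEXT` and the helper `load_hyperparameters` are assumptions made for this example.

```python
import json

# Contents of config.json, inlined so the example runs on its own.
CONFIG_TEXT = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def load_hyperparameters(text: str) -> dict:
    """Parse the config and restore n_gram_range to a tuple."""
    params = json.loads(text)
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


params = load_hyperparameters(CONFIG_TEXT)
print(params["n_gram_range"], params["min_topic_size"])
```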
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d72ed6bb936325fc58a2c01b5b70314996f20c6bb085a9916c325ac0d31286b9
size 5162328
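
These three lines are a Git LFS pointer rather than the tensor data itself: they record the pointer spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the example only parses the pointer text; it does not download anything.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Turn 'key value' pointer lines into a dict with a split oid."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that data has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text copied from the ctfidf.safetensors entry.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d72ed6bb936325fc58a2c01b5b70314996f20c6bb085a9916c325ac0d31286b9
size 5162328
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Checking the size first avoids hashing a file that is obviously truncated or is itself still a pointer.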
ctfidf_config.json
ADDED
The diff for this file is too large to render.
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e196af4ebebcaf6d7abdf759b9a763cceaebf453011141f8a56a5e59073a9f0
size 262744
topics.json
ADDED
The diff for this file is too large to render.