
BERTopic

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("Jerado/BERTopic")

topic_model.get_topic_info()
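
Beyond the summary table returned by get_topic_info(), a single topic can be inspected with get_topic, which returns a list of (word, score) pairs for a known topic ID and False for an unknown one. The following is a minimal sketch of wrapping that call; the helper name top_words is hypothetical and not part of BERTopic.

```python
def top_words(topic_model, topic_id, n=5):
    """Return up to n keywords for topic_id, or [] if the topic is unknown.

    topic_model is expected to be a loaded BERTopic model; only its
    get_topic method is used here.
    """
    representation = topic_model.get_topic(topic_id)
    if not representation:  # get_topic returns False for unknown topic IDs
        return []
    # Each entry is a (word, score) pair; keep only the words.
    return [word for word, _score in representation[:n]]
```

Called as top_words(topic_model, 0) on the model loaded above, this should return keywords corresponding to those listed for topic 0 in the topic overview.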

Topic overview

  • Number of topics: 17 (including the outlier topic, -1)
  • Number of training documents: 1000
Overview of all topics:

Topic ID | Topic Keywords | Topic Frequency | Label
--- | --- | --- | ---
-1 | theism - much - way - think - just | 15 | -1_theism_much_way_think
0 | nhl - playoffs - rangers - hockey - league | 304 | 0_nhl_playoffs_rangers_hockey
1 | performance - ram - drivers - monitor - speed | 92 | 1_performance_ram_drivers_monitor
2 | x11r5 - hyperhelp - windows - pc - application | 82 | 2_x11r5_hyperhelp_windows_pc
3 | dos - windows - harddisk - disk - software | 82 | 3_dos_windows_harddisk_disk
4 | amp - amps - amplifier - ampere - current | 75 | 4_amp_amps_amplifier_ampere
5 | scripture - christians - sin - bible - commandment | 44 | 5_scripture_christians_sin_bible
6 | patients - biological - medicine - studies - doctors | 41 | 6_patients_biological_medicine_studies
7 | nasa - solar - space - shuttle - orbiting | 39 | 7_nasa_solar_space_shuttle
8 | armenians - armenian - armenia - turks - genocide | 38 | 8_armenians_armenian_armenia_turks
9 | guns - gun - amendment - constitution - laws | 36 | 9_guns_gun_amendment_constitution
10 | - - - - | 33 | 10____
11 | motorcycle - bikes - cobralinks - bike - riding | 32 | 11_motorcycle_bikes_cobralinks_bike
12 | encryption - security - encrypted - privacy - secure | 24 | 12_encryption_security_encrypted_privacy
13 | contacted - address - mail - contact - email | 23 | 13_contacted_address_mail_contact
14 | paganism - faith - christianity - christians - atheists | 21 | 14_paganism_faith_christianity_christians
15 | action - fbi - batf - war - president | 19 | 15_action_fbi_batf_war
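
Each label in the overview is the topic ID followed by the first four keywords, joined with underscores; topic -1 is BERTopic's outlier bucket. The sketch below splits a label back into its parts. The function name parse_label is hypothetical, and the approach assumes that no keyword itself contains an underscore.

```python
def parse_label(label):
    """Split a label such as '0_nhl_playoffs_rangers_hockey' into (id, words).

    Assumes keywords contain no underscores; a keyword with one would be
    split into two entries.
    """
    # The ID is everything before the first underscore (handles "-1" too).
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


print(parse_label("0_nhl_playoffs_rangers_hockey"))
print(parse_label("10____"))  # topic 10 has empty keywords in the overview
```

For topic 10, whose keywords are empty, the same call yields four empty strings rather than raising an error.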

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: [['drug', 'cancer', 'drugs', 'doctor'], ['windows', 'drive', 'dos', 'file'], ['space', 'launch', 'orbit', 'lunar']]
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
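
As an illustration, the hyperparameters above map directly onto keyword arguments of the BERTopic constructor. The sketch below builds a fresh, unfitted model with those settings. It is not a reproduction of this model: the card does not list the embedding, UMAP, or HDBSCAN sub-models, nor the training documents, so the library defaults are assumed here. The names HYPERPARAMETERS and build_untrained_model are hypothetical.

```python
# Hyperparameters copied from the list in this model card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": [
        ["drug", "cancer", "drugs", "doctor"],
        ["windows", "drive", "dos", "file"],
        ["space", "launch", "orbit", "lunar"],
    ],
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create an unfitted BERTopic model with the settings above.

    Sub-models (embedding, UMAP, HDBSCAN) are left at library defaults,
    which is an assumption; the card does not state what was used.
    """
    # Imported lazily so this module loads even without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

A model built this way could then be fitted on your own list of strings with fit_transform(docs); the resulting topics will generally differ from those listed in the topic overview.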

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.7.0
  • Transformers: 4.40.1
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
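
Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the installed versions against the ones listed above. The following sketch does this with the standard library only. The dictionary keys are the PyPI distribution names for each package, and the names CARD_VERSIONS and compare_versions are hypothetical. The Python version itself is not checked.

```python
from importlib import metadata

# Versions listed in this model card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.7.0",
    "transformers": "4.40.1",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def compare_versions(expected=CARD_VERSIONS):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point for debugging.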