metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

topic_model_general_auto_april8

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("Thang203/topic_model_general_auto_april8")

topic_model.get_topic_info()
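Beyond listing the topics, a loaded model can also assign topics to new text. The following is a minimal sketch, assuming the model loads with a working embedding backend; the helper names `assign_topics` and `demo` and the example sentences are hypothetical and are not part of this model card.

```python
from typing import List, Tuple


def assign_topics(topic_model, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID the model predicts for it."""
    # BERTopic.transform returns (topic IDs, probabilities); only the IDs are used here.
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Download the model from the Hub and label two illustrative sentences."""
    from bertopic import BERTopic  # imported here so the helper above has no hard dependency

    topic_model = BERTopic.load("Thang203/topic_model_general_auto_april8")
    docs = [  # hypothetical inputs, chosen only for illustration
        "We quantize the weights of a large language model to 4 bits.",
        "A benchmark for detecting fake news with language models.",
    ]
    for doc, topic_id in assign_topics(topic_model, docs):
        # get_topic returns the (keyword, score) pairs of the predicted topic
        print(topic_id, topic_model.get_topic(topic_id), doc)
```

Calling `demo()` downloads the model, so it needs network access. A predicted topic of -1 means the document was treated as an outlier.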

Topic overview

  • Number of topics: 113 (including the outlier topic -1)
  • Number of training documents: 6795
Overview of all topics (one row per topic; keywords are separated by " - "):
Topic ID Topic Keywords Topic Frequency Label
-1 models - language - llms - language models - model 10 -1_models_language_llms_language models
0 visual - multimodal - image - images - video 1955 0_visual_multimodal_image_images
1 reasoning - mathematical - cot - math - problems 429 1_reasoning_mathematical_cot_math
2 students - education - chatgpt - student - ai 315 2_students_education_chatgpt_student
3 medical - clinical - biomedical - healthcare - notes 261 3_medical_clinical_biomedical_healthcare
4 translation - languages - machine translation - multilingual - machine 215 4_translation_languages_machine translation_multilingual
5 code - code generation - generation - programming - python 156 5_code_code generation_generation_programming
6 generation - story - text - text generation - gpt2 131 6_generation_story_text_text generation
7 rlhf - reward - alignment - preference - feedback 85 7_rlhf_reward_alignment_preference
8 financial - sentiment - stock - market - investment 78 8_financial_sentiment_stock_market
9 bias - gender - biases - gender bias - fairness 77 9_bias_gender_biases_gender bias
10 summarization - summaries - abstractive - text summarization - summary 77 10_summarization_summaries_abstractive_text summarization
11 emotion - emotional - empathetic - emotions - affective 74 11_emotion_emotional_empathetic_emotions
12 radiology - medical - reports - radiology reports - image 74 12_radiology_medical_reports_radiology reports
13 fewshot - zeroshot - learning - augmentation - data 69 13_fewshot_zeroshot_learning_augmentation
14 game - games - agents - negotiation - llm agents 69 14_game_games_agents_negotiation
15 dialogue - taskoriented - dialog - dialogue systems - systems 68 15_dialogue_taskoriented_dialog_dialogue systems
16 text - detection - texts - aigenerated - detectors 62 16_text_detection_texts_aigenerated
17 news - misinformation - fake - detection - fake news 61 17_news_misinformation_fake_detection
18 quantization - quantized - weights - 4bit - memory 61 18_quantization_quantized_weights_4bit
19 adversarial - attack - attacks - backdoor - adversarial examples 60 19_adversarial_attack_attacks_backdoor
20 privacy - private - federated - privacypreserving - pii 59 20_privacy_private_federated_privacypreserving
21 retrieval - ranking - rag - reranking - retrievalaugmented 58 21_retrieval_ranking_rag_reranking
22 legal - patent - court - claim - law 58 22_legal_patent_court_claim
23 code - software - developers - commit - code generation 57 23_code_software_developers_commit
24 word - representations - negation - linguistic - sentence 56 24_word_representations_negation_linguistic
25 recommendation - recommender - recommendations - recommender systems - user 55 25_recommendation_recommender_recommendations_recommender systems
26 instruction - instruction tuning - tuning - instructions - data 54 26_instruction_instruction tuning_tuning_instructions
27 pretraining - pretrained - seq2seq - tasks - masked 54 27_pretraining_pretrained_seq2seq_tasks
28 vulnerability - vulnerabilities - security - code - smart 54 28_vulnerability_vulnerabilities_security_code
29 transformer - transformers - layers - layer - attention 48 29_transformer_transformers_layers_layer
30 jailbreak - attacks - jailbreaking - attack - safety 44 30_jailbreak_attacks_jailbreaking_attack
31 ai - regulation - ethical - risk - regulatory 43 31_ai_regulation_ethical_risk
32 materials - chemistry - chemical - molecular - materials science 42 32_materials_chemistry_chemical_molecular
33 repair - bugs - bug - program repair - apr 42 33_repair_bugs_bug_program repair
34 graph - graphs - graph reasoning - graph neural - graph data 41 34_graph_graphs_graph reasoning_graph neural
35 speech - asr - speech recognition - audio - recognition 41 35_speech_asr_speech recognition_audio
36 evaluation - nlg - metrics - human - text 40 36_evaluation_nlg_metrics_human
37 personality - traits - personality traits - psychological - personas 38 37_personality_traits_personality traits_psychological
38 agent - agents - language agents - environments - decisionmaking 37 38_agent_agents_language agents_environments
39 texttosql - sql - database - spider - query 36 39_texttosql_sql_database_spider
40 tom - cognitive - mind - theory mind - humans 34 40_tom_cognitive_mind_theory mind
41 hate - hate speech - speech - offensive - hateful 34 41_hate_hate speech_speech_offensive
42 question - qa - answering - question answering - questions 34 42_question_qa_answering_question answering
43 incontext - icl - demonstrations - incontext learning - learning 33 43_incontext_icl_demonstrations_incontext learning
44 navigation - robot - manipulation - embodied - robots 33 44_navigation_robot_manipulation_embodied
45 hallucinations - hallucination - hallucination detection - detection - llms 31 45_hallucinations_hallucination_hallucination detection_detection
46 commonsense - commonsense knowledge - knowledge - commonsense reasoning - commonsense question answering 31 46_commonsense_commonsense knowledge_knowledge_commonsense reasoning
47 tool - tools - apis - api - tooluse 31 47_tool_tools_apis_api
48 parallelism - training - distributed - distributed training - network 30 48_parallelism_training_distributed_distributed training
49 brain - neural - gpt2 - circuit - attention 30 49_brain_neural_gpt2_circuit
50 context - context window - window - length - extrapolation 29 50_context_context window_window_length
51 knowledge - knowledge graph - kgs - wikidata - graph 29 51_knowledge_knowledge graph_kgs_wikidata
52 chatbots - search - chatgpt - technology - chat 28 52_chatbots_search_chatgpt_technology
53 cultural - political - opinions - values - survey 28 53_cultural_political_opinions_values
54 sentiment - sentiment analysis - analysis - aspectbased - polarity 28 54_sentiment_sentiment analysis_analysis_aspectbased
55 research - writing - ai - scientific - chatgpt 28 55_research_writing_ai_scientific
56 music - musical - audio - lyrics - sounds 28 56_music_musical_audio_lyrics
57 scaling - training - scaling laws - laws - emergent abilities 28 57_scaling_training_scaling laws_laws
58 explanations - counterfactual - explanation - counterfactuals - natural language explanations 27 58_explanations_counterfactual_explanation_counterfactuals
59 lora - lowrank - finetuning - adaptation - peft 27 59_lora_lowrank_finetuning_adaptation
60 safety - unsafe - harmful - safety alignment - 2chat 26 60_safety_unsafe_harmful_safety alignment
61 cybersecurity - cyber - security - genai - threat 26 61_cybersecurity_cyber_security_genai
62 visualization - visualizations - data visualization - chart - natural language 25 62_visualization_visualizations_data visualization_chart
63 attention - memory - matrix - linear - kv 23 63_attention_memory_matrix_linear
64 correction - gec - grammatical - error - error correction 23 64_correction_gec_grammatical_error
65 test - unit - tests - test generation - test cases 22 65_test_unit_tests_test generation
66 entity - relation - ner - extraction - relation extraction 22 66_entity_relation_ner_extraction
67 prompt - prompts - tuning - prompt tuning - optimization 22 67_prompt_prompts_tuning_prompt tuning
68 distillation - teacher - student - kd - student model 22 68_distillation_teacher_student_kd
69 pruning - sparsity - structured pruning - structured - weights 21 69_pruning_sparsity_structured pruning_structured
70 hallucination - hallucinations - lvlms - mllms - visual 21 70_hallucination_hallucinations_lvlms_mllms
71 ideas - creative - ai - creativity - fictional 21 71_ideas_creative_ai_creativity
72 mental - mental health - health - depression - social media 21 72_mental_mental health_health_depression
73 adversarial - vlms - attacks - attack - adversarial examples 20 73_adversarial_vlms_attacks_attack
74 confidence - calibration - uncertainty - probabilities - confidence scores 19 74_confidence_calibration_uncertainty_probabilities
75 crosslingual - multilingual - languages - english - transfer 19 75_crosslingual_multilingual_languages_english
76 verilog - design - hardware - hardware design - rtl 18 76_verilog_design_hardware_hardware design
77 intent - intent detection - slot - slot filling - detection 17 77_intent_intent detection_slot_slot filling
78 arabic - hebrew - cultural - nlp - diacritization 17 78_arabic_hebrew_cultural_nlp
79 watermarking - watermark - copyright - protection - ip 16 79_watermarking_watermark_copyright_protection
80 robot - robots - dialogue - round - humanrobot 16 80_robot_robots_dialogue_round
81 poetry - poems - poetry generation - lyrics - generation 16 81_poetry_poems_poetry generation_lyrics
82 table - tabular - tables - tabular data - data 16 82_table_tabular_tables_tabular data
83 spatial - geospatial - gis - geographic - location 15 83_spatial_geospatial_gis_geographic
84 product - ecommerce - attribute - extraction - product descriptions 15 84_product_ecommerce_attribute_extraction
85 geoscience - astronomy - scientific - astronomical - galactica 15 85_geoscience_astronomy_scientific_astronomical
86 phishing - emails - phishing emails - email - phishing attacks 15 86_phishing_emails_phishing emails_email
87 ai - generative ai - workers - generative - labor 14 87_ai_generative ai_workers_generative
88 planning - robotic - robot - robogpt - task planning 14 88_planning_robotic_robot_robogpt
89 mobile - wireless - edge - devices - aigc 14 89_mobile_wireless_edge_devices
90 simplification - text simplification - sentence - text - readability 14 90_simplification_text simplification_sentence_text
91 editing - knowledge editing - model editing - knowledge - editing methods 14 91_editing_knowledge editing_model editing_knowledge
92 annotation - data annotation - metadata - annotators - data 14 92_annotation_data annotation_metadata_annotators
93 gpu - hardware - communication - memory - accelerators 14 93_gpu_hardware_communication_memory
94 argument - arguments - argumentation - fallacy - fallacies 14 94_argument_arguments_argumentation_fallacy
95 toxicity - toxic - detoxification - content - toxic content 14 95_toxicity_toxic_detoxification_content
96 causal - causal reasoning - causality - causal discovery - causal inference 14 96_causal_causal reasoning_causality_causal discovery
97 design - bid - 3d - designs - generative 14 97_design_bid_3d_designs
98 chinese - questions - subjects - school - ceval 14 98_chinese_questions_subjects_school
99 scientific - papers - review - feedback - reviews 13 99_scientific_papers_review_feedback
100 urban - traffic - transportation - foundation models - foundation 13 100_urban_traffic_transportation_foundation models
101 humor - sarcasm - jokes - sarcasm detection - funny 13 101_humor_sarcasm_jokes_sarcasm detection
102 analogical - analogies - analogy - analogical reasoning - metaphor 12 102_analogical_analogies_analogy_analogical reasoning
103 public - early - sentiments - media - topics 12 103_public_early_sentiments_media
104 optimizers - adam - deep - networks - training 12 104_optimizers_adam_deep_networks
105 log - root - cloud - anomaly detection - anomaly 12 105_log_root_cloud_anomaly detection
106 dialogue - norm - norms - conversations - persona 12 106_dialogue_norm_norms_conversations
107 speculative - decoding - draft - speculative decoding - draft model 11 107_speculative_decoding_draft_speculative decoding
108 protein - sequences - proteins - bioinformatics - protein sequence 11 108_protein_sequences_proteins_bioinformatics
109 forgetting - catastrophic forgetting - catastrophic - continual - continual learning 11 109_forgetting_catastrophic forgetting_catastrophic_continual
110 software - software engineering - software using - chatgpt - software testing 11 110_software_software engineering_software using_chatgpt
111 verification - sva - configuration - proof - verified 10 111_verification_sva_configuration_proof
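Each label in the table above appears to be the topic ID followed by the first four keywords, joined with underscores. As a small sketch under that assumption, a label can be split back into its parts; `parse_label` is a hypothetical helper, not a BERTopic function.

```python
from typing import List, Tuple


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label such as '18_quantization_quantized_weights_4bit' into ID and keywords."""
    # The ID is everything before the first underscore; it may be negative (-1 for outliers).
    topic_id, _, keywords = label.partition("_")
    # Multi-word keywords contain spaces, not underscores, so a plain split keeps them intact.
    return int(topic_id), keywords.split("_")


print(parse_label("4_translation_languages_machine translation_multilingual"))
```

In recent BERTopic versions these labels should be available in the `Name` column of the DataFrame returned by `topic_model.get_topic_info()`, which makes it easy to filter topics by keyword.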

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
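For reference, the hyperparameters listed above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below only restates them; it is not the author's training script, and it will not reproduce this model exactly, because the embedding model, the UMAP and HDBSCAN settings, and the training documents are not given in this card. `build_topic_model` is a hypothetical helper name.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an untrained BERTopic instance configured with the values above."""
    from bertopic import BERTopic  # needs a BERTopic release that supports the zeroshot_* arguments

    return BERTopic(**HYPERPARAMETERS)
```

Training would then be `build_topic_model().fit_transform(docs)` on your own list of documents.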

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
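If loading the model fails or gives unexpected results, it can help to compare the local environment against the versions above. This is a minimal sketch using only the standard library; the PyPI distribution names (for example `umap-learn` for UMAP and `scikit-learn` for Scikit-Learn) are assumptions about how the listed frameworks are packaged, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions from the "Framework versions" list, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return packages whose installed version differs; None means not installed."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


for package, installed in version_mismatches(EXPECTED).items():
    print(f"{package}: expected {EXPECTED[package]}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the list only records the environment the model was saved from.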