metadata
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
topic_model_general_auto_april8
This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
```shell
pip install -U bertopic
```
You can use the model as follows:
```python
from bertopic import BERTopic

# Load the model from the Hugging Face Hub, then list its topics
topic_model = BERTopic.load("Thang203/topic_model_general_auto_april8")
topic_model.get_topic_info()
```
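Beyond inspecting the stored topics, a loaded model can assign topics to new documents. The sketch below is a minimal illustration, not part of the original card: `assign_topics` is a hypothetical helper, and it assumes the saved model carries a reference to its embedding model (if it does not, an `embedding_model` argument would need to be passed to `BERTopic.load`).

```python
def assign_topics(docs, model_id="Thang203/topic_model_general_auto_april8"):
    """Return (document, topic_id, top_keywords) for each input document."""
    # Imported lazily so that defining this helper does not require bertopic.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns the predicted topic IDs and their probabilities.
    topics, _ = topic_model.transform(docs)

    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (keyword, score) pairs; it is falsy for unknown IDs.
        keywords = [word for word, _ in (topic_model.get_topic(topic_id) or [])]
        results.append((doc, int(topic_id), keywords[:5]))
    return results
```

Calling `assign_topics(["We quantize model weights to 4 bits to reduce memory use."])` downloads the model on first use and returns one tuple per document. A topic ID of -1 marks a document as an outlier.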
Topic overview
- Number of topics: 113
- Number of training documents: 6795
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | models - language - llms - language models - model | 10 | -1_models_language_llms_language models |
0 | visual - multimodal - image - images - video | 1955 | 0_visual_multimodal_image_images |
1 | reasoning - mathematical - cot - math - problems | 429 | 1_reasoning_mathematical_cot_math |
2 | students - education - chatgpt - student - ai | 315 | 2_students_education_chatgpt_student |
3 | medical - clinical - biomedical - healthcare - notes | 261 | 3_medical_clinical_biomedical_healthcare |
4 | translation - languages - machine translation - multilingual - machine | 215 | 4_translation_languages_machine translation_multilingual |
5 | code - code generation - generation - programming - python | 156 | 5_code_code generation_generation_programming |
6 | generation - story - text - text generation - gpt2 | 131 | 6_generation_story_text_text generation |
7 | rlhf - reward - alignment - preference - feedback | 85 | 7_rlhf_reward_alignment_preference |
8 | financial - sentiment - stock - market - investment | 78 | 8_financial_sentiment_stock_market |
9 | bias - gender - biases - gender bias - fairness | 77 | 9_bias_gender_biases_gender bias |
10 | summarization - summaries - abstractive - text summarization - summary | 77 | 10_summarization_summaries_abstractive_text summarization |
11 | emotion - emotional - empathetic - emotions - affective | 74 | 11_emotion_emotional_empathetic_emotions |
12 | radiology - medical - reports - radiology reports - image | 74 | 12_radiology_medical_reports_radiology reports |
13 | fewshot - zeroshot - learning - augmentation - data | 69 | 13_fewshot_zeroshot_learning_augmentation |
14 | game - games - agents - negotiation - llm agents | 69 | 14_game_games_agents_negotiation |
15 | dialogue - taskoriented - dialog - dialogue systems - systems | 68 | 15_dialogue_taskoriented_dialog_dialogue systems |
16 | text - detection - texts - aigenerated - detectors | 62 | 16_text_detection_texts_aigenerated |
17 | news - misinformation - fake - detection - fake news | 61 | 17_news_misinformation_fake_detection |
18 | quantization - quantized - weights - 4bit - memory | 61 | 18_quantization_quantized_weights_4bit |
19 | adversarial - attack - attacks - backdoor - adversarial examples | 60 | 19_adversarial_attack_attacks_backdoor |
20 | privacy - private - federated - privacypreserving - pii | 59 | 20_privacy_private_federated_privacypreserving |
21 | retrieval - ranking - rag - reranking - retrievalaugmented | 58 | 21_retrieval_ranking_rag_reranking |
22 | legal - patent - court - claim - law | 58 | 22_legal_patent_court_claim |
23 | code - software - developers - commit - code generation | 57 | 23_code_software_developers_commit |
24 | word - representations - negation - linguistic - sentence | 56 | 24_word_representations_negation_linguistic |
25 | recommendation - recommender - recommendations - recommender systems - user | 55 | 25_recommendation_recommender_recommendations_recommender systems |
26 | instruction - instruction tuning - tuning - instructions - data | 54 | 26_instruction_instruction tuning_tuning_instructions |
27 | pretraining - pretrained - seq2seq - tasks - masked | 54 | 27_pretraining_pretrained_seq2seq_tasks |
28 | vulnerability - vulnerabilities - security - code - smart | 54 | 28_vulnerability_vulnerabilities_security_code |
29 | transformer - transformers - layers - layer - attention | 48 | 29_transformer_transformers_layers_layer |
30 | jailbreak - attacks - jailbreaking - attack - safety | 44 | 30_jailbreak_attacks_jailbreaking_attack |
31 | ai - regulation - ethical - risk - regulatory | 43 | 31_ai_regulation_ethical_risk |
32 | materials - chemistry - chemical - molecular - materials science | 42 | 32_materials_chemistry_chemical_molecular |
33 | repair - bugs - bug - program repair - apr | 42 | 33_repair_bugs_bug_program repair |
34 | graph - graphs - graph reasoning - graph neural - graph data | 41 | 34_graph_graphs_graph reasoning_graph neural |
35 | speech - asr - speech recognition - audio - recognition | 41 | 35_speech_asr_speech recognition_audio |
36 | evaluation - nlg - metrics - human - text | 40 | 36_evaluation_nlg_metrics_human |
37 | personality - traits - personality traits - psychological - personas | 38 | 37_personality_traits_personality traits_psychological |
38 | agent - agents - language agents - environments - decisionmaking | 37 | 38_agent_agents_language agents_environments |
39 | texttosql - sql - database - spider - query | 36 | 39_texttosql_sql_database_spider |
40 | tom - cognitive - mind - theory mind - humans | 34 | 40_tom_cognitive_mind_theory mind |
41 | hate - hate speech - speech - offensive - hateful | 34 | 41_hate_hate speech_speech_offensive |
42 | question - qa - answering - question answering - questions | 34 | 42_question_qa_answering_question answering |
43 | incontext - icl - demonstrations - incontext learning - learning | 33 | 43_incontext_icl_demonstrations_incontext learning |
44 | navigation - robot - manipulation - embodied - robots | 33 | 44_navigation_robot_manipulation_embodied |
45 | hallucinations - hallucination - hallucination detection - detection - llms | 31 | 45_hallucinations_hallucination_hallucination detection_detection |
46 | commonsense - commonsense knowledge - knowledge - commonsense reasoning - commonsense question answering | 31 | 46_commonsense_commonsense knowledge_knowledge_commonsense reasoning |
47 | tool - tools - apis - api - tooluse | 31 | 47_tool_tools_apis_api |
48 | parallelism - training - distributed - distributed training - network | 30 | 48_parallelism_training_distributed_distributed training |
49 | brain - neural - gpt2 - circuit - attention | 30 | 49_brain_neural_gpt2_circuit |
50 | context - context window - window - length - extrapolation | 29 | 50_context_context window_window_length |
51 | knowledge - knowledge graph - kgs - wikidata - graph | 29 | 51_knowledge_knowledge graph_kgs_wikidata |
52 | chatbots - search - chatgpt - technology - chat | 28 | 52_chatbots_search_chatgpt_technology |
53 | cultural - political - opinions - values - survey | 28 | 53_cultural_political_opinions_values |
54 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 28 | 54_sentiment_sentiment analysis_analysis_aspectbased |
55 | research - writing - ai - scientific - chatgpt | 28 | 55_research_writing_ai_scientific |
56 | music - musical - audio - lyrics - sounds | 28 | 56_music_musical_audio_lyrics |
57 | scaling - training - scaling laws - laws - emergent abilities | 28 | 57_scaling_training_scaling laws_laws |
58 | explanations - counterfactual - explanation - counterfactuals - natural language explanations | 27 | 58_explanations_counterfactual_explanation_counterfactuals |
59 | lora - lowrank - finetuning - adaptation - peft | 27 | 59_lora_lowrank_finetuning_adaptation |
60 | safety - unsafe - harmful - safety alignment - 2chat | 26 | 60_safety_unsafe_harmful_safety alignment |
61 | cybersecurity - cyber - security - genai - threat | 26 | 61_cybersecurity_cyber_security_genai |
62 | visualization - visualizations - data visualization - chart - natural language | 25 | 62_visualization_visualizations_data visualization_chart |
63 | attention - memory - matrix - linear - kv | 23 | 63_attention_memory_matrix_linear |
64 | correction - gec - grammatical - error - error correction | 23 | 64_correction_gec_grammatical_error |
65 | test - unit - tests - test generation - test cases | 22 | 65_test_unit_tests_test generation |
66 | entity - relation - ner - extraction - relation extraction | 22 | 66_entity_relation_ner_extraction |
67 | prompt - prompts - tuning - prompt tuning - optimization | 22 | 67_prompt_prompts_tuning_prompt tuning |
68 | distillation - teacher - student - kd - student model | 22 | 68_distillation_teacher_student_kd |
69 | pruning - sparsity - structured pruning - structured - weights | 21 | 69_pruning_sparsity_structured pruning_structured |
70 | hallucination - hallucinations - lvlms - mllms - visual | 21 | 70_hallucination_hallucinations_lvlms_mllms |
71 | ideas - creative - ai - creativity - fictional | 21 | 71_ideas_creative_ai_creativity |
72 | mental - mental health - health - depression - social media | 21 | 72_mental_mental health_health_depression |
73 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 73_adversarial_vlms_attacks_attack |
74 | confidence - calibration - uncertainty - probabilities - confidence scores | 19 | 74_confidence_calibration_uncertainty_probabilities |
75 | crosslingual - multilingual - languages - english - transfer | 19 | 75_crosslingual_multilingual_languages_english |
76 | verilog - design - hardware - hardware design - rtl | 18 | 76_verilog_design_hardware_hardware design |
77 | intent - intent detection - slot - slot filling - detection | 17 | 77_intent_intent detection_slot_slot filling |
78 | arabic - hebrew - cultural - nlp - diacritization | 17 | 78_arabic_hebrew_cultural_nlp |
79 | watermarking - watermark - copyright - protection - ip | 16 | 79_watermarking_watermark_copyright_protection |
80 | robot - robots - dialogue - round - humanrobot | 16 | 80_robot_robots_dialogue_round |
81 | poetry - poems - poetry generation - lyrics - generation | 16 | 81_poetry_poems_poetry generation_lyrics |
82 | table - tabular - tables - tabular data - data | 16 | 82_table_tabular_tables_tabular data |
83 | spatial - geospatial - gis - geographic - location | 15 | 83_spatial_geospatial_gis_geographic |
84 | product - ecommerce - attribute - extraction - product descriptions | 15 | 84_product_ecommerce_attribute_extraction |
85 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 85_geoscience_astronomy_scientific_astronomical |
86 | phishing - emails - phishing emails - email - phishing attacks | 15 | 86_phishing_emails_phishing emails_email |
87 | ai - generative ai - workers - generative - labor | 14 | 87_ai_generative ai_workers_generative |
88 | planning - robotic - robot - robogpt - task planning | 14 | 88_planning_robotic_robot_robogpt |
89 | mobile - wireless - edge - devices - aigc | 14 | 89_mobile_wireless_edge_devices |
90 | simplification - text simplification - sentence - text - readability | 14 | 90_simplification_text simplification_sentence_text |
91 | editing - knowledge editing - model editing - knowledge - editing methods | 14 | 91_editing_knowledge editing_model editing_knowledge |
92 | annotation - data annotation - metadata - annotators - data | 14 | 92_annotation_data annotation_metadata_annotators |
93 | gpu - hardware - communication - memory - accelerators | 14 | 93_gpu_hardware_communication_memory |
94 | argument - arguments - argumentation - fallacy - fallacies | 14 | 94_argument_arguments_argumentation_fallacy |
95 | toxicity - toxic - detoxification - content - toxic content | 14 | 95_toxicity_toxic_detoxification_content |
96 | causal - causal reasoning - causality - causal discovery - causal inference | 14 | 96_causal_causal reasoning_causality_causal discovery |
97 | design - bid - 3d - designs - generative | 14 | 97_design_bid_3d_designs |
98 | chinese - questions - subjects - school - ceval | 14 | 98_chinese_questions_subjects_school |
99 | scientific - papers - review - feedback - reviews | 13 | 99_scientific_papers_review_feedback |
100 | urban - traffic - transportation - foundation models - foundation | 13 | 100_urban_traffic_transportation_foundation models |
101 | humor - sarcasm - jokes - sarcasm detection - funny | 13 | 101_humor_sarcasm_jokes_sarcasm detection |
102 | analogical - analogies - analogy - analogical reasoning - metaphor | 12 | 102_analogical_analogies_analogy_analogical reasoning |
103 | public - early - sentiments - media - topics | 12 | 103_public_early_sentiments_media |
104 | optimizers - adam - deep - networks - training | 12 | 104_optimizers_adam_deep_networks |
105 | log - root - cloud - anomaly detection - anomaly | 12 | 105_log_root_cloud_anomaly detection |
106 | dialogue - norm - norms - conversations - persona | 12 | 106_dialogue_norm_norms_conversations |
107 | speculative - decoding - draft - speculative decoding - draft model | 11 | 107_speculative_decoding_draft_speculative decoding |
108 | protein - sequences - proteins - bioinformatics - protein sequence | 11 | 108_protein_sequences_proteins_bioinformatics |
109 | forgetting - catastrophic forgetting - catastrophic - continual - continual learning | 11 | 109_forgetting_catastrophic forgetting_catastrophic_continual |
110 | software - software engineering - software using - chatgpt - software testing | 11 | 110_software_software engineering_software using_chatgpt |
111 | verification - sva - configuration - proof - verified | 10 | 111_verification_sva_configuration_proof |
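
In the rows above, each Label appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below checks that pattern against two rows copied from the table. `parse_row` and `make_label` are hypothetical helpers written for this illustration, not BERTopic functions.

```python
def parse_row(row):
    """Split one table row into (topic_id, keywords, frequency, label)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return int(topic_id), keywords.split(" - "), int(frequency), label


def make_label(topic_id, keywords, n_words=4):
    """Join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id)] + keywords[:n_words])


rows = [
    "-1 | models - language - llms - language models - model | 10 | -1_models_language_llms_language models |",
    "1 | reasoning - mathematical - cot - math - problems | 429 | 1_reasoning_mathematical_cot_math |",
]

for row in rows:
    topic_id, keywords, frequency, label = parse_row(row)
    # The reconstructed label should match the Label column exactly.
    assert make_label(topic_id, keywords) == label

# Topic IDs run from -1 (outliers) to 111, which gives the 113 topics reported.
assert len(range(-1, 112)) == 113
```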
Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: True
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
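
The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch of how a model with the same settings might be configured, assuming a BERTopic release that accepts the zero-shot arguments: `HYPERPARAMETERS` and `build_topic_model` are names chosen for this illustration, and the embedding, UMAP, and HDBSCAN sub-models used in the original training run are not listed on this card, so library defaults would apply.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an untrained BERTopic model with the settings listed above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

The returned model is untrained. Calling `fit_transform` on a list of documents would train it, although the resulting topics would differ from the ones in this card unless the same data and sub-models were used.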
Framework versions
- Numpy: 1.25.2
- HDBSCAN: 0.8.33
- UMAP: 0.5.6
- Pandas: 2.0.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.6.1
- Transformers: 4.38.2
- Numba: 0.58.1
- Plotly: 5.15.0
- Python: 3.10.12
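
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the list above. The following sketch uses only the standard library. `EXPECTED` and `find_mismatches` are names chosen for this illustration, and the keys are the PyPI distribution names for the libraries listed (for example, UMAP is distributed as `umap-learn`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def find_mismatches(expected):
    """Return {package: (expected_version, installed_version_or_None)}."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only shows where the local environment differs from the one used for training.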