metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

topic_model_general_normal_april8

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

topic_model.get_topic_info()
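
Beyond inspecting the topic table, a loaded model can assign topics to unseen documents. The following is a minimal sketch, assuming the repository is reachable on the Hugging Face Hub and that the embedding model saved with it can be loaded locally. The helper names `assign_topics` and `demo` and the sample sentences are hypothetical and only serve as illustration.

```python
from typing import List, Tuple


def assign_topics(topic_model, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted BERTopic model."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    # Imported here so that assign_topics() can be reused without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

    # Hypothetical example inputs, not taken from the training data.
    docs = [
        "We study jailbreak attacks against safety-aligned language models.",
        "A benchmark for mathematical reasoning with chain-of-thought prompting.",
    ]
    for doc, topic in assign_topics(topic_model, docs):
        # get_topic() yields (word, score) pairs; fall back to an empty list if absent.
        words = [word for word, _ in (topic_model.get_topic(topic) or [])]
        print(topic, words[:5], "<-", doc)
```

Calling `demo()` downloads the model and prints the predicted topic ID with its top keywords for each sentence. A topic ID of -1 means the document was treated as an outlier.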

Topic overview

  • Number of topics: 80
  • Number of training documents: 6795
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | models - language - llms - language models - chatgpt | 11 | -1_models_language_llms_language models |
| 0 | translation - language - models - data - generation | 2010 | 0_translation_language_models_data |
| 1 | visual - multimodal - image - images - video | 510 | 1_visual_multimodal_image_images |
| 2 | reasoning - math - cot - mathematical - problems | 432 | 2_reasoning_math_cot_mathematical |
| 3 | attacks - attack - adversarial - safety - jailbreak | 340 | 3_attacks_attack_adversarial_safety |
| 4 | medical - clinical - biomedical - health - healthcare | 318 | 4_medical_clinical_biomedical_health |
| 5 | code - code generation - generation - programming - software | 303 | 5_code_code generation_generation_programming |
| 6 | students - education - ai - chatgpt - student | 153 | 6_students_education_ai_chatgpt |
| 7 | robot - planning - robots - navigation - robotic | 110 | 7_robot_planning_robots_navigation |
| 8 | dialogue - taskoriented - dialog - dialogue systems - systems | 107 | 8_dialogue_taskoriented_dialog_dialogue systems |
| 9 | knowledge - question - answering - question answering - kgs | 97 | 9_knowledge_question_answering_question answering |
| 10 | financial - sentiment - stock - market - investment | 78 | 10_financial_sentiment_stock_market |
| 11 | bias - gender - biases - gender bias - fairness | 78 | 11_bias_gender_biases_gender bias |
| 12 | emotion - emotional - empathetic - mental health - affective | 77 | 12_emotion_emotional_empathetic_mental health |
| 13 | privacy - private - federated - data - attack | 76 | 13_privacy_private_federated_data |
| 14 | text - detection - texts - aigenerated - machinegenerated | 75 | 14_text_detection_texts_aigenerated |
| 15 | radiology - medical - reports - image - radiology reports | 75 | 15_radiology_medical_reports_image |
| 16 | training - parallelism - gpu - memory - hardware | 71 | 16_training_parallelism_gpu_memory |
| 17 | summarization - summaries - abstractive - summary - text summarization | 70 | 17_summarization_summaries_abstractive_summary |
| 18 | game - games - agents - social - llm agents | 69 | 18_game_games_agents_social |
| 19 | quantization - quantized - weights - memory - compression | 66 | 19_quantization_quantized_weights_memory |
| 20 | sql - texttosql - table - database - tabular | 62 | 20_sql_texttosql_table_database |
| 21 | retrieval - ranking - rag - reranking - retrievalaugmented | 61 | 21_retrieval_ranking_rag_reranking |
| 22 | lora - attention - lowrank - finetuning - memory | 59 | 22_lora_attention_lowrank_finetuning |
| 23 | legal - patent - claim - court - law | 58 | 23_legal_patent_claim_court |
| 24 | alignment - preference - reward - rlhf - preferences | 58 | 24_alignment_preference_reward_rlhf |
| 25 | recommendation - recommender - recommendations - recommender systems - user | 56 | 25_recommendation_recommender_recommendations_recommender systems |
| 26 | transformer - transformers - attention - layers - layer | 55 | 26_transformer_transformers_attention_layers |
| 27 | tom - cognitive - analogical - analogies - human | 52 | 27_tom_cognitive_analogical_analogies |
| 28 | vulnerability - vulnerabilities - code - security - smart | 48 | 28_vulnerability_vulnerabilities_code_security |
| 29 | materials - chemistry - materials science - chemical - molecular | 48 | 29_materials_chemistry_materials science_chemical |
| 30 | agent - agents - rl - environments - language agents | 47 | 30_agent_agents_rl_environments |
| 31 | repair - bugs - bug - program repair - apr | 43 | 31_repair_bugs_bug_program repair |
| 32 | graph - graphs - graph reasoning - graph neural - graph data | 43 | 32_graph_graphs_graph reasoning_graph neural |
| 33 | speech - asr - audio - speech recognition - recognition | 42 | 33_speech_asr_audio_speech recognition |
| 34 | ai - ethical - regulation - risks - risk | 41 | 34_ai_ethical_regulation_risks |
| 35 | personality - traits - personality traits - personas - personalities | 41 | 35_personality_traits_personality traits_personas |
| 36 | context - context window - window - length - long | 36 | 36_context_context window_window_length |
| 37 | chatgpt - research - writing - ai - academic | 34 | 37_chatgpt_research_writing_ai |
| 38 | incontext - demonstrations - icl - incontext learning - learning | 33 | 38_incontext_demonstrations_icl_incontext learning |
| 39 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 32 | 39_sentiment_sentiment analysis_analysis_aspectbased |
| 40 | cultural - opinions - political - survey - values | 30 | 40_cultural_opinions_political_survey |
| 41 | tool - tools - apis - api - llms | 29 | 41_tool_tools_apis_api |
| 42 | hallucinations - hallucination - hallucination detection - detection - llms | 29 | 42_hallucinations_hallucination_hallucination detection_detection |
| 43 | creative - ideas - ai - creativity - storytelling | 28 | 43_creative_ideas_ai_creativity |
| 44 | music - musical - audio - lyrics - song | 28 | 44_music_musical_audio_lyrics |
| 45 | scaling - scaling laws - laws - training - model | 27 | 45_scaling_scaling laws_laws_training |
| 46 | physics - students - chatgpt - education - responses | 26 | 46_physics_students_chatgpt_education |
| 47 | correction - grammatical - gec - error - error correction | 26 | 47_correction_grammatical_gec_error |
| 48 | test - unit - tests - test generation - test cases | 23 | 48_test_unit_tests_test generation |
| 49 | pruning - sparsity - structured pruning - structured - weights | 23 | 49_pruning_sparsity_structured pruning_structured |
| 50 | commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question | 21 | 50_commonsense_commonsense knowledge_knowledge_commonsense question answering |
| 51 | distillation - teacher - student - kd - knowledge distillation | 20 | 51_distillation_teacher_student_kd |
| 52 | visualization - visualizations - data visualization - natural - natural language | 20 | 52_visualization_visualizations_data visualization_natural |
| 53 | hallucination - hallucinations - lvlms - mllms - visual | 20 | 53_hallucination_hallucinations_lvlms_mllms |
| 54 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 54_adversarial_vlms_attacks_attack |
| 55 | verilog - design - hardware - hardware design - rtl | 18 | 55_verilog_design_hardware_hardware design |
| 56 | spatial - geospatial - geographic - location - populations | 18 | 56_spatial_geospatial_geographic_location |
| 57 | intent - intent detection - slot - detection - slot filling | 18 | 57_intent_intent detection_slot_detection |
| 58 | prompts - prompt - performance - negated - pseudocode | 18 | 58_prompts_prompt_performance_negated |
| 59 | brain - fmri - neural - activity - eeg | 17 | 59_brain_fmri_neural_activity |
| 60 | watermarking - copyright - protection - text - model | 16 | 60_watermarking_copyright_protection_text |
| 61 | public - social - media - early - ai | 16 | 61_public_social_media_early |
| 62 | ai - productivity - chatbots - chatgpt - economy | 15 | 62_ai_productivity_chatbots_chatgpt |
| 63 | poetry - poems - poetry generation - lyrics - poem | 15 | 63_poetry_poems_poetry generation_lyrics |
| 64 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 64_geoscience_astronomy_scientific_astronomical |
| 65 | editing - knowledge editing - knowledge - model editing - editing methods | 14 | 65_editing_knowledge editing_knowledge_model editing |
| 66 | argument - arguments - argumentation - fallacy - fallacies | 14 | 66_argument_arguments_argumentation_fallacy |
| 67 | mobile - wireless - devices - aigc - network | 14 | 67_mobile_wireless_devices_aigc |
| 68 | design - bid - 3d - designs - generative | 14 | 68_design_bid_3d_designs |
| 69 | simplification - text simplification - text - sentence - readability | 14 | 69_simplification_text simplification_text_sentence |
| 70 | urban - traffic - transportation - foundation models - foundation | 13 | 70_urban_traffic_transportation_foundation models |
| 71 | log - anomaly - root - anomaly detection - cloud | 13 | 71_log_anomaly_root_anomaly detection |
| 72 | forgetting - catastrophic forgetting - catastrophic - continual - finetuning | 13 | 72_forgetting_catastrophic forgetting_catastrophic_continual |
| 73 | scientific - papers - review - gpt4 - feedback | 13 | 73_scientific_papers_review_gpt4 |
| 74 | causal - causality - causal discovery - causal inference - causal reasoning | 13 | 74_causal_causality_causal discovery_causal inference |
| 75 | product - ecommerce - attribute - extraction - product descriptions | 13 | 75_product_ecommerce_attribute_extraction |
| 76 | optimizers - adam - deep - training - networks | 12 | 76_optimizers_adam_deep_training |
| 77 | chinese - questions - subjects - school - ceval | 12 | 77_chinese_questions_subjects_school |
| 78 | speculative - decoding - draft - speculative decoding - draft model | 12 | 78_speculative_decoding_draft_speculative decoding |
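
To read the overview quantitatively, each label can be split into its topic ID and keywords, and each frequency can be expressed as a share of the 6795 training documents. The sketch below illustrates this on three rows copied from the overview. The helper names `parse_label` and `topic_share` are hypothetical, and the parsing assumes that keywords never contain an underscore, which holds for the labels listed here. Topic -1 is the group BERTopic reserves for outlier documents.

```python
from typing import List, Tuple

TOTAL_DOCS = 6795  # "Number of training documents" reported in this card


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label such as '0_translation_language_models_data' into (id, keywords)."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topic_share(frequency: int, total: int = TOTAL_DOCS) -> float:
    """Fraction of the training documents assigned to one topic."""
    return frequency / total


# A few (label, frequency) rows copied from the topic overview.
rows = [
    ("-1_models_language_llms_language models", 11),
    ("0_translation_language_models_data", 2010),
    ("1_visual_multimodal_image_images", 510),
]

for label, frequency in rows:
    topic_id, keywords = parse_label(label)
    print(f"topic {topic_id:>2}: {topic_share(frequency):6.1%}  {', '.join(keywords)}")
```

The same two helpers can be applied to the `Name` and `Count` columns of the DataFrame returned by `topic_model.get_topic_info()`.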

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: auto
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
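
The values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they might be passed when fitting a comparable model. It is not the author's training script: the card does not list the embedding, UMAP, HDBSCAN, or vectorizer sub-models, so this sketch falls back to BERTopic's defaults for them, and the helper name `build_topic_model` is hypothetical.

```python
from typing import Optional

# Hyperparameters copied from the list above.
hyperparameters = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(params: Optional[dict] = None):
    """Create an unfitted BERTopic instance with the settings listed in this card."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Sub-models are left at BERTopic's defaults (an assumption, see the note above).
    return BERTopic(**(params or hyperparameters))
```

With a list of documents `docs`, `build_topic_model().fit_transform(docs)` would then fit the model and return the topic assignments. Results will generally differ from this card unless the data and sub-models match.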

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
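
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the versions listed above. The following sketch uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP and `scikit-learn` for Scikit-Learn) are assumptions, the Python interpreter version is not checked, and the helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata
from typing import Dict, Tuple

# Versions copied from the list above; keys are the assumed PyPI distribution names.
expected_versions = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in find_mismatches(expected_versions).items():
    print(f"{package}: expected {wanted}, found {installed}")
```

A mismatch does not necessarily prevent the model from loading; the output is only a starting point when debugging load errors.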