metadata

tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

topic_model_general_normal_april8

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

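from bertopic import BERTopic
# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

# Return a table with one row per topic (ID, document count, label)
topic_model.get_topic_info()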
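
Beyond inspecting the topic table, the loaded model can also assign topics to new documents. The following is a minimal sketch, assuming the model's embedding backend is available after loading and that the machine has network access to the Hub. The helper name `assign_topics` and the sample sentence in the comment are illustrative and not part of the original card.

```python
def assign_topics(docs, model_id="Thang203/topic_model_general_normal_april8"):
    """Return (topic_id, label) pairs for each document in `docs`."""
    # Imported lazily so the helper can be defined without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns the predicted topic IDs plus a score array (ignored here).
    topics, _ = topic_model.transform(docs)
    # topic_labels_ maps a topic ID to its default label string.
    return [(int(t), topic_model.topic_labels_.get(int(t), "")) for t in topics]


# Example call (hypothetical input, downloads the model on first use):
# assign_topics(["Chain-of-thought prompting improves math word problem solving."])
```

A topic ID of -1 in the result would mean the document was treated as an outlier rather than matched to one of the named topics.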

Topic overview

Number of topics: 80 (including the outlier topic -1)
Number of training documents: 6795


Topic ID	Topic Keywords	Topic Frequency	Label
-1	models - language - llms - language models - chatgpt	11	-1_models_language_llms_language models
0	translation - language - models - data - generation	2010	0_translation_language_models_data
1	visual - multimodal - image - images - video	510	1_visual_multimodal_image_images
2	reasoning - math - cot - mathematical - problems	432	2_reasoning_math_cot_mathematical
3	attacks - attack - adversarial - safety - jailbreak	340	3_attacks_attack_adversarial_safety
4	medical - clinical - biomedical - health - healthcare	318	4_medical_clinical_biomedical_health
5	code - code generation - generation - programming - software	303	5_code_code generation_generation_programming
6	students - education - ai - chatgpt - student	153	6_students_education_ai_chatgpt
7	robot - planning - robots - navigation - robotic	110	7_robot_planning_robots_navigation
8	dialogue - taskoriented - dialog - dialogue systems - systems	107	8_dialogue_taskoriented_dialog_dialogue systems
9	knowledge - question - answering - question answering - kgs	97	9_knowledge_question_answering_question answering
10	financial - sentiment - stock - market - investment	78	10_financial_sentiment_stock_market
11	bias - gender - biases - gender bias - fairness	78	11_bias_gender_biases_gender bias
12	emotion - emotional - empathetic - mental health - affective	77	12_emotion_emotional_empathetic_mental health
13	privacy - private - federated - data - attack	76	13_privacy_private_federated_data
14	text - detection - texts - aigenerated - machinegenerated	75	14_text_detection_texts_aigenerated
15	radiology - medical - reports - image - radiology reports	75	15_radiology_medical_reports_image
16	training - parallelism - gpu - memory - hardware	71	16_training_parallelism_gpu_memory
17	summarization - summaries - abstractive - summary - text summarization	70	17_summarization_summaries_abstractive_summary
18	game - games - agents - social - llm agents	69	18_game_games_agents_social
19	quantization - quantized - weights - memory - compression	66	19_quantization_quantized_weights_memory
20	sql - texttosql - table - database - tabular	62	20_sql_texttosql_table_database
21	retrieval - ranking - rag - reranking - retrievalaugmented	61	21_retrieval_ranking_rag_reranking
22	lora - attention - lowrank - finetuning - memory	59	22_lora_attention_lowrank_finetuning
23	legal - patent - claim - court - law	58	23_legal_patent_claim_court
24	alignment - preference - reward - rlhf - preferences	58	24_alignment_preference_reward_rlhf
25	recommendation - recommender - recommendations - recommender systems - user	56	25_recommendation_recommender_recommendations_recommender systems
26	transformer - transformers - attention - layers - layer	55	26_transformer_transformers_attention_layers
27	tom - cognitive - analogical - analogies - human	52	27_tom_cognitive_analogical_analogies
28	vulnerability - vulnerabilities - code - security - smart	48	28_vulnerability_vulnerabilities_code_security
29	materials - chemistry - materials science - chemical - molecular	48	29_materials_chemistry_materials science_chemical
30	agent - agents - rl - environments - language agents	47	30_agent_agents_rl_environments
31	repair - bugs - bug - program repair - apr	43	31_repair_bugs_bug_program repair
32	graph - graphs - graph reasoning - graph neural - graph data	43	32_graph_graphs_graph reasoning_graph neural
33	speech - asr - audio - speech recognition - recognition	42	33_speech_asr_audio_speech recognition
34	ai - ethical - regulation - risks - risk	41	34_ai_ethical_regulation_risks
35	personality - traits - personality traits - personas - personalities	41	35_personality_traits_personality traits_personas
36	context - context window - window - length - long	36	36_context_context window_window_length
37	chatgpt - research - writing - ai - academic	34	37_chatgpt_research_writing_ai
38	incontext - demonstrations - icl - incontext learning - learning	33	38_incontext_demonstrations_icl_incontext learning
39	sentiment - sentiment analysis - analysis - aspectbased - polarity	32	39_sentiment_sentiment analysis_analysis_aspectbased
40	cultural - opinions - political - survey - values	30	40_cultural_opinions_political_survey
41	tool - tools - apis - api - llms	29	41_tool_tools_apis_api
42	hallucinations - hallucination - hallucination detection - detection - llms	29	42_hallucinations_hallucination_hallucination detection_detection
43	creative - ideas - ai - creativity - storytelling	28	43_creative_ideas_ai_creativity
44	music - musical - audio - lyrics - song	28	44_music_musical_audio_lyrics
45	scaling - scaling laws - laws - training - model	27	45_scaling_scaling laws_laws_training
46	physics - students - chatgpt - education - responses	26	46_physics_students_chatgpt_education
47	correction - grammatical - gec - error - error correction	26	47_correction_grammatical_gec_error
48	test - unit - tests - test generation - test cases	23	48_test_unit_tests_test generation
49	pruning - sparsity - structured pruning - structured - weights	23	49_pruning_sparsity_structured pruning_structured
50	commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question	21	50_commonsense_commonsense knowledge_knowledge_commonsense question answering
51	distillation - teacher - student - kd - knowledge distillation	20	51_distillation_teacher_student_kd
52	visualization - visualizations - data visualization - natural - natural language	20	52_visualization_visualizations_data visualization_natural
53	hallucination - hallucinations - lvlms - mllms - visual	20	53_hallucination_hallucinations_lvlms_mllms
54	adversarial - vlms - attacks - attack - adversarial examples	20	54_adversarial_vlms_attacks_attack
55	verilog - design - hardware - hardware design - rtl	18	55_verilog_design_hardware_hardware design
56	spatial - geospatial - geographic - location - populations	18	56_spatial_geospatial_geographic_location
57	intent - intent detection - slot - detection - slot filling	18	57_intent_intent detection_slot_detection
58	prompts - prompt - performance - negated - pseudocode	18	58_prompts_prompt_performance_negated
59	brain - fmri - neural - activity - eeg	17	59_brain_fmri_neural_activity
60	watermarking - copyright - protection - text - model	16	60_watermarking_copyright_protection_text
61	public - social - media - early - ai	16	61_public_social_media_early
62	ai - productivity - chatbots - chatgpt - economy	15	62_ai_productivity_chatbots_chatgpt
63	poetry - poems - poetry generation - lyrics - poem	15	63_poetry_poems_poetry generation_lyrics
64	geoscience - astronomy - scientific - astronomical - galactica	15	64_geoscience_astronomy_scientific_astronomical
65	editing - knowledge editing - knowledge - model editing - editing methods	14	65_editing_knowledge editing_knowledge_model editing
66	argument - arguments - argumentation - fallacy - fallacies	14	66_argument_arguments_argumentation_fallacy
67	mobile - wireless - devices - aigc - network	14	67_mobile_wireless_devices_aigc
68	design - bid - 3d - designs - generative	14	68_design_bid_3d_designs
69	simplification - text simplification - text - sentence - readability	14	69_simplification_text simplification_text_sentence
70	urban - traffic - transportation - foundation models - foundation	13	70_urban_traffic_transportation_foundation models
71	log - anomaly - root - anomaly detection - cloud	13	71_log_anomaly_root_anomaly detection
72	forgetting - catastrophic forgetting - catastrophic - continual - finetuning	13	72_forgetting_catastrophic forgetting_catastrophic_continual
73	scientific - papers - review - gpt4 - feedback	13	73_scientific_papers_review_gpt4
74	causal - causality - causal discovery - causal inference - causal reasoning	13	74_causal_causality_causal discovery_causal inference
75	product - ecommerce - attribute - extraction - product descriptions	13	75_product_ecommerce_attribute_extraction
76	optimizers - adam - deep - training - networks	12	76_optimizers_adam_deep_training
77	chinese - questions - subjects - school - ceval	12	77_chinese_questions_subjects_school
78	speculative - decoding - draft - speculative decoding - draft model	12	78_speculative_decoding_draft_speculative decoding
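
The rows above follow a regular pattern: each label is the topic ID joined by underscores with the first four keywords, and each frequency is a document count out of the 6795 training documents. As a rough illustration, the sketch below parses one tab-separated row, rebuilds its label, and computes the topic's share of the corpus. The function names `parse_row` and `default_label` are hypothetical helpers written for this example, not BERTopic functions.

```python
TOTAL_DOCS = 6795  # "Number of training documents" from the overview


def parse_row(line):
    """Split one tab-separated table row into typed fields."""
    topic_id, keywords, frequency, label = line.split("\t")
    return int(topic_id), keywords.split(" - "), int(frequency), label


def default_label(topic_id, keywords, n_words=4):
    """Rebuild the label as '<id>_<kw1>_..._<kwN>' from the first n_words keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


row = "0\ttranslation - language - models - data - generation\t2010\t0_translation_language_models_data"
topic_id, keywords, frequency, label = parse_row(row)

# The rebuilt label should match the Label column of the same row.
assert default_label(topic_id, keywords) == label

share = frequency / TOTAL_DOCS
print(f"Topic {topic_id} covers {share:.1%} of the training documents")
# prints: Topic 0 covers 29.6% of the training documents
```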

Training hyperparameters

calculate_probabilities: False
language: english
low_memory: False
min_topic_size: 10
n_gram_range: (1, 1)
nr_topics: auto
seed_topic_list: None
top_n_words: 10
verbose: True
zeroshot_min_similarity: 0.7
zeroshot_topic_list: None
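
The names listed above correspond to keyword arguments of the `BERTopic` constructor. A minimal sketch of passing them back in is shown below. The card does not record the embedding, UMAP, HDBSCAN, or vectorizer settings, so a model built this way would fall back to library defaults for those and would not necessarily reproduce the topics listed earlier. The names `HYPERPARAMS` and `build_untrained_model` are illustrative.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with the settings above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of strings would then train a new model under these settings.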

Framework versions

Numpy: 1.25.2
HDBSCAN: 0.8.33
UMAP: 0.5.6
Pandas: 2.0.3
Scikit-Learn: 1.2.2
Sentence-transformers: 2.6.1
Transformers: 4.38.2
Numba: 0.58.1
Plotly: 5.15.0
Python: 3.10.12
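
Loading a saved model can be sensitive to library versions, so it may be useful to compare the local environment against the versions listed above. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP and `scikit-learn` for Scikit-Learn) are assumptions about how these packages were installed, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list; keys are assumed PyPI names.
PINNED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def check_versions(pinned=PINNED):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
        else:
            report[name] = "ok" if installed == expected else f"mismatch (installed {installed})"
    return report


for name, status in check_versions().items():
    print(f"{name}: {status}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.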