general_nlp_research_paper

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

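from bertopic import BERTopic

# Load the trained topic model from the Hugging Face Hub
topic_model = BERTopic.load("Thang203/general_nlp_research_paper")

# Summarize every topic (ID, size, and label) as a pandas DataFrame
topic_model.get_topic_info()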
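
Once loaded, the model can also assign topics to documents it has not seen. The sketch below is a minimal illustration, not the author's workflow: `assign_topics` is a hypothetical helper (not part of BERTopic) that wraps the library's `transform` and `get_topic` methods, and it assumes the saved model can resolve its embedding model when it is loaded.

```python
def assign_topics(topic_model, docs, top_n=5):
    """Return (document, topic ID, top keywords) for each input document.

    `topic_model` is expected to be a loaded BERTopic instance; this helper
    is illustrative and not part of the BERTopic API.
    """
    # transform() returns the predicted topic IDs and (optionally) probabilities
    topics, _ = topic_model.transform(docs)

    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic() yields (word, score) pairs; it returns False for unknown IDs
        words = topic_model.get_topic(topic_id) or []
        results.append((doc, topic_id, [word for word, _ in words[:top_n]]))
    return results
```

For example, `assign_topics(topic_model, ["We propose a new model for abstractive summarization."])` returns a one-element list. A topic ID of -1 means the document was treated as an outlier.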

Topic overview

  • Number of topics: 165
  • Number of training documents: 11000

Overview of all topics. Each row gives the topic ID, the topic keywords (separated by " - "), the topic frequency, and the label. Topic -1 is BERTopic's outlier topic.
-1 language - models - model - data - translation 10 -1_language_models_model_data
0 question - answer - questions - answering - question answering 3488 0_question_answer_questions_answering
1 speech - speech recognition - acoustic - recognition - asr 513 1_speech_speech recognition_acoustic_recognition
2 summarization - summaries - abstractive - summary - extractive 345 2_summarization_summaries_abstractive_summary
3 clinical - medical - biomedical - extraction - notes 337 3_clinical_medical_biomedical_extraction
4 translation - machine translation - parallel - machine - nmt 258 4_translation_machine translation_parallel_machine
5 emotion - emotions - emotional - emotion recognition - affective 211 5_emotion_emotions_emotional_emotion recognition
6 word - embeddings - word embeddings - similarity - vector 164 6_word_embeddings_word embeddings_similarity
7 bert - probing - tasks - pretraining - pretrained 145 7_bert_probing_tasks_pretraining
8 relation - relation extraction - extraction - relations - distant 138 8_relation_relation extraction_extraction_relations
9 hate - hate speech - offensive - detection - speech 134 9_hate_hate speech_offensive_detection
10 arabic - sanskrit - kurdish - transliteration - rules 118 10_arabic_sanskrit_kurdish_transliteration
11 aspect - sentiment - sentiment analysis - aspectbased sentiment - aspectbased 118 11_aspect_sentiment_sentiment analysis_aspectbased sentiment
12 morphological - inflection - languages - morphology - morphological analysis 112 12_morphological_inflection_languages_morphology
13 ner - named entity - named - entity recognition - named entity recognition 107 13_ner_named entity_named_entity recognition
14 multimodal - image - visual - captions - images 101 14_multimodal_image_visual_captions
15 discourse - discourse relation - discourse parsing - implicit discourse - discourse relations 98 15_discourse_discourse relation_discourse parsing_implicit discourse
16 chinese - segmentation - word segmentation - chinese word - chinese word segmentation 89 16_chinese_segmentation_word segmentation_chinese word
17 crosslingual - bilingual - embeddings - crosslingual word - word embeddings 84 17_crosslingual_bilingual_embeddings_crosslingual word
18 entropy - law - languages - script - frequency 79 18_entropy_law_languages_script
19 argument - argumentation - arguments - argumentative - mining 77 19_argument_argumentation_arguments_argumentative
20 nmt - neural machine - neural machine translation - translation - machine translation 77 20_nmt_neural machine_neural machine translation_translation
21 parsing - dependency - dependency parsing - parser - transitionbased 76 21_parsing_dependency_dependency parsing_parser
22 syntactic - rnns - grammatical - language models - agreement 71 22_syntactic_rnns_grammatical_language models
23 generation - datatotext - text generation - datatotext generation - text 71 23_generation_datatotext_text generation_datatotext generation
24 topic - topics - topic models - topic modeling - lda 71 24_topic_topics_topic models_topic modeling
25 knowledge - knowledge graph - entities - relation - graph 68 25_knowledge_knowledge graph_entities_relation
26 gender - bias - gender bias - biases - embeddings 66 26_gender_bias_gender bias_biases
27 story - stories - story generation - narrative - plot 65 27_story_stories_story generation_narrative
28 dialogue - dialog - user - taskoriented - agent 65 28_dialogue_dialog_user_taskoriented
29 transformer - attention - selfattention - heads - layers 65 29_transformer_attention_selfattention_heads
30 srl - semantic role - role labeling - semantic role labeling - role 64 30_srl_semantic role_role labeling_semantic role labeling
31 change - semantic change - diachronic - lexical semantic - semantic 64 31_change_semantic change_diachronic_lexical semantic
32 sense - wsd - disambiguation - word sense - sense disambiguation 64 32_sense_wsd_disambiguation_word sense
33 paraphrase - paraphrases - paraphrase generation - paraphrasing - paraphrase identification 63 33_paraphrase_paraphrases_paraphrase generation_paraphrasing
34 linking - entity linking - entity - el - entities 62 34_linking_entity linking_entity_el
35 authorship - attribution - authorship attribution - authors - stylistic 60 35_authorship_attribution_authorship attribution_authors
36 tracking - state tracking - dialogue state - state - dialogue 54 36_tracking_state tracking_dialogue state_state
37 nli - natural language inference - language inference - inference - natural language 54 37_nli_natural language inference_language inference_inference
38 act - dialogue act - dialogue - dialog act - dialog 51 38_act_dialogue act_dialogue_dialog act
39 commonsense - reasoning - commonsense reasoning - knowledge - commonsense knowledge 49 39_commonsense_reasoning_commonsense reasoning_knowledge
40 crosslingual - multilingual - transfer - crosslingual transfer - mbert 49 40_crosslingual_multilingual_transfer_crosslingual transfer
41 coreference - resolution - coreference resolution - mention - pronoun 49 41_coreference_resolution_coreference resolution_mention
42 legal - patent - court - case - legal domain 48 42_legal_patent_court_case
43 dialect - identification - language identification - dialect identification - arabic 47 43_dialect_identification_language identification_dialect identification
44 amr - amr parsing - parsing - meaning representation - meaning 46 44_amr_amr parsing_parsing_meaning representation
45 adversarial - adversarial examples - attacks - attack - examples 46 45_adversarial_adversarial examples_attacks_attack
46 health - mental - mental health - social media - media 45 46_health_mental_mental health_social media
47 offensive - offensive language - subtask - offensive language identification - hostile 45 47_offensive_offensive language_subtask_offensive language identification
48 semantic parsing - parsing - semantic - compositional generalization - logical 44 48_semantic parsing_parsing_semantic_compositional generalization
49 recurrent - language modeling - rnn - lstm - modeling 44 49_recurrent_language modeling_rnn_lstm
50 sql - texttosql - database - queries - query 44 50_sql_texttosql_database_queries
51 indian - smt - translation - machine translation - machine 43 51_indian_smt_translation_machine translation
52 style - style transfer - transfer - text style - text style transfer 43 52_style_style transfer_transfer_text style
53 poetry - poems - lyrics - music - verse 43 53_poetry_poems_lyrics_music
54 codeswitching - cs - codeswitched - codemixed - monolingual 43 54_codeswitching_cs_codeswitched_codemixed
55 sentiment - polarity - sentiment analysis - analysis - prior polarity 41 55_sentiment_polarity_sentiment analysis_analysis
56 sarcasm - sarcasm detection - sarcastic - detection - irony 41 56_sarcasm_sarcasm detection_sarcastic_detection
57 gec - grammatical error - grammatical error correction - error correction - correction 40 57_gec_grammatical error_grammatical error correction_error correction
58 intent - intent detection - slot - slot filling - filling 40 58_intent_intent detection_slot_slot filling
59 temporal - events - temporal relations - expressions - temporal relation 39 59_temporal_events_temporal relations_expressions
60 adaptation - domain - domain adaptation - indomain - translation 37 60_adaptation_domain_domain adaptation_indomain
61 stance - stance detection - detection - tweets - veracity 37 61_stance_stance detection_detection_tweets
62 codemixed - sentiment - sentiment analysis - analysis - semeval2020 36 62_codemixed_sentiment_sentiment analysis_analysis
63 keyphrase - keyphrases - keyphrase extraction - keyphrase generation - extraction 35 63_keyphrase_keyphrases_keyphrase extraction_keyphrase generation
64 nmt - subword - translation - vocabulary - neural machine translation 35 64_nmt_subword_translation_vocabulary
65 calculus - logic - semantics - proof - typelogical 35 65_calculus_logic_semantics_proof
66 simplification - text simplification - sentence simplification - sentence - ts 35 66_simplification_text simplification_sentence simplification_sentence
67 annotation - xml - formats - tei - standards 35 67_annotation_xml_formats_tei
68 correction - spelling - ocr - spelling correction - errors 33 68_correction_spelling_ocr_spelling correction
69 sentiment - sentiment classification - sentiment analysis - classification - analysis 33 69_sentiment_sentiment classification_sentiment analysis_classification
70 complexity - readability - lexical complexity - assessment - readability assessment 31 70_complexity_readability_lexical complexity_assessment
71 postediting - ape - automatic postediting - mt - translation 30 71_postediting_ape_automatic postediting_mt
72 gender - gender bias - bias - translation - pronouns 30 72_gender_gender bias_bias_translation
73 tagger - tagging - taggers - pos - partofspeech 30 73_tagger_tagging_taggers_pos
74 meeting - summarization - podcast - abstractive - summaries 30 74_meeting_summarization_podcast_abstractive
75 domain - domain adaptation - adaptation - domains - target domain 30 75_domain_domain adaptation_adaptation_domains
76 documentlevel - context - translation - nmt - neural machine 29 76_documentlevel_context_translation_nmt
77 text classification - classification - convolutional - networks - convolutional neural 29 77_text classification_classification_convolutional_networks
78 news - fake - fake news - clickbait - satirical 29 78_news_fake_fake news_clickbait
79 grammars - grammar - stochastic - contextfree - contextfree grammars 29 79_grammars_grammar_stochastic_contextfree
80 ontology - rogets - thesaurus - wordnet - concepts 29 80_ontology_rogets_thesaurus_wordnet
81 vietnamese - ner - named entity recognition - entity recognition - named entity 28 81_vietnamese_ner_named entity recognition_entity recognition
82 claim - verification - evidence - claims - fever 27 82_claim_verification_evidence_claims
83 metrics - nlg - language generation - evaluation - natural language generation 27 83_metrics_nlg_language generation_evaluation
84 responses - response - response generation - adversarial - generation 27 84_responses_response_response generation_adversarial
85 robustness - nmt - translation - neural machine - neural machine translation 27 85_robustness_nmt_translation_neural machine
86 revision - editing - seq2seq - revisions - rewriting 27 86_revision_editing_seq2seq_revisions
87 phonological - phonology - finitestate - reduplication - prosody 26 87_phonological_phonology_finitestate_reduplication
88 geolocation - location - geographic - twitter - names 26 88_geolocation_location_geographic_twitter
89 event - event extraction - extraction - event types - argument 26 89_event_event extraction_extraction_event types
90 mt - human - translation - evaluation - parity 25 90_mt_human_translation_evaluation
91 arabic - sentiment - sentiment analysis - arabic sentiment - arabic sentiment analysis 25 91_arabic_sentiment_sentiment analysis_arabic sentiment
92 emoji - emojis - emoji prediction - emoticons - sentiment 25 92_emoji_emojis_emoji prediction_emoticons
93 constituency - latent tree - parsing - constituency parsing - tree learning 25 93_constituency_latent tree_parsing_constituency parsing
94 spatial - instructions - 3d - environment - robot 24 94_spatial_instructions_3d_environment
95 persona - responses - personality - traits - consistency 23 95_persona_responses_personality_traits
96 matching - response - retrievalbased - chatbots - multiturn 23 96_matching_response_retrievalbased_chatbots
97 entity - entity typing - typing - finegrained entity - type 22 97_entity_entity typing_typing_finegrained entity
98 math - word problems - math word - word problem - problems 21 98_math_word problems_math word_word problem
99 bert - multilingual - multilingual bert - bert model - multilingual models 21 99_bert_multilingual_multilingual bert_bert model
100 financial - stock - market - news - price 21 100_financial_stock_market_news
101 video - multimodal - sceneaware - dialog - visual 21 101_video_multimodal_sceneaware_dialog
102 sense - multisense - senses - word sense - word 21 102_sense_multisense_senses_word sense
103 game - games - agents - communication - pragmatic 21 103_game_games_agents_communication
104 graph - amrtotext - amrtotext generation - amr - graphs 20 104_graph_amrtotext_amrtotext generation_amr
105 nmt - translation - neural machine translation - neural machine - machine translation 20 105_nmt_translation_neural machine translation_neural machine
106 normalization - text normalization - normalizing - text - historical 20 106_normalization_text normalization_normalizing_text
107 privacy - policies - anonymization - deidentification - vague 20 107_privacy_policies_anonymization_deidentification
108 beam - beam search - search - decoding - constraints 20 108_beam_beam search_search_decoding
109 hypernymy - distributional - pathbased - hypernymy detection - hypernyms 19 109_hypernymy_distributional_pathbased_hypernymy detection
110 political - bias - articles - news - ideology 19 110_political_bias_articles_news
111 generative adversarial - gans - gan - generative - generative adversarial networks 18 111_generative adversarial_gans_gan_generative
112 pos - tagger - tagging - pos tagging - codemixed 17 112_pos_tagger_tagging_pos tagging
113 humor - humorous - headlines - funny - puns 17 113_humor_humorous_headlines_funny
114 metaphor - metaphors - metaphoric - metaphorical - literal 17 114_metaphor_metaphors_metaphoric_metaphorical
115 codeswitching - cs - asr - speech - speech recognition 17 115_codeswitching_cs_asr_speech
116 event coreference - event - coreference - coreference resolution - resolution 17 116_event coreference_event_coreference_coreference resolution
117 reviews - review - helpfulness - opinion - online reviews 17 117_reviews_review_helpfulness_opinion
118 covid19 - tweets - wnut2020 - twitter - informative 17 118_covid19_tweets_wnut2020_twitter
119 anaphora - resolution - pronouns - pronoun - anaphora resolution 17 119_anaphora_resolution_pronouns_pronoun
120 bilingual - dictionary - comparability - termhood - comparable corpora 17 120_bilingual_dictionary_comparability_termhood
121 discourse - translation - pronouns - dp - discourse phenomena 17 121_discourse_translation_pronouns_dp
122 color - colour - naming - colors - character embeddings 16 122_color_colour_naming_colors
123 nonautoregressive - autoregressive - nat - nonautoregressive neural - decoding 16 123_nonautoregressive_autoregressive_nat_nonautoregressive neural
124 nlg - natural language generation - language generation - spoken dialogue - generation 16 124_nlg_natural language generation_language generation_spoken dialogue
125 crowdsourcing - workers - examples - protocols - data collection 16 125_crowdsourcing_workers_examples_protocols
126 african - revolution - african languages - technology - african language 16 126_african_revolution_african languages_technology
127 grading - scoring - essay - short answer - essay scoring 16 127_grading_scoring_essay_short answer
128 treebanks - treebank - parsing - crosslingual - dependency 16 128_treebanks_treebank_parsing_crosslingual
129 reviews - summarization - review - product - summaries 16 129_reviews_summarization_review_product
130 gaze - reading - eyetracking - eye - behaviour 16 130_gaze_reading_eyetracking_eye
131 nlp - natural - natural language - nlg - language 15 131_nlp_natural_natural language_nlg
132 news translation - news translation task - translation task - news - submission 14 132_news translation_news translation task_translation task_news
133 eat - meaning - semantics - formal - theory 14 133_eat_meaning_semantics_formal
134 sign - sign language - sl - asl - deaf 14 134_sign_sign language_sl_asl
135 multitask - labels - mtl - sequence - multitask learning 14 135_multitask_labels_mtl_sequence
136 phylogenetic - cognate - indoeuropean - historical linguistics - indoeuropean language 14 136_phylogenetic_cognate_indoeuropean_historical linguistics
137 syntax - translation - neural machine translation - neural machine - nmt 14 137_syntax_translation_neural machine translation_neural machine
138 explanations - explanation - explainers - nl explanations - faithful 14 138_explanations_explanation_explainers_nl explanations
139 slot - slot filling - filling - slots - nlu 13 139_slot_slot filling_filling_slots
140 personality - traits - profiling - author profiling - author 13 140_personality_traits_profiling_author profiling
141 preposition - prepositions - supersenses - prepositional - supersense 13 141_preposition_prepositions_supersenses_prepositional
142 scientific - application areas - application - areas - literature 13 142_scientific_application areas_application_areas
143 russian - similarity - semantic similarity - similarity task - semantic similarity task 13 143_russian_similarity_semantic similarity_similarity task
144 code - source code - documentation - code generation - programming 13 144_code_source code_documentation_code generation
145 semantic web - translation - machinetranslation - machine translation - technologies 12 145_semantic web_translation_machinetranslation_machine translation
146 knowledge - knowledgegrounded - response - dialogue generation - dialogue 12 146_knowledge_knowledgegrounded_response_dialogue generation
147 sentence - sentence representations - sentence embeddings - transfer - tasks 12 147_sentence_sentence representations_sentence embeddings_transfer
148 distributional - distributional semantics - semantics - functional distributional - functional distributional semantics 12 148_distributional_distributional semantics_semantics_functional distributional
149 compositionality - sc - distributional - sememe knowledge - phrase 12 149_compositionality_sc_distributional_sememe knowledge
150 ud - annotation - treebank - treebanks - universal dependencies 12 150_ud_annotation_treebank_treebanks
151 acronym - abbreviation - acronyms - abbreviations - disambiguation 12 151_acronym_abbreviation_acronyms_abbreviations
152 propaganda - task 11 - 11 - propaganda detection - semeval2020 task 12 152_propaganda_task 11_11_propaganda detection
153 open - open information extraction - open information - information extraction - tuples 12 153_open_open information extraction_open information_information extraction
154 hebrew - bible - intertextuality - restoration - homographs 11 154_hebrew_bible_intertextuality_restoration
155 typological - typology - typological features - languages - linguistic typology 11 155_typological_typology_typological features_languages
156 label - text classification - multilabel - labels - classification 11 156_label_text classification_multilabel_labels
157 variational - latent - variational autoencoders - variational autoencoder - autoencoders 11 157_variational_latent_variational autoencoders_variational autoencoder
158 crisis - messages - disasters - disaster - emergency 11 158_crisis_messages_disasters_disaster
159 adversarial - rc - rc models - robustness - comprehension 11 159_adversarial_rc_rc models_robustness
160 tree - treelstm - trees - tree structures - syntactic 11 160_tree_treelstm_trees_tree structures
161 headline - headlines - news - headline generation - synthetic news 11 161_headline_headlines_news_headline generation
162 reasoning - kg - paths - kgs - multihop 11 162_reasoning_kg_paths_kgs
163 text classification - classification - runtime - fasttext - text 10 163_text classification_classification_runtime_fasttext
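
The labels in the last column follow a regular pattern: the topic ID, then the first four keywords, joined by underscores. A minimal sketch of splitting a label back into its parts, assuming (as in the rows above) that keywords may contain spaces but never underscores; `parse_label` is a hypothetical helper name:

```python
def parse_label(label):
    """Split a label such as '4_translation_machine translation_parallel_machine'
    into (topic_id, [keywords])."""
    # The ID is everything before the first underscore (it may be '-1')
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


print(parse_label("-1_language_models_model_data"))
# (-1, ['language', 'models', 'model', 'data'])
print(parse_label("1_speech_speech recognition_acoustic_recognition"))
# (1, ['speech', 'speech recognition', 'acoustic', 'recognition'])
```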

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
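
As a rough sketch, these settings can be passed back to the `BERTopic` constructor to set up a comparable, untrained model. This assumes a BERTopic release recent enough to accept the `zeroshot_*` arguments. The card does not record the sub-models (embedding model, UMAP, HDBSCAN, vectorizer) or the training documents, and the multi-word keywords in the topic table suggest a custom vectorizer was supplied, so this sketch alone is unlikely to reproduce the topics above. `HYPERPARAMETERS` and `build_topic_model` are hypothetical names.

```python
# Values copied from the hyperparameter list above
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(**overrides):
    """Create an untrained BERTopic instance with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

Calling `build_topic_model(min_topic_size=20)` would override a single setting while keeping the rest.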

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
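
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the PyPI distribution names in `PINNED` (for example `umap-learn` for UMAP) are assumptions, and a differing version does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# PyPI distribution names (an assumption) mapped to the versions listed above
PINNED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_report(pinned=PINNED):
    """Map each package to (listed, installed); installed is None if missing."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in version_report().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: listed {wanted}, installed {installed} ({status})")
```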