---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("FelipeSarmento/transformers_issues_topics")
topic_model.get_topic_info()
```
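Beyond inspecting the fitted topics, the loaded model can assign topics to new documents. The snippet below is a minimal sketch: the two issue-style strings are made-up placeholders, and it assumes the embedding model referenced by this repository can be downloaded so that `transform` can embed new text.

```python
from bertopic import BERTopic

topic_model = BERTopic.load("FelipeSarmento/transformers_issues_topics")

# Made-up, issue-style placeholder documents
docs = [
    "Trainer raises a ValueError when resuming training from a checkpoint",
    "Add ONNX export support for an encoder-decoder model",
]

# One topic id (and probability) per document
topics, probs = topic_model.transform(docs)
print(topics)

# Keywords of the topic assigned to the first document
print(topic_model.get_topic(topics[0]))
```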

## Topic overview

* Number of topics: 30
* Number of training documents: 9000

<details>
  <summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | bert - model - input - models - layer | 11 | -1_bert_model_input_models |
| 0 | encoderdecoder - encoderdecodermodel - encoder - decoder - tokenizers | 2113 | 0_encoderdecoder_encoderdecodermodel_encoder_decoder |
| 1 | trainertrain - trainer - trainingarguments - pytorch - training | 1753 | 1_trainertrain_trainer_trainingarguments_pytorch |
| 2 | importerror - attributeerror - valueerror - typeerror - indexerror | 1248 | 2_importerror_attributeerror_valueerror_typeerror |
| 3 | modelcard - modelcards - card - model - models | 472 | 3_modelcard_modelcards_card_model |
| 4 | doc - typos - docstring - docs - typo | 458 | 4_doc_typos_docstring_docs |
| 5 | xlnetmodel - xlnet - xlnetlmheadmodel - xlm - xlarge | 358 | 5_xlnetmodel_xlnet_xlnetlmheadmodel_xlm |
| 6 | gpt2 - gpt2tokenizer - gpt2xl - gpt2tokenizerfast - gpt | 301 | 6_gpt2_gpt2tokenizer_gpt2xl_gpt2tokenizerfast |
| 7 | readmemd - modelcard - readmetxt - readme - file | 278 | 7_readmemd_modelcard_readmetxt_readme |
| 8 | ci - testing - tests - speedup - slow | 262 | 8_ci_testing_tests_speedup |
| 9 | transformerscli - transformers - transformer - transformerxl - importerror | 245 | 9_transformerscli_transformers_transformer_transformerxl |
| 10 | s2s - s2t - s2sdistill - s2strainer - exampless2s | 238 | 10_s2s_s2t_s2sdistill_s2strainer |
| 11 | trainertrain - trainer - logging - training - feattrainer | 212 | 11_trainertrain_trainer_logging_training |
| 12 | questionansweringpipeline - distilbertforquestionanswering - answering - questionanswering - tfalbertforquestionanswering | 139 | 12_questionansweringpipeline_distilbertforquestionanswering_answering_questionanswering |
| 13 | ner - pipeline - pipelines - nerpipeline - fixpipeline | 127 | 13_ner_pipeline_pipelines_nerpipeline |
| 14 | longformer - longformers - longform - longformerforqa - longformerlayer | 126 | 14_longformer_longformers_longform_longformerforqa |
| 15 | label - labelsmoothingfactor - labels - labelsmoothing - labellist | 116 | 15_label_labelsmoothingfactor_labels_labelsmoothing |
| 16 | onnxonnxruntime - onnx - onnxexport - 04onnxexport - 04onnxexportipynb | 101 | 16_onnxonnxruntime_onnx_onnxexport_04onnxexport |
| 17 | generationbeamsearchpy - generatebeamsearch - generatebeamsearchoutputs - beamsearch - nonbeamsearch | 86 | 17_generationbeamsearchpy_generatebeamsearch_generatebeamsearchoutputs_beamsearch |
| 18 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 55 | 18_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
| 19 | configpath - configs - config - configuration - modelconfigs | 49 | 19_configpath_configs_config_configuration |
| 20 | amp - tf - electrapretrainedmodel - tflongformer - modelingelectra | 47 | 20_amp_tf_electrapretrainedmodel_tflongformer |
| 21 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 39 | 21_wandbproject_wandb_wandbcallback_wandbdisabled |
| 22 | cachedir - cache - cachedpath - caching - cached | 37 | 22_cachedir_cache_cachedpath_caching |
| 23 | notebook - notebooks - community - blenderbot3b - blenderbot | 32 | 23_notebook_notebooks_community_blenderbot3b |
| 24 | adamw - adam - adambetas - trainingargs - wip | 30 | 24_adamw_adam_adambetas_trainingargs |
| 25 | pplm - pr - deprecated - variable - ppl | 24 | 25_pplm_pr_deprecated_variable |
| 26 | layoutlm - layoutlmtokenizer - layout - layoutlmbaseuncased - tf | 15 | 26_layoutlm_layoutlmtokenizer_layout_layoutlmbaseuncased |
| 27 | closed - licens - license - deleted - uss | 14 | 27_closed_licens_license_deleted |
| 28 | isort - github - repo - version - setupcfg | 14 | 28_isort_github_repo_version |

</details>
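The rows above can also be queried programmatically. The snippet below is a small illustration; topic id 0 is taken from the table, and the calls are standard BERTopic API:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("FelipeSarmento/transformers_issues_topics")

# Keywords and c-TF-IDF scores for topic 0 (the encoder/decoder topic above)
print(topic_model.get_topic(0))

# Per-topic document counts, matching the "Topic Frequency" column
print(topic_model.get_topic_freq())
```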

## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: 30
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
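To train a comparable model on your own corpus, these hyperparameters map directly onto the `BERTopic` constructor. The sketch below only wires them up: the embedding model and the UMAP/HDBSCAN settings of the original run are not recorded on this card, so BERTopic's defaults are assumed, and the 20 newsgroups data is just a stand-in corpus.

```python
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Stand-in corpus; the original model was trained on transformers GitHub
# issues, which are not bundled with this card
docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

# Hyperparameters as listed above; embedding/UMAP/HDBSCAN use BERTopic defaults
topic_model = BERTopic(
    language="english",
    top_n_words=10,
    n_gram_range=(1, 1),
    min_topic_size=10,
    nr_topics=30,
    low_memory=False,
    calculate_probabilities=False,
    seed_topic_list=None,
    zeroshot_topic_list=None,
    zeroshot_min_similarity=0.7,
    verbose=True,
)

topics, probs = topic_model.fit_transform(docs)
```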

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.5
* Pandas: 2.2.2
* Scikit-Learn: 1.4.2
* Sentence-transformers: 2.7.0
* Transformers: 4.39.3
* Numba: 0.59.1
* Plotly: 5.21.0
* Python: 3.11.0