ml6team/keyphrase-extraction-distilbert-kptimes

@@ -32,7 +32,7 @@ Keyphrase extraction is a technique in text analysis where you extract the impor
 ## 📓 Model Description
-This model is a fine-tuned distilbert model on the kptimes dataset. More information can be found here: https://huggingface.co/distilbert-base-uncased.
 The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
@@ -80,18 +80,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 ```python
 # Load pipeline
-model_name = "DeDeckerThomas/keyphrase-extraction-distilbert-kptimes"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
-The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
-Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
 """.replace(
     "\n", ""
 )
@@ -103,10 +104,7 @@ print(keyphrases)
 ```
 # Output
-['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
- 'classical machine learning' 'deep learning methods'
- 'keyphrase extraction' 'linguistics' 'recurrent neural networks'
- 'semantics' 'statistics' 'text analysis' 'transformers']
 ```
 ## 📚 Training Dataset
@@ -164,7 +162,7 @@ def preprocess_fuction(all_samples_per_split):
 ```
 ### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive B and Is. As last you strip the keyphrase to ensure all spaces are removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
@@ -198,7 +196,7 @@ def extract_keyphrases(example, predictions, tokenizer, index=0):
 ```
 ## 📝 Evaluation results
-One of the traditional evaluation methods is the precision, recall and F1-score @k,m where k is the number that stands for the first k predicted keyphrases and m for the average amount of predicted keyphrases.
 The model achieves the following results on the KPTimes test set:
 | Dataset           | P@5  | R@5  | F1@5 | P@10 | R@10 | F1@10 | P@M  | R@M  | F1@M |
@@ -208,4 +206,4 @@ The model achieves the following results on the KPTimes test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 ## 🚨 Issues
-Please feel free to contact Thomas De Decker for any problems with this model.

 ## 📓 Model Description
+This model is a DistilBERT model fine-tuned on the KPTimes dataset. More information about the base model can be found here: https://huggingface.co/distilbert-base-uncased.
 The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
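As a rough illustration of the BIO scheme mentioned above, the sketch below labels each word as B (begins a keyphrase), I (inside a keyphrase) or O (outside). The helper `bio_tags` is hypothetical and not part of this repository. It works on whitespace-separated words and exact lowercase matching, whereas the model itself labels tokenizer output.

```python
# Hypothetical helper for illustration only; not taken from the model card.
def bio_tags(words, keyphrases):
    """Return one B/I/O label per word, given the gold keyphrases."""
    tags = ["O"] * len(words)
    lowered = [w.lower() for w in words]
    for phrase in keyphrases:
        target = phrase.lower().split()
        n = len(target)
        if n == 0:
            continue  # ignore empty keyphrases
        for start in range(len(words) - n + 1):
            if lowered[start:start + n] == target:
                tags[start] = "B"  # first word of the keyphrase
                for i in range(start + 1, start + n):
                    tags[i] = "I"  # remaining words of the keyphrase
    return tags


words = "Artificial Intelligence is used to automate keyphrase extraction".split()
tags = bio_tags(words, ["artificial intelligence", "keyphrase extraction"])
for word, tag in zip(words, tags):
    print(f"{word:<14}{tag}")
```

During fine-tuning, a token classifier learns to predict one such label per token.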
 ```python
 # Load pipeline
+model_name = "ml6team/keyphrase-extraction-distilbert-kptimes"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
     "\n", ""
 )
 ```
 # Output
+['artificial intelligence']
 ```
 ## 📚 Training Dataset
 ```
 ### Postprocessing
+For post-processing, keep only the tokens labeled B or I and concatenate each B with the Is that directly follow it. Finally, strip each keyphrase to remove leading and trailing spaces.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
 ```
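The post-processing idea described above (keep the B and I tokens, join consecutive ones, strip the result) can be sketched independently of the card's own functions. The function `merge_bio` below is a hypothetical, simplified stand-in: it assumes word-level tokens with plain `B`, `I` and `O` labels, and it drops an I that does not follow a B, which is a design choice of this sketch rather than something the source specifies.

```python
# Hypothetical sketch; the card's own post-processing works on tokenizer output.
def merge_bio(tokens, tags):
    """Join each B token with the I tokens that directly follow it."""
    phrases, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # a new keyphrase starts, so close the previous one
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)  # continue the open keyphrase
        else:
            if current:  # an O (or stray I) ends the open keyphrase
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # Strip surrounding whitespace from every keyphrase.
    return [phrase.strip() for phrase in phrases]


tokens = "Artificial Intelligence is used to automate keyphrase extraction".split()
labels = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(merge_bio(tokens, labels))
```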
 ## 📝 Evaluation results
+A traditional evaluation method is precision, recall and F1-score @K and @M, where K is the number of top-ranked predicted keyphrases that are considered and M is the average number of predicted keyphrases.
 The model achieves the following results on the KPTimes test set:
 | Dataset           | P@5  | R@5  | F1@5 | P@10 | R@10 | F1@10 | P@M  | R@M  | F1@M |
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
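To make the @K metrics concrete, here is a minimal sketch of precision, recall and F1 over the first K predictions. The function name and the exact-match comparison on lowercased strings are assumptions of this sketch; real evaluations often normalize or stem phrases first, and the notebook referenced above may compute the scores differently.

```python
# Hypothetical sketch of precision/recall/F1 at a cut-off k; not the card's evaluation code.
def precision_recall_f1_at_k(predicted, gold, k=None):
    """Score the first k predictions (all of them if k is None) against gold keyphrases."""
    top = predicted if k is None else predicted[:k]
    top_set = {p.lower() for p in top}
    gold_set = {g.lower() for g in gold}
    matches = len(top_set & gold_set)
    precision = matches / len(top_set) if top_set else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = ["artificial intelligence", "text analysis", "statistics", "semantics"]
gold = ["artificial intelligence", "keyphrase extraction", "semantics"]
p, r, f1 = precision_recall_f1_at_k(predicted, gold, k=2)
print(f"P@2={p:.2f} R@2={r:.2f} F1@2={f1:.2f}")
```

Passing `k=None` scores every prediction. Whether that corresponds to the @M columns in the table is not confirmed by the source.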
 ## 🚨 Issues
+Please feel free to start discussions in the Community Tab.