DeDeckerThomas
committed on
Commit 5032a92
1 Parent(s): 63d88a0
Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ Keyphrase extraction is a technique in text analysis where you extract the impor


 ## 📓 Model Description
-This model is a fine-tuned distilbert model on the
+This model is a fine-tuned distilbert model on the OpenKP dataset. More information can be found here: https://huggingface.co/distilbert-base-uncased.

 The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.

@@ -79,18 +79,20 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):

 ```python
 # Load pipeline
-model_name = "
+model_name = "ml6team/keyphrase-extraction-distilbert-openkp"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
+
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics,
-The fact that these methods have been widely used in the community
-
-
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
 "\n", ""
 )
@@ -102,10 +104,7 @@ print(keyphrases)

 ```
 # Output
-['
-'classical machine learning' 'deep learning methods'
-'keyphrase extraction' 'linguistics' 'recurrent neural networks'
-'semantics' 'statistics' 'text analysis' 'transformers']
+['keyphrase extraction', 'text analysis']
 ```

 ## 📚 Training Dataset
@@ -163,7 +162,7 @@ def preprocess_fuction(all_samples_per_split):
 ```

 ### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive
+For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
@@ -207,4 +206,4 @@ The model achieves the following results on the OpenKP test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.

 ## 🚨 Issues
-Please feel free to
+Please feel free to start discussions in the Community Tab.
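
The Postprocessing hunk describes keeping the B- and I-labeled tokens, joining consecutive ones, and stripping whitespace, but the body of `concat_tokens_by_tag` is cut off by the diff context. The following is a minimal sketch of that idea, not the README's implementation. It assumes word-level tokens paired with plain string labels `"B"`, `"I"`, and `"O"`; the function name `extract_keyphrases` and the input format are hypothetical.

```python
def extract_keyphrases(tokens, labels):
    """Join consecutive B/I-labeled tokens into keyphrases (illustrative sketch).

    Assumption: `tokens` and `labels` are parallel lists, and each label is
    one of "B" (begin), "I" (inside), or "O" (outside).
    """
    keyphrases = []
    current = []  # tokens of the keyphrase currently being built

    for token, label in zip(tokens, labels):
        if label == "B":
            # A new keyphrase starts; flush any phrase in progress.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = [token]
        elif label == "I" and current:
            # Continue the phrase that a previous "B" opened.
            current.append(token)
        else:
            # "O", or a stray "I" with no preceding "B", ends the phrase.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = []

    # Flush a phrase that runs to the end of the input.
    if current:
        keyphrases.append(" ".join(current).strip())
    return keyphrases


tokens = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
labels = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(extract_keyphrases(tokens, labels))  # ['Keyphrase extraction', 'text analysis']
```

A real pipeline works on subword tokens and label ids rather than whole words, so the joining step there would also need to merge subword pieces; this sketch leaves that out.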