Commit 2ef9d8f by DeDeckerThomas
1 Parent(s): 4fdb3ab
Update README.md

README.md CHANGED
@@ -83,18 +83,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):

 ```python
 # Load pipeline
-model_name = "
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics,
-The fact that these methods have been widely used in the community
-
-
 """.replace(
     "\n", ""
 )
@@ -106,10 +107,9 @@ print(keyphrases)

 ```
 # Output
-['
-'
-'
-'semantics' 'statistics' 'text analysis' 'transformers']
 ```

 ## 📚 Training Dataset
@@ -172,7 +172,7 @@ def preprocess_fuction(all_samples_per_split):
 ```

 ### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
@@ -216,4 +216,4 @@ The model achieves the following results on the Inspec test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.

 ## 🚨 Issues
-Please feel free to


 ```python
 # Load pipeline
+model_name = "ml6team/keyphrase-extraction-distilbert-inspec"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
     "\n", ""
 )


 ```
 # Output
+['artificial intelligence', 'classical machine learning methods',
+'keyphrase extraction', 'linguistics', 'statistics',
+'text analysis']
 ```

 ## 📚 Training Dataset

 ```

 ### Postprocessing
+For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):

 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.

 ## 🚨 Issues
+Please feel free to start discussions in the Community Tab.
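
The Postprocessing hunk describes the steps only in prose: keep the B- and I-labeled tokens, join each B with the I tokens that follow it, then strip surrounding whitespace. The following is a minimal sketch of that grouping, assuming the input is a plain list of (token, tag) pairs with tags "B", "I" and "O". The name `group_bio_tokens` and the input shape are illustrative assumptions; this is not the README's `concat_tokens_by_tag`, whose body is not shown in this diff.

```python
from typing import Iterable, List, Tuple


def group_bio_tokens(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Join consecutive B/I tokens into keyphrases (hypothetical helper)."""
    keyphrases: List[str] = []
    current: List[str] = []
    for token, tag in tagged_tokens:
        if tag == "B":
            # A B tag always starts a new keyphrase; flush the previous one.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = [token]
        elif tag == "I" and current:
            # An I tag continues the keyphrase opened by the last B tag.
            current.append(token)
        else:
            # O tags (and stray I tags with no preceding B) end the phrase.
            if current:
                keyphrases.append(" ".join(current).strip())
            current = []
    if current:
        keyphrases.append(" ".join(current).strip())
    return keyphrases


example = [
    ("Keyphrase", "B"), ("extraction", "I"), ("is", "O"),
    ("a", "O"), ("technique", "O"), ("in", "O"),
    ("text", "B"), ("analysis", "I"), (".", "O"),
]
print(group_bio_tokens(example))  # ['Keyphrase extraction', 'text analysis']
```

Dropping an I token that has no preceding B is a choice made for this sketch; the actual post-processing may handle that case differently.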