HamidBekam commited on
Commit
f7adca1
1 Parent(s): 3adab86

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -0
README.md CHANGED
@@ -9,6 +9,9 @@ tags:
9
 
10
  # PatentSBERTa
11
 
 
 
 
12
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
13
 
14
  <!--- Describe your model here -->
 
9
 
10
  # PatentSBERTa
11
 
12
+
13
+ This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification. We create embeddings using Sentence-BERT (SBERT) based on patent claims. For domain adaptation of the general SBERT model, we implement an augmented approach for fine-tuning SBERT by in-domain supervised patent claims data. We leverage SBERTs efficiency in creating embedding distance measures to map p2p similarity in large sets of patent data. This framework is used for automated patent classification with a simple Nearest Neighbors (KNN) model that predicts assigned Cooperative Patent Classification (CPC) based on the class assignment of the K patents with the highest p2p similarity. We thereby validate that the p2p similarity captures their technological features in terms of CPC overlap, and at the same time demonstrate the usefulness of this approach for automatic patent classification based on text data. Furthermore, the presented classification framework is simple and the results easy to interpret and evaluate by end-users. In the out-of-sample model validation, we are able to perform a multi-label prediction of all assigned CPC classes on the subclass (663) level on 1,492,294 patents with an F1 score > 66%, which suggests that our model outperforms the current state-of-the-art in text-based multi-label and multi-class patent classification. We furthermore discuss the applicability of the presented framework for semantic IP search, patent landscaping, and technology intelligence. We finally point towards a future research agenda for leveraging multi-source patent embeddings, their appropriateness across applications, as well as to improve and validate patent embeddings by creating domain-expert curated Semantic Textual Similarity (STS) benchmark datasets.
14
+
15
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
16
 
17
  <!--- Describe your model here -->