Python clone detection

This is a CodeBERT model for detecting Python code clones, fine-tuned on the dataset shared by PoolC on the Hugging Face Hub. The original inference code for the model can be found at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py.

How to use

For efficient use of the model, see the repository https://github.com/RepoAnalysis/PythonCloneDetection, which provides a class that combines data preprocessing, input tokenization, and model inference.
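As an illustration of what a preprocessing step might look like, the sketch below normalizes a snippet by parsing it with Python's standard `ast` module and re-emitting it without comments or docstrings. The function name `normalize_python_code` is hypothetical, and this is not necessarily the preprocessing used by the repository above or by the original fine-tuning code; it only assumes Python 3.9+ (for `ast.unparse`).

```python
import ast


def normalize_python_code(source: str) -> str:
    """Hypothetical preprocessing: drop comments and docstrings from ``source``.

    Requires Python 3.9+ for ``ast.unparse``. Not necessarily what the
    referenced repositories do.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # A docstring is a string constant as the first statement of a body
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (
                body
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                # Keep the body non-empty so the regenerated code stays valid
                node.body = body[1:] or [ast.Pass()]
    # ast.unparse never emits comments and normalizes whitespace and quoting
    return ast.unparse(tree)
```

If you experiment with this, you would apply it to both snippets before calling the pipeline, e.g. `pipe((normalize_python_code(code1), normalize_python_code(code2)))`; whether it changes the model's predictions is untested here.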

You can also follow the original inference source code at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py.

More conveniently, a pipeline for this model has been implemented, and you can initialize it with only two lines of code:

from transformers import pipeline

pipe = pipeline(model="Lazyhope/python-clone-detection", trust_remote_code=True)

To use it, pass a pair of code snippets as a tuple:

code1 = """def token_to_inputs(feature):
    inputs = {}
    for k, v in feature.items():
        inputs[k] = torch.tensor(v).unsqueeze(0)

    return inputs"""
code2 = """def f(feature):
    return {k: torch.tensor(v).unsqueeze(0) for k, v in feature.items()}"""

is_clone = pipe((code1, code2))
is_clone
# {False: 1.3705984201806132e-05, True: 0.9999862909317017}
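The pipeline returns a dictionary mapping `False` and `True` to probabilities. If you need a single yes/no verdict, a small helper like the one below can threshold the `True` score. The helper name `to_decision` and its `threshold` argument are hypothetical conveniences, not part of the pipeline's API.

```python
from typing import Dict


def to_decision(scores: Dict[bool, float], threshold: float = 0.5) -> bool:
    """Hypothetical helper: turn {False: p, True: p} into a boolean verdict."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # The True entry holds the model's probability that the pair is a clone
    return scores[True] >= threshold


# Scores copied from the example output above
example = {False: 1.3705984201806132e-05, True: 0.9999862909317017}
print(to_decision(example))  # True
```

Raising `threshold` trades recall for precision; a suitable value would have to be chosen on your own validation data.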

Credits

We would like to thank the original team and authors of the model and the fine-tuning dataset:

License

This model is released under the MIT license.
