I want to fine-tune this model on my custom dataset. Help

#13
by Qw3rtyd4ddy - opened

Hey, I am a newbie. Please help me.
I want to fine-tune this model on my custom dataset.
Which format should I use for the dataset? I know I need to use JSON, but I don't know which JSON layout works with this model. Or will any kind of JSON dataset work? Please tell me how to fine-tune this on Google Colab and how to make a dataset. My example dataset is what I pulled from this model itself. Please help and correct me.
How do I use this dataset or make a new one?

{
    "Vulnerability1": {
        "Description": "SQL Injection",
        "Working": "The application does not properly validate user input. An attacker injects malicious SQL code to steal data or modify content.",
        "Payloads": ["SELECT * FROM users", "UPDATE users SET password = 'password' WHERE username = 'user'"],
        "Prevention": "Validate user input, use prepared statements, and limit database privileges."
    },
    "Vulnerability2": {
        "Description": "Cross-Site Scripting (XSS)",
        "Working": "An attacker injects malicious JavaScript code into a web page to steal cookies or session tokens.",
        "Payloads": ["<script>document.cookie</script>", "<script>fetch('/login', { method: 'POST', headers: { 'Authorization': 'Bearer <token>' } })</script>"],
        "Prevention": "Use input validation, encode output, and use Content Security Policy."
    },
    ... (49 more vulnerabilities)

@Qw3rtyd4ddy There's no difference between tuning this model and any other LLM (or Llama3, for that matter). You can find tutorials here: https://github.com/unslothai/unsloth
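
On the dataset format: nested JSON like the example above usually has to be flattened into one training example per record before fine-tuning. Here is a rough sketch, not a definitive recipe. It assumes an Alpaca-style instruction/input/output layout, which is one common choice; the field names, the question wording, and the helper names `to_records` and `to_jsonl` are my own assumptions, so check them against whichever tutorial you follow.

```python
import json

# Two entries copied from the example dataset above (only the fields used here).
raw = {
    "Vulnerability1": {
        "Description": "SQL Injection",
        "Working": "The application does not properly validate user input. An attacker injects malicious SQL code to steal data or modify content.",
        "Prevention": "Validate user input, use prepared statements, and limit database privileges.",
    },
    "Vulnerability2": {
        "Description": "Cross-Site Scripting (XSS)",
        "Working": "An attacker injects malicious JavaScript code into a web page to steal cookies or session tokens.",
        "Prevention": "Use input validation, encode output, and use Content Security Policy.",
    },
}


def to_records(data):
    """Flatten the nested dict into instruction/input/output records (assumed layout)."""
    records = []
    for entry in data.values():
        name = entry["Description"]
        records.append({
            "instruction": f"Explain how {name} works.",
            "input": "",
            "output": entry["Working"],
        })
        records.append({
            "instruction": f"How can {name} be prevented?",
            "input": "",
            "output": entry["Prevention"],
        })
    return records


def to_jsonl(records):
    """Serialize the records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


records = to_records(raw)
print(to_jsonl(records))
```

Save the printed text to a file such as `train.jsonl` (hypothetical name) and upload it to your Colab session. Each line is then a single training example, and the tutorial's prompt template decides how the three fields are joined into the text the model sees.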

Orenguteng changed discussion status to closed

Can you please help me and show me how to make a dataset? What should the dataset be made of?
