---
license: apache-2.0
tags:
- Computer
- computervision
---
## Uses
This LLM is trained on data generated by my code for the YOLOv8 model (GitHub code). The model can briefly describe what the YOLOv8 model detects and can also execute a command (`/click`). When the command is triggered, the model generates a dictionary containing the key data of the object to be clicked.
## Testing
You can test the model by giving it this information:
```json
{
  "Object": [
    {
      "index": "window_0",
      "label": "window",
      "property": "toplayer",
      "coords": [
        189.06007385253906,
        79.33326721191406,
        1156.018798828125,
        750.1478271484375
      ],
      "textes": 24,
      "interactions": [
        {
          "label": "close_window",
          "interaction_type": 1,
          "coords": [
            1114.04541015625,
            84.65348815917969,
            1149.1778564453125,
            113.41248321533203
          ]
        },
        {
          "label": "maximize",
          "interaction_type": 1,
          "coords": [
            1067.0111083984375,
            84.82215118408203,
            1099.86328125,
            112.69491577148438
          ]
        },
        {
          "label": "minize_window",
          "interaction_type": 1,
          "coords": [
            1024.7701416015625,
            85.06327819824219,
            1053.4327392578125,
            111.52396392822266
          ]
        }
      ]
    }
  ]
}
```
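As a sketch of how a consumer might turn the detection data above into an actual click target, the snippet below looks up an interaction by its label and returns the centre of its bounding box (the coords are `[x1, y1, x2, y2]`). The function name `find_click_target` and the overall flow are assumptions for illustration; the exact dictionary the model emits for `/click` is not specified here.

```python
# Sketch only: find_click_target is a hypothetical helper, not part of
# the model or its training code. It assumes coords are [x1, y1, x2, y2].

def find_click_target(detections, label):
    """Return the centre point of the bounding box of the interaction
    whose label matches, or None if no such interaction exists."""
    for obj in detections["Object"]:
        for interaction in obj["interactions"]:
            if interaction["label"] == label:
                x1, y1, x2, y2 = interaction["coords"]
                return ((x1 + x2) / 2, (y1 + y2) / 2)
    return None

# Trimmed-down version of the example detection data above.
detections = {
    "Object": [
        {
            "index": "window_0",
            "interactions": [
                {
                    "label": "close_window",
                    "interaction_type": 1,
                    "coords": [1114.045, 84.653, 1149.178, 113.412],
                }
            ],
        }
    ]
}

print(find_click_target(detections, "close_window"))
```

A GUI-automation layer could then pass that point to a mouse-control library to perform the click.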
You can give the model this information together with a prompt like "Was siehst du" ("What do you see") or "Kannst du das Fenster schließen" ("Can you close the window").
At the moment the model is only trained on German.