---
license: other
---
# Overview
This project aims to support visually impaired individuals in their daily navigation.
It combines the [YOLO](https://ultralytics.com/yolov8) model with [LLaMa 2 7b](https://huggingface.co/meta-llama/Llama-2-7b) for navigation assistance.
YOLO is trained on bounding box data from the [AI Hub](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=189).
The YOLO output (bounding boxes) is converted into lists of the form `[[class_of_obj_1, xmin, xmax, ymin, ymax, size], [class_of...] ...]` and appended to the input question.
The LLM is trained to navigate using the [LearnItAnyway/Visual-Navigation-21k](https://huggingface.co/datasets/LearnItAnyway/Visual-Navigation-21k) multi-turn dataset.
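As an illustration, the sketch below shows how YOLO detections could be converted into the list format above and combined with a question; the prompt wording, the placeholder weights, and the definition of `size` (assumed here to be the box area relative to the image) are assumptions, not taken from the project's training code.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

def detections_to_list(result):
    """Convert one YOLO result into [[class, xmin, xmax, ymin, ymax, size], ...].

    NOTE: `size` as relative box area is an assumption; see
    yolo_llama_visnav_test.ipynb for the exact conversion used in the project.
    """
    h, w = result.orig_shape
    objects = []
    for box in result.boxes:
        xmin, ymin, xmax, ymax = box.xyxy[0].tolist()
        cls_name = result.names[int(box.cls)]
        size = (xmax - xmin) * (ymax - ymin) / (w * h)
        objects.append([cls_name, round(xmin), round(xmax),
                        round(ymin), round(ymax), round(size, 3)])
    return objects

# Hypothetical usage: detect objects, then build the LLM input string.
yolo = YOLO("yolov8n.pt")              # placeholder weights; the project uses a model trained on AI Hub data
result = yolo("street_scene.jpg")[0]   # hypothetical input image
bbox_list = detections_to_list(result)
question = f"{bbox_list} Where can I walk safely?"  # prompt format is an assumption
```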
## Usage
We show how to use the model in [yolo_llama_visnav_test.ipynb](https://huggingface.co/LearnItAnyway/YOLO_LLaMa_7B_VisNav/blob/main/yolo_llama_visnav_test.ipynb).
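For reference, here is a minimal sketch of loading the language model with the standard `transformers` API; the notebook above is the authoritative example, and the prompt handling below is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the model loads with the standard causal-LM classes;
# see yolo_llama_visnav_test.ipynb for the exact loading and generation code.
model_id = "LearnItAnyway/YOLO_LLaMa_7B_VisNav"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Bounding-box list plus question, following the format described above (values are made up).
prompt = "[['person', 120, 260, 80, 400, 0.12]] Is the path ahead clear?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```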