chore: Update LLaMA model to version 3 and refactor speech-to-text functionality
README.md
CHANGED
@@ -1,41 +1,31 @@
-#

-A speaking assistant designed for in-car use, leveraging the LLaMA 2 model to facilitate vocal interactions between the car and its users. This notebook provides the foundation for a speech-enabled interface that can understand spoken questions and respond verbally, enhancing the driving experience with intelligent assistance.

 ## Description

-This project integrates speech-to-text and text-to-speech functionalities into a car's infotainment system, using the LLaMA

 ## Features

 • Speech-to-Text and Text-to-Speech: Enables the car assistant to listen to spoken questions and respond audibly, providing a hands-free experience for drivers and passengers.
-• Intelligent Function Calling with NexusRaven: Implements a sophisticated system for executing commands and retrieving information based on user queries, using the LLaMA
 • Dynamic Model Integration: Incorporates multiple models for language recognition, speech processing, and text generation.
 • User-Friendly Gradio Interface: Provides an easy-to-use interface for testing and deploying the speaking assistant within the car's infotainment system.
 • Real-Time Information Retrieval: Capable of integrating with various APIs to provide up-to-date information on weather, routes, points of interest, and more.

-## Requirements
-
-• Gradio for creating interactive interfaces
-• Hugging Face Transformers and additional ML models for speech and language processing
-• NexusRaven for complex function execution
-All required libraries and packages are directly loaded inside the notebook.
-
-## Installation
-
-To set up the speaking assistant in your car's system, follow these steps:
-1. Run all the cells until the “Interfaces (text and audio)” section.
-2. Choose which interface to run: audio-to-audio or text-to-text.
-
-### Usage
-1. Model Setup: Begin by loading the necessary models for speech recognition, language processing, and text-to-speech conversion as detailed in the "Models loads" section.
-2. Function Definition: Customize the assistant's responses and capabilities by defining functions in the "Function calling with NexusRaven" section.
-3. Interface Configuration: Choose the Gradio interface that suits your in-car system, following setup instructions in the "Interfaces (text and audio)" section.
-4. Activation: Execute one of the interfaces to start the speaking assistant, enabling vocal interactions within the car.

 ## Authors

-Sasan Jafarnejad
 Abigail Berthe--Pardo

+# KITT: Knowledge-based Intelligence for Transportation Technologies
+
+Presented at [IEEE VNC 2024](https://ieee-vnc.org/2024/) (Vehicular Networking Conference) as a demo titled "Demo: Towards a Conversational LLM-Based Voice Assistant for Transportation Applications".
+
+## Abstract
+
+Conversational assistants based on large language models (LLMs) have spread widely across many domains, and the automotive industry is keen to follow suit.
+However, current LLMs lack sufficient understanding of geospatial data; in addition, timely information, such as weather and traffic conditions, is inaccessible to LLMs.
+In this demo, we present an in-car assistant that communicates verbally with the driver and, by using external APIs, answers questions about routing and points of interest while staying aware of local weather and traffic conditions.
+The assistant, including a customizable speech synthesizer, is accessible through a graphical user interface that facilitates experimentation by simulating the change in time, origin, destination, and location of the car.
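The simulation mentioned in the abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the `SimulatedTrip` class, its field names, and the straight-line interpolation between origin and destination are hypothetical and are not taken from the KITT code, which would more plausibly follow a route returned by a routing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SimulatedTrip:
    """Hypothetical container for the values a user could vary in the GUI."""
    origin: tuple[float, float]        # (latitude, longitude)
    destination: tuple[float, float]   # (latitude, longitude)
    departure: datetime
    duration: timedelta

    def state_at(self, progress: float) -> dict:
        """Return the simulated time and position at a fraction of the trip (0.0 to 1.0)."""
        progress = min(max(progress, 0.0), 1.0)  # clamp to the valid range
        # Assumption: the car moves on a straight line, not along real roads.
        lat = self.origin[0] + progress * (self.destination[0] - self.origin[0])
        lon = self.origin[1] + progress * (self.destination[1] - self.origin[1])
        return {
            "time": self.departure + progress * self.duration,
            "location": (lat, lon),
        }
```

A slider in the interface could feed `progress` into `state_at`, and the resulting time and location could then be passed to the weather or routing lookups.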

 ## Description

+This project integrates speech-to-text and text-to-speech functionalities into a car's infotainment system, using the LLaMA 3 model to process and respond to vocal queries from users. It employs Gradio for user interface creation, NexusRaven for function calling, and integrates various APIs to fetch real-time information, making it a comprehensive solution for creating a responsive and interactive car assistant.
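The flow described above (speech in, text through the language model, speech out) can be sketched as three chained callables. This is a minimal illustration under stated assumptions, not the project's implementation: `make_assistant` and the stage names are hypothetical, and the stand-in lambdas only echo text so the sketch runs without loading any model.

```python
from typing import Callable


def make_assistant(
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain speech-to-text, text generation, and text-to-speech into one audio-to-audio step."""

    def respond(audio_in: bytes) -> bytes:
        question = transcribe(audio_in)  # speech-to-text
        answer = generate(question)      # language model, e.g. LLaMA 3 behind this callable
        return synthesize(answer)        # text-to-speech

    return respond


# Stand-in stages (assumptions): real ones would wrap the actual models.
assistant = make_assistant(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda text: f"You asked: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Keeping each stage behind a plain callable means a model can be swapped (for example, moving from one LLaMA version to another) without touching the other two stages.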

 ## Features

 • Speech-to-Text and Text-to-Speech: Enables the car assistant to listen to spoken questions and respond audibly, providing a hands-free experience for drivers and passengers.
+• Intelligent Function Calling with NexusRaven: Implements a sophisticated system for executing commands and retrieving information based on user queries, using the LLaMA 3 model's capabilities.
 • Dynamic Model Integration: Incorporates multiple models for language recognition, speech processing, and text generation.
 • User-Friendly Gradio Interface: Provides an easy-to-use interface for testing and deploying the speaking assistant within the car's infotainment system.
 • Real-Time Information Retrieval: Capable of integrating with various APIs to provide up-to-date information on weather, routes, points of interest, and more.
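To make the function-calling feature concrete, here is a hedged sketch of how a call emitted by a model might be dispatched to a Python function. The README does not describe the exact output format, so the call-string shape `get_weather(location="Luxembourg")`, the `tool` decorator, the `dispatch` helper, and the placeholder `get_weather` function are all assumptions for illustration.

```python
import ast
from typing import Any, Callable

# Registry of functions the assistant is allowed to call (hypothetical).
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(func):
    """Register a function so the dispatcher may call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def get_weather(location: str) -> str:
    # Placeholder: a real implementation would query a weather API here.
    return f"(weather for {location} would be fetched here)"


def dispatch(call_text: str) -> Any:
    """Parse a call such as get_weather(location="Luxembourg") and run the matching tool.

    Only registered names and literal keyword arguments are accepted, so
    model output is never passed to eval().
    """
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    if node.func.id not in TOOLS:
        raise ValueError(f"unknown tool: {node.func.id}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are supported in this sketch")
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return TOOLS[node.func.id](**kwargs)
```

Calling `dispatch('get_weather(location="Luxembourg")')` runs the registered placeholder, while a call to an unregistered name raises `ValueError` instead of executing anything.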

 ## Authors

+Sasan Jafarnejad
 Abigail Berthe--Pardo