---
license: apache-2.0
datasets:
- bitext/Bitext-customer-support-llm-chatbot-training-dataset
language:
- en
base_model:
- unsloth/llama-3-8b-bnb-4bit
pipeline_tag: text-generation
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
- Customer-Support-Bot
---
# Customer Support Chatbot with LLaMA 3.1

> An end-to-end customer support chatbot powered by a fine-tuned LLaMA 3.1 8B model, deployed with Flask, Docker, and AWS ECS.

## Overview

This project implements a customer support chatbot built on the LLaMA 3.1 8B model, fine-tuned on customer support conversations using LoRA and quantized for efficient inference. The application is deployed as a containerized service on AWS ECS with Fargate.

## Features

- **Fine-tuned LLaMA 3.1 Model**: Customized for customer support using the [Bitext customer support dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)
- **Optimized Inference**: Implements 4-bit, 8-bit, and 16-bit quantization
- **Containerized Deployment**: Docker-based deployment for consistency and scalability
- **Cloud Infrastructure**: Hosted on AWS ECS with Fargate for serverless container management
- **CI/CD Pipeline**: Automated deployment using AWS CodePipeline
- **Monitoring**: Comprehensive logging and monitoring via AWS CloudWatch
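To make the quantization feature above concrete, here is a toy, pure-Python illustration of symmetric quantization (a hypothetical helper, not the project's code; real 4/8-bit inference uses libraries such as bitsandbytes, but the core scale-and-round idea is the same):

```python
def quantize(weights, bits=8):
    """Map floats onto signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax  # one scale per tensor
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
```

Lower bit widths trade a small amount of accuracy for a much smaller memory footprint, which is what makes an 8B-parameter model practical to serve on modest hardware.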

## Model Details

The fine-tuned model is hosted on Hugging Face:
- Model Repository: [praneethposina/customer_support_bot](https://huggingface.co/praneethposina/customer_support_bot)
- GitHub Repository: [github.com/praneethposina/Customer_Support_Chatbot](https://github.com/praneethposina/Customer_Support_Chatbot)
- Base Model: LLaMA 3.1 8B
- Training Dataset: Bitext Customer Support Dataset
- Optimization: LoRA fine-tuning with quantization

## Tech Stack

- **Backend**: Flask API
- **Model Serving**: Ollama
- **Containerization**: Docker
- **Cloud Services**: 
  - AWS ECS (Fargate)
  - AWS CodePipeline
  - AWS CloudWatch
- **Model Training**: LoRA, Quantization
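The Flask layer can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `/chat` route and the `generate_reply` helper (which would forward to the Ollama-served model) are assumed names.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_reply(message: str) -> str:
    """Placeholder for the call into the Ollama-served fine-tuned model."""
    return f"(model reply to: {message})"

@app.route("/chat", methods=["POST"])
def chat():
    # Expect a JSON body like {"message": "..."}.
    data = request.get_json(silent=True) or {}
    message = data.get("message", "")
    if not message:
        return jsonify({"error": "missing 'message' field"}), 400
    return jsonify({"reply": generate_reply(message)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Keeping the model call behind a single helper makes it easy to swap the backend (local Ollama, a remote endpoint, or a mock in tests) without touching the route.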

## Screenshots

### Chatbot Interface

![Chatbot SS](https://github.com/user-attachments/assets/220aea77-bb2b-4f50-b6a4-0541434d85ef)

![Chatbot SS2](https://github.com/user-attachments/assets/da440735-59d7-4be7-a43d-d51de8983738)

### AWS CloudWatch Monitoring

![CloudWatch SS](https://github.com/user-attachments/assets/9794bc3e-4b9c-4626-9a7f-3936d4757328)

### Docker Logs

<img width="1270" alt="Docker ss" src="https://github.com/user-attachments/assets/a72d1c35-8203-4a05-b944-743ea6c0a6b8" />
<img width="1268" alt="Docker ss2" src="https://github.com/user-attachments/assets/f1b0c0b1-2aad-462c-adf2-7a7ea9047a1a" />

## AWS Deployment

1. Push Docker image to Amazon ECR
2. Configure AWS ECS Task Definition
3. Set up AWS CodePipeline for CI/CD
4. Configure CloudWatch monitoring
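Step 1 above typically looks like the following (a sketch; the account ID, region, and repository name are placeholders to substitute with your own):

```shell
AWS_ACCOUNT=123456789012
REGION=us-east-1
REPO=customer-support-chatbot

# Authenticate Docker against the private ECR registry
aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com"

# Build, tag, and push the image
docker build -t "$REPO" .
docker tag "$REPO:latest" "$AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```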


## Uploaded Model

- **Developed by:** praneethposina
- **License:** apache-2.0
- **Fine-tuned from model:** unsloth/llama-3-8b-bnb-4bit