fashxp committed
Commit 8b54370
1 Parent(s): 6e08978

initial commit

Dockerfile CHANGED
@@ -9,7 +9,9 @@ ENV HOME=/home/user \
 WORKDIR $HOME/app

 COPY --chown=user requirements.txt requirements.txt
-RUN pip install --no-cache-dir --upgrade -r requirements.txt
+#RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+RUN pip install --upgrade -r requirements.txt

 COPY --chown=user . .

data/action_descriptions.csv ADDED
@@ -0,0 +1,82 @@
+Action,Description
+Customer automation rules,Open configuration of automation rules for customer data platform features.
+Customer duplicates,"Detect, analyse and fix duplicate customers."
+Customers,"Open the customer listing to search for customers, filter and export them as well as open customer detail information."
+Webservice Configuration View,Setup all configurations for the webservice of the customer data platform.
+Datahub Config,"Open the configuration of Pimcore datahub in order to configure graphql and simple rest endpoints, as well as webhook configurations, data imports and others."
+Back Office - Ordering,Open order listing for managing orders created via ecommerce framework
+Pricing Rules,Configure pricing rules for discounts and special gifts for ecommerce framework applications.
+Email Blocklist,Configure e-mail addresses to which no email should ever be sent.
+Send Test-Email,Sending a test mail to verify infrastructure and email settings.
+Sent Emails,"Open a list of all mails sent via the system, also with the option to resend and forward them."
+GDPR Data Extractor,"Starting extraction of data for GDPR requests, there you can search through assets, data objects, customers, users as well as emails."
+Glossary,Configuration of glossary entries for Pimcore documents.
+Application Logger,Opening application logs to figure out problems.
+Maintenance Mode,Activate maintenance mode to allow maintenance work without interference of other processes.
+Notes & Events,"Open notes and events list, filter them and open details as well as create new notes"
+Recycle Bin,Open recycle bin to restore deleted elements.
+Redirects,Configure redirects of the cms part of Pimcore.
+System-Requirements Check,Open system requirements check to verify infrastructure meets all the requirements needed for running Pimcore.
+Translations,Maintain translations in the UI
+XLIFF Export/Import,Export data in XLIFF format for translating and import translated data from uploaded XLIFF files
+Microsoft® Word Export,Export data to Microsoft Word for translating
+Compare Objects,Open dialog for comparing two data objects
+Automation Blueprints,Export blueprints based on Pimcore datahub configurations for workflow automation engines like n8n.
+Bookmark Lists,Open the list of all bookmark lists visible to the current user. There the user can open the bookmark lists for further editing and delete bookmark lists.
+DAM,Switch to the DAM perspective
+Close all tabs,Closing all open tabs in the user interface
+CMS,Switch to the CMS perspective
+Documentation,Open Pimcore Documentation page.
+Report Bugs,Open Pimcore Issue Tracker.
+Recently Opened Elements,Get a list of recently opened elements to easily reopen them.
+PIM,Switch to the PIM perspective
+Default,Switch to the Default perspective
+Search & Replace Assignments,Search for places where an element is linked and replace it with another element.
+Collections,"Open the collections list of the portal engine. From that list you can manage the collections, share them and open them as a tree in the UI."
+Catalog,Switch to the Catalog perspective
+CDP,Switch to the CDP perspective
+Commerce,Switch to the Commerce perspective
+SEO Document Editor,"Open SEO Document editor and maintain most important SEO metadata of documents, like Pretty URL, Title and Description."
+HTTP Errors,"Open list of HTTP errors that occurred in the system. You see error code, the frequency, dates and all the details of an error and you can open the affected path."
+Marketing Settings,"Define settings for marketing features like google analytics, google tag manager and google search console"
+Custom Reports,Open configuration of custom reports. There you can manage existing reports as well as create new reports.
+Reports,Open list of all reports available to you with the option to open the report details.
+robots.txt,Configure robots.txt delivered for the different domains managed with this Pimcore instance.
+Target Groups,"Open configuration of target groups in the system. There you can manage existing target groups, create new target groups and define their details."
+Global Targeting Rules,"Open configuration of global targeting rules in the system. There you can manage existing targeting rules, create new targeting rules and define their details. Targeting rules are an essential part of the personalization engine provided by Pimcore."
+Targeting Toolbar,Enable the targeting toolbar for debugging targeting behavior on the frontend
+Assets,Search for Assets
+Documents,Search for Documents
+Data Objects,Search for Data Objects
+Alternative Element Trees,"Open configuration of alternative element trees. There you can manage existing alternative element trees, create new alternative element trees as well as importing, exporting and cloning the configurations. Further, you can open an alternative element tree configuration and configure all the data sources and tree levels."
+OpenID Connect Configuration,Open configuration of OpenID Connect single sign on options. You can configure different providers with all their configuration options as well as defining a default provider.
+Pimcore Copilot Configuration,"Open configuration of copilot actions. There you can manage existing actions, create new actions, as well as importing, exporting and cloning the actions. Further, you can open an action configuration and configure all details."
+Pimcore Copilot Job Runs,"Open an overview of all job runs executed by copilot. There you also see all the outputs and errors, and you have the option to restart an action."
+Workflow Designer,"Open the workflow designer which allows the configuration of workflows. There you can manage existing workflows, create new workflows, as well as importing, exporting and cloning the workflows. Further, you can open a workflow configuration and configure all details and define the workflow in a visual editor."
+Analyze Permissions,Open a tool to analyse all permissions a certain user has
+Classes,"Open configuration of data object classes. You can manage existing classes, create new classes, as well as importing and exporting the class definitions. Further you can open the details of a data object class definition and define all settings as well as its attributes."
+Classification Store,"Open configuration of data object classification stores. You can manage stores, create new stores. Further you can define the details of a classification store with all its keys, groups and collections."
+All Caches (Symfony + Data),Clear all caches of the system
+Clear Full Page Cache,Clear the full page cache of the system
+Clear temporary files,"Clear all temporary files like thumbnails, generated pdfs and similar content."
+Data Cache,Clear the data cache of Pimcore.
+Document Types,"Open configuration of document types. There you can manage existing document types, create new document types and define their details."
+Tag Configuration,Open configuration of the tags tree. There you can manage existing tags and create new tags.
+Field-Collections,"Open configuration of data object field collections. You can manage existing field collections, create new field collections, as well as importing and exporting the definitions. Further you can open the details of a field collection and define all settings as well as its attributes."
+Appearance & Branding,"Open configuration for customizing the appearance of the Pimcore backend, which allows defining some colors, a logo, a custom background image and others."
+Icon Library,"Open the icon library shipped with Pimcore to select custom icons for classes, reports, alternative element trees, etc."
+Objectbricks,"Open configuration of data object object bricks. You can manage existing object bricks, create new object bricks, as well as importing and exporting the definitions. Further you can open the details of an object brick and define all settings as well as its attributes."
+Perspectives / Views,"Open configuration of perspectives and custom views. There you can manage existing configurations, create new configurations and define their details."
+Web-to-Print Settings,Open configuration of web-to-print renderers. You can configure settings for different providers with all their configuration options.
+Predefined Properties,"Open configuration of predefined properties. There you can manage existing predefined properties, create new predefined properties and define their details."
+Quantity Value,"Open configuration of quantity units which can be used in quantity value data types. There you can manage existing units, create new ones and define conversions between them."
+Roles,"Open configuration of user roles. There you can manage existing user roles, create user roles and define their details."
+Static Routes,"Open configuration of static routes. There you can manage existing static routes, create new static routes and define their details."
+Select Options,"Open configuration of select options that can be used in select and multi-select data types. There you can manage existing option groups, create new option groups and define all the select options."
+System Settings,"Open configuration of system settings like system languages, debugging settings, domains and error messages as well as versioning settings of data objects, documents and assets."
+Image Thumbnails,"Open configuration of image thumbnails. There you can manage existing thumbnails, create new thumbnails and define their details."
+Admin Translations,"Define translations for the Pimcore Admin UI, like labels of data object fields, etc."
+Users,"Open configuration of users. There you can manage existing users, create users and define their details."
+Video Thumbnails,"Open configuration of video thumbnails. There you can manage existing thumbnails, create new thumbnails and define their details."
+Website Settings,Define translations for keys used for output channels like websites etc.
+Asset Metadata Class Definitions,"Open configuration of asset metadata definitions. You can manage existing classes, create new classes, as well as importing and exporting the class definitions. Further you can open the details of an asset metadata class definition and define all settings as well as its attributes."
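The added file is a plain two-column CSV (Action, Description) that `src/vector_db.py` later reads with pandas. As a quick sanity check, here is a minimal stdlib-only sketch of parsing a couple of the rows above into (name, description) pairs:

```python
import csv
from io import StringIO

# Two sample rows copied from data/action_descriptions.csv (plus the header).
sample = '''Action,Description
Customer duplicates,"Detect, analyse and fix duplicate customers."
Glossary,Configuration of glossary entries for Pimcore documents.
'''

reader = csv.reader(StringIO(sample))
header = next(reader)                        # ['Action', 'Description']
rows = [(name, desc) for name, desc in reader]

print(rows[0][0])  # Customer duplicates
```

The quoted second column is what keeps descriptions containing commas in a single field.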
docker-compose.yaml CHANGED
@@ -5,8 +5,13 @@ services:
     ports:
       - 7860:7860
     environment:
-      - foo=bar
+      - OPENAI_ACCESS_TOKEN=sk-proj-<redacted>
     develop:
       watch:
         - action: rebuild
-          path: .
+          path: .
+    volumes:
+      - python-cache:/home/user/.cache
+
+volumes:
+  python-cache:
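The compose change injects the OpenAI token into the container as a plain environment variable, which `src/open_ai_connector.py` reads via `os.getenv("OPENAI_ACCESS_TOKEN")`. A minimal sketch of a stricter read that fails fast when the variable is missing instead of handing `None` to the client (`get_openai_token` is a hypothetical helper, not part of the commit):

```python
import os

def get_openai_token() -> str:
    # Hypothetical helper: same env var the repo's OpenAIConnector reads,
    # but raising a clear error instead of silently passing None onward.
    token = os.getenv("OPENAI_ACCESS_TOKEN")
    if not token:
        raise RuntimeError(
            "OPENAI_ACCESS_TOKEN is not set; export it or pass it "
            "via the 'environment:' block in docker-compose.yaml"
        )
    return token
```

Docker Compose can also interpolate `${OPENAI_ACCESS_TOKEN}` from the host environment or a `.env` file, so the literal secret never needs to be committed to the compose file.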
requirements.txt CHANGED
@@ -2,3 +2,9 @@ fastapi==0.111.*
 requests==2.*
 uvicorn[standard]==0.30.*
 torch
+lancedb
+sentence-transformers
+OpenAI
+accelerate
+pandas
+numpy
src/main.py CHANGED
@@ -2,24 +2,40 @@ import os
 import requests
 import torch

-#from typing import Optional
+from .vector_db import VectorDB
+from .open_ai_connector import OpenAIConnector
+from .parameters import *

 from fastapi import FastAPI, Header, HTTPException, BackgroundTasks
 from fastapi.responses import FileResponse
-#from huggingface_hub.hf_api import HfApi

-#from .models import config, WebhookPayload

-#WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")
-#HF_ACCESS_TOKEN = os.getenv("HF_ACCESS_TOKEN")
-#AUTOTRAIN_API_URL = "https://api.autotrain.huggingface.co"
-#AUTOTRAIN_UI_URL = "https://ui.autotrain.huggingface.co"
+app = FastAPI()

+vector_db = VectorDB(emb_model, db_location, full_actions_list_file_path, num_sub_vectors, batch_size)
+open_ai_connector = OpenAIConnector()

-app = FastAPI()
+@app.get("/find-action")
+async def find_action(query: str):

-@app.get("/")
-async def home():
+    #data = vector_db.get_embedding_db_as_pandas()
+    #print(data)
+
+
+    prefiltered_names, prefiltered_descriptions = vector_db.retrieve_prefiltered_hits(query, K)
+
+    print('prefiltered list')
+    print(prefiltered_names)
+
+    print('start query openAI')
+    response = open_ai_connector.query_open_ai(query, prefiltered_names, prefiltered_descriptions)
+    print(response)
+
+    return {'success': True, 'query': query, 'response': response}
+
+
+@app.get("/gpu_check")
+async def gpu_check():

     gpu = 'GPU not available'
     if torch.cuda.is_available():
@@ -28,6 +44,5 @@ async def home():
     else:
         print("GPU is not available")

-    print('hello world')
-    print(os.getenv("foo"))
-    return {'success': True, 'response': 'hello world 3', 'gpu': gpu}
+    return {'success': True, 'response': 'hello world 3', 'gpu': gpu}
+
src/open_ai_connector.py ADDED
@@ -0,0 +1,45 @@
+import pandas as pd
+from openai import OpenAI
+from io import StringIO
+import json
+import os
+
+class OpenAIConnector:
+
+    OPENAI_ACCESS_TOKEN = os.getenv("OPENAI_ACCESS_TOKEN")
+
+    def generate_llm_system_message(self, prefiltered_names, prefiltered_descriptions):
+
+        #print(prefiltered_names)
+
+        actions_list = pd.DataFrame({
+            'action': prefiltered_names,
+            'descriptions': prefiltered_descriptions
+        })
+
+        csv_buffer = StringIO()
+        actions_list.to_csv(csv_buffer, index=False)
+
+
+        system_message = "following is a csv list of actions and their descriptions: \n"
+        system_message += csv_buffer.getvalue()
+        system_message += "\n\n"
+        system_message += "find me all best fitting actions for the user request and order them by match. please just consider these actions and nothing else, but there might be multiple fitting actions.\n"
+        system_message += 'return the actions just in form of a json with action name and short reasoning, no additional text around, no formatting, etc.: [{ "action": "Icon Library", "reason": "Shows you a list of icons you can use in Pimcore configurations." }].\n'
+        system_message += 'also state when there is no fitting action for the request with a json like [{"action": null, "reason": "no fitting action found"}].'
+
+        return system_message
+
+    def query_open_ai(self, query, prefiltered_names, prefiltered_descriptions):
+        client = OpenAI(api_key=self.OPENAI_ACCESS_TOKEN)
+        system_message = self.generate_llm_system_message(prefiltered_names, prefiltered_descriptions)
+        messages = [{"role": "system", "content": system_message}, {"role": "user", "content": query}]
+        #print(messages)
+
+        response = client.chat.completions.create(
+            model="gpt-4o",
+            messages=messages,
+        )
+        response_message = response.choices[0].message
+
+        return json.loads(response_message.content)
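`generate_llm_system_message` serializes the prefiltered candidates to CSV and embeds that text in the system prompt. The same prompt-building step can be sketched with only the stdlib (`csv` instead of pandas); `build_system_message` is an illustrative stand-in, and the instruction text is abridged:

```python
import csv
from io import StringIO

def build_system_message(names, descriptions):
    # Serialize the candidates to CSV, as the connector does with pandas.
    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerow(["action", "descriptions"])  # same header the DataFrame emits
    writer.writerows(zip(names, descriptions))

    msg = "following is a csv list of actions and their descriptions: \n"
    msg += buf.getvalue()
    msg += "\nfind me all best fitting actions for the user request and order them by match."
    return msg

msg = build_system_message(["Icon Library"],
                           ["Open the icon library shipped with Pimcore."])
```

Note that `query_open_ai` feeds the raw completion straight into `json.loads`; that only works while the model honors the "no formatting" instruction, since a response wrapped in markdown code fences would raise a `JSONDecodeError`.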
src/parameters.py ADDED
@@ -0,0 +1,8 @@
+emb_model = "BAAI/bge-large-en-v1.5"
+full_actions_list_file_path = "./data/action_descriptions.csv"
+db_location = ".lancedb"
+
+num_sub_vectors = 128 # number of sub-vectors for index
+batch_size = 32 # batch size for embedding model
+
+K = 20
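`num_sub_vectors` has to divide the embedding dimension, which `VectorDB.__init__` enforces with an assert. BAAI/bge-large-en-v1.5 produces 1024-dimensional embeddings, so the arithmetic works out to 8 floats per sub-vector:

```python
emb_dimension = 1024   # hidden_size of BAAI/bge-large-en-v1.5
num_sub_vectors = 128  # from src/parameters.py

# Mirrors the assert in VectorDB.__init__.
assert emb_dimension % num_sub_vectors == 0, \
    "Embedding size must be divisible by the num of sub vectors"

dims_per_sub_vector = emb_dimension // num_sub_vectors
print(dims_per_sub_vector)  # 8
```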
src/vector_db.py ADDED
@@ -0,0 +1,99 @@
+from transformers import AutoConfig
+from sentence_transformers import SentenceTransformer
+import lancedb
+import torch
+import pyarrow as pa
+import pandas as pd
+import numpy as np
+import tqdm
+
+class VectorDB:
+
+    vector_column = "vector"
+    description_column = "description"
+    name_column = "name"
+    table_name = "pimcore_actions"
+    emb_model = ''
+    db_location = ''
+
+    def __init__(self, emb_model, db_location, actions_list_file_path, num_sub_vectors, batch_size):
+        self.emb_model = emb_model
+        self.db_location = db_location
+
+        emb_config = AutoConfig.from_pretrained(emb_model)
+        emb_dimension = emb_config.hidden_size
+
+        assert emb_dimension % num_sub_vectors == 0, \
+            "Embedding size must be divisible by the num of sub vectors"
+
+        print('Model loaded...')
+        print(emb_model)
+
+        model = SentenceTransformer(emb_model)
+        model.eval()
+
+        if torch.backends.mps.is_available():
+            device = "mps"
+        elif torch.cuda.is_available():
+            device = "cuda"
+        else:
+            device = "cpu"
+
+        print(f"Device: {device}")
+
+        db = lancedb.connect(db_location)
+
+        schema = pa.schema(
+            [
+                pa.field(self.vector_column, pa.list_(pa.float32(), emb_dimension)),
+                pa.field(self.description_column, pa.string()),
+                pa.field(self.name_column, pa.string())
+            ]
+        )
+        tbl = db.create_table(self.table_name, schema=schema, mode="overwrite")
+
+
+        df = pd.read_csv(actions_list_file_path)
+        sentences = df.values
+
+        print("Starting vector generation")
+        for i in tqdm.tqdm(range(0, int(np.ceil(len(sentences) / batch_size)))):
+            try:
+                batch = [sent for sent in sentences[i * batch_size:(i + 1) * batch_size] if len(sent) > 0]
+
+                to_encode = [entry[1] for entry in batch]
+                names = [entry[0] for entry in batch]
+
+                encoded = model.encode(to_encode, normalize_embeddings=True, device=device)
+                encoded = [list(vec) for vec in encoded]
+
+                df = pd.DataFrame({
+                    self.vector_column: encoded,
+                    self.description_column: to_encode,
+                    self.name_column: names
+                })
+
+                tbl.add(df)
+            except:
+                print(f"batch {i} was skipped")
+        print("Vector generation done.")
+
+
+    def get_embedding_db_as_pandas(self):
+        db = lancedb.connect(self.db_location)
+        tbl = db.open_table(self.table_name)
+        return tbl.to_pandas()
+
+
+
+    def retrieve_prefiltered_hits(self, query, k):
+        db = lancedb.connect(".lancedb")
+        table = db.open_table(self.table_name)
+        retriever = SentenceTransformer(self.emb_model)
+
+        query_vec = retriever.encode(query)
+        documents = table.search(query_vec, vector_column_name=self.vector_column).limit(k).to_list()
+        names = [doc[self.name_column] for doc in documents]
+        descriptions = [doc[self.description_column] for doc in documents]
+
+        return names, descriptions
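`VectorDB.__init__` encodes the CSV in `ceil(len(sentences) / batch_size)` slices; with the 82 action rows above and `batch_size = 32`, that is three batches. A small stdlib sketch of just that slicing arithmetic (`batch_slices` is an illustrative helper, not part of the commit):

```python
import math

def batch_slices(n_rows, batch_size):
    # Index ranges exactly as the loop sentences[i*batch_size:(i+1)*batch_size] visits them.
    n_batches = math.ceil(n_rows / batch_size)
    return [(i * batch_size, min((i + 1) * batch_size, n_rows))
            for i in range(n_batches)]

slices = batch_slices(82, 32)
print(slices)  # [(0, 32), (32, 64), (64, 82)]
```

Worth noting: the bare `except:` inside the loop silently drops a whole failing batch, so a single bad row can remove up to `batch_size` actions from the index without stopping the build.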