madhuroopa committed on
Commit
0c9a4dd
·
1 Parent(s): 234534d

added new application files

Files changed (3)
  1. config/.dummy_env +5 -0
  2. config/README.md +46 -0
  3. config/config.json +26 -0
config/.dummy_env ADDED
@@ -0,0 +1,5 @@
+ ## Create a .env file and fill in the values for the following keys
+ OPENAI_API_KEY = ""
+ PINECONE_API_KEY = ""
+ AWS_ACCESS_KEY = ""
+ AWS_SECRET_ACCESS_KEY = ""
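Since `.dummy_env` is only a template, the real `.env` has to be parsed at application startup. Projects typically use `python-dotenv` for this; the stdlib-only sketch below (the `load_env` helper name is ours, not part of this commit) shows the equivalent parsing of `KEY = "value"` lines.

```python
def load_env(text: str) -> dict:
    """Parse KEY = "value" lines from .env-style text, skipping comments and blanks."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

# Example with the template's keys filled in (dummy values):
sample = '''
## Create a .env file and fill in the values for the following keys
OPENAI_API_KEY = "sk-test"
PINECONE_API_KEY = "pc-test"
'''
print(load_env(sample)["OPENAI_API_KEY"])  # sk-test
```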
config/README.md ADDED
@@ -0,0 +1,46 @@
+ # Project Configuration
+
+ ## .env File
+
+ This file holds the API keys the application needs to run. Obtain them from the following sources:
+
+ - [OPENAI_API_KEY](https://platform.openai.com/api-keys)
+ - [PINECONE_API_KEY](https://app.pinecone.io/)
+ - [AWS_ACCESS_KEY](https://console.aws.amazon.com/)
+ - [AWS_SECRET_ACCESS_KEY](https://console.aws.amazon.com/)
+
+ ## config.json
+
+ This JSON file holds the configuration values for the entire application. Please refer to the documentation before modifying any of them.
+
+ ### Pinecone Configuration
+
+ - **PINECONE_INDEX_NAME**: Name of the index, the highest-level organizational unit of vector data in Pinecone.
+ - **PINECONE_VECTOR_DIMENSION**: Dimensionality of the vectors produced by the embedding model.
+ - **PINECONE_UPSERT_BATCH_LIMIT**: Maximum number of transcript rows upserted to Pinecone Serverless in a single batch.
+ - **PINECONE_TOP_K_RESULTS**: Number of results Pinecone returns for a query.
+ - **PINECONE_DELTA_WINDOW**: Size of the conversation window fetched around each of the top-k results.
+ - **PINECONE_CLOUD_PROVIDER**: Cloud provider hosting the Pinecone DB.
+ - **PINECONE_REGION**: Region of the Pinecone cloud provider.
+ - **PINECONE_METRIC**: Distance metric Pinecone uses to calculate similarity.
+ - **PINECONE_NAMESPACE**: Logical separation inside the Pinecone index.
+
+ ### Embedding Provider Configuration
+
+ - **EMBEDDING_PROVIDER**: Provider of the embedding model used for text-to-vector conversion.
+ - **EMBEDDING_MODEL_NAME**: Name of the embedding model offered by that provider.
+
+ ### AWS Configuration
+
+ - **AWS_INPUT_BUCKET**: Bucket storing audio files for AWS Transcribe.
+ - **AWS_OUTPUT_BUCKET**: Bucket collecting the transcribed files.
+ - **AWS_REGION**: AWS region in use.
+ - **AWS_TRANSCRIBE_JOB_NAME**: Default name for the Transcribe job.
+
+ ### LangChain Configuration
+
+ - **LC_LLM_TEMPERATURE**: Temperature value for the Large Language Model.
+ - **LC_CONV_BUFFER_MEMORY_WINDOW**: Conversation memory window limit (reserved for future use).
+ - **LC_LLM_SUMMARY_MAX_TOKEN_LIMIT**: Maximum tokens allowed for the summary in the memory buffer.
+ - **LC_LLM_MODEL**: Large Language Model used for inference.
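The README enumerates the keys each provider section requires; a small validation helper (hypothetical, not part of this commit) can fail fast at startup when `config.json` is missing any of them:

```python
# Required keys per README section; the grouping below is an assumption
# based on the documented sections, not code shipped in this commit.
REQUIRED_KEYS = {
    "Pinecone": ["PINECONE_INDEX_NAME", "PINECONE_VECTOR_DIMENSION", "PINECONE_METRIC"],
    "Embedding": ["EMBEDDING_PROVIDER", "EMBEDDING_MODEL_NAME"],
    "AWS": ["AWS_INPUT_BUCKET", "AWS_OUTPUT_BUCKET", "AWS_REGION"],
    "LangChain": ["LC_LLM_MODEL", "LC_LLM_TEMPERATURE"],
}

def missing_keys(config: dict) -> dict:
    """Return {section: [absent keys]} for every section with gaps."""
    gaps = {}
    for section, keys in REQUIRED_KEYS.items():
        absent = [k for k in keys if k not in config]
        if absent:
            gaps[section] = absent
    return gaps
```

Calling `missing_keys(config)` right after loading the JSON gives a per-section report instead of a `KeyError` deep inside the application.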
config/config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "PINECONE_INDEX_NAME": "langchain-retrieval-transcript",
+ "PINECONE_VECTOR_DIMENSION": 3072,
+ "PINECONE_UPSERT_BATCH_LIMIT": 90,
+ "PINECONE_TOP_K_RESULTS": 10,
+ "PINECONE_DELTA_WINDOW": 2,
+ "PINECONE_CLOUD_PROVIDER": "aws",
+ "PINECONE_REGION": "us-west-2",
+ "PINECONE_METRIC": "cosine",
+ "PINECONE_NAMESPACE": "default_namespace",
+
+ "EMBEDDING_PROVIDER": "OpenAI",
+ "EMBEDDING_MODEL_NAME": "text-embedding-3-large",
+
+ "MASTER_JSON_FILENAME": "master_meeting_details",
+
+ "AWS_INPUT_BUCKET": "input-bucket",
+ "AWS_OUTPUT_BUCKET": "output-bucket",
+ "AWS_REGION": "us-east-2",
+ "AWS_TRANSCRIBE_JOB_NAME": "transcribe-job",
+
+ "LC_LLM_TEMPERATURE": 0.01,
+ "LC_CONV_BUFFER_MEMORY_WINDOW": 1,
+ "LC_LLM_SUMMARY_MAX_TOKEN_LIMIT": 650,
+ "LC_LLM_MODEL": "gpt-3.5-turbo"
+ }
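Loading this file at runtime is a one-liner with the standard `json` module. The sketch below embeds an excerpt of the config as a string so it is self-contained; in the application you would read the file from disk (the `config/config.json` path is assumed from this commit's layout).

```python
import json

# In the app this would be:
#   with open("config/config.json") as f:
#       config = json.load(f)
config = json.loads('''{
    "PINECONE_INDEX_NAME": "langchain-retrieval-transcript",
    "PINECONE_VECTOR_DIMENSION": 3072,
    "LC_LLM_TEMPERATURE": 0.01,
    "LC_LLM_MODEL": "gpt-3.5-turbo"
}''')

print(config["PINECONE_VECTOR_DIMENSION"])  # 3072
print(config["LC_LLM_MODEL"])               # gpt-3.5-turbo
```

Note that JSON preserves numeric types, so `PINECONE_VECTOR_DIMENSION` comes back as an `int` and `LC_LLM_TEMPERATURE` as a `float` with no extra parsing.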