Joe Shamon committed on
Commit 5b1b407 · unverified · 1 parent: 4ef96ee

Update README.md

Files changed (1): README.md (+67 -2)

# Code Chunker & Parser

Code Chunker & Parser is an open-source tool that makes code easier to read and maintain by splitting code files into chunks around key points of interest. It uses tree-sitter to parse each file and identify significant elements such as functions, classes, and comments, then organizes the codebase into manageable, easy-to-understand chunks. It is aimed at developers who want to organize their code for easier collaboration and review.

## Features

- **Intelligent Chunking:** Break your code files into chunks around key points of interest, such as function definitions, class declarations, and important comments.
- **Customizable Token Limits:** Control the size of each chunk with a configurable token limit so that chunks stay manageable and focused.
- **Support for Multiple Languages:** Currently supports Python, JavaScript, and CSS, with support for more programming languages planned.
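
The first two features combine naturally: split a file at its points of interest, then merge adjacent pieces until adding the next one would exceed the token limit. The following is a minimal sketch of that idea, not the project's `CodeChunker` implementation. It assumes the points of interest are already known as 1-based line numbers, and it approximates token counts by splitting on whitespace instead of using tiktoken; `chunk_at_points` and `count_tokens` are hypothetical names.

```python
# Hypothetical sketch: NOT the project's CodeChunker implementation.
from typing import List, Sequence


def count_tokens(text: str) -> int:
    """Approximate a token count by counting whitespace-separated words.

    Assumption: a dependency-free stand-in for tiktoken.
    """
    return len(text.split())


def chunk_at_points(
    lines: Sequence[str], points: Sequence[int], token_limit: int
) -> List[str]:
    """Split `lines` at the 1-based line numbers in `points`, then greedily
    merge consecutive segments while each chunk stays within `token_limit`."""
    if not lines:
        return []
    # Segment start indices (0-based): the top of the file plus every
    # in-range point of interest, deduplicated and sorted.
    starts = sorted({0} | {p - 1 for p in points if 0 < p <= len(lines)})
    ends = starts[1:] + [len(lines)]
    segments = ["\n".join(lines[s:e]) for s, e in zip(starts, ends)]

    chunks: List[str] = []
    current: List[str] = []  # segments in the chunk being built
    for segment in segments:
        if current and count_tokens("\n".join(current + [segment])) > token_limit:
            # The next segment would push the chunk over the limit: close it.
            chunks.append("\n".join(current))
            current = []
        current.append(segment)
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    source = (
        "import os\n\n"
        "def a():\n    return 1\n\n"
        "def b():\n    return 2\n\n"
        "class C:\n    pass"
    )
    # Points of interest: the `def a`, `def b`, and `class C` lines.
    for i, chunk in enumerate(chunk_at_points(source.splitlines(), [3, 6, 9], 8), 1):
        print(f"--- chunk {i} ({count_tokens(chunk)} tokens) ---")
        print(chunk)
```

With a limit of 8, the sample yields two chunks of 6 and 7 whitespace tokens, and joining the chunks with newlines reproduces the original source. A single segment that already exceeds the limit is kept whole in this sketch; a fuller version would split it further.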

## Getting Started

### Prerequisites

- Python 3.8+
- OpenAI API key (for token counting features)

### Installation

1. Clone the repository:
```sh
git clone https://github.com/yourgithubusername/code-chunker-parser.git
```

2. Navigate to the project directory:
```sh
cd code-chunker-parser
```

3. Install the required dependencies:
```sh
pip install -r requirements.txt
```

## Usage

1. Chunking a Code File:

Use the `CodeChunker` class to split a code file into chunks. Pass the file extension (and the tokenizer encoding name) when creating the chunker, and the token limit when calling `chunk`.

Example:
```py
from backend.app.util.TextChunker.Chunker import CodeChunker

# Read the source to chunk ("example.py" is a placeholder path)
with open("example.py", encoding="utf-8") as f:
    code = f.read()

chunker = CodeChunker(file_extension='py', encoding_name='gpt-4')
chunks = chunker.chunk(code, token_limit=1000)  # maximum tokens per chunk
CodeChunker.print_chunks(chunks)
```

2. Parsing Code for Points of Interest:

The `CodeParser` class parses code to identify points of interest and comments, which can then be used for chunking or other analysis.

Example:
```py
from backend.app.util.CodeParsing.CodeParser import CodeParser

# Read the source to parse ("example.py" is a placeholder path)
with open("example.py", encoding="utf-8") as f:
    code = f.read()

parser = CodeParser(['py'])
tree = parser.parse_code(code, 'py')
points_of_interest = parser.extract_points_of_interest(tree, 'py')
```
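
The project parses code with tree-sitter, which works across languages. For Python source alone, a similar list of points of interest can be sketched with the standard-library `ast` and `tokenize` modules. This is an illustrative substitute, not how `CodeParser` works: the function `points_of_interest` and its `(line, kind, text)` output format are hypothetical and may differ from what `extract_points_of_interest` returns.

```python
# Hypothetical sketch: uses Python's standard library in place of tree-sitter.
import ast
import io
import tokenize
from typing import List, Tuple


def points_of_interest(source: str) -> List[Tuple[int, str, str]]:
    """Return sorted (line, kind, text) tuples for classes, functions, and comments."""
    points: List[Tuple[int, str, str]] = []

    # Definitions: walk the syntax tree for class and (async) function nodes.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            points.append((node.lineno, "class", node.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            points.append((node.lineno, "function", node.name))

    # Comments: `ast` discards them, so scan the token stream instead.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            points.append((tok.start[0], "comment", tok.string))

    return sorted(points)


if __name__ == "__main__":
    sample = "# setup\nclass A:\n    def m(self):  # helper\n        pass\n"
    for line, kind, text in points_of_interest(sample):
        print(line, kind, text)
```

The returned line numbers are the kind of boundaries a chunker can split on. Unlike tree-sitter, `ast` handles only Python and raises `SyntaxError` on code that does not parse.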

## Contributing
We welcome contributions from the community, whether through bug reports, feature requests, or pull requests. See the CONTRIBUTING.md file for details on how to contribute to the project.

## License
This project is licensed under the Apache License 2.0. See the License file for details.

## Acknowledgments
- This project uses tree-sitter to parse code.
- It also uses tiktoken to count tokens when determining chunk sizes.