Merge pull request #5 from CintraAI/enhancement/updated-header
app.py
CHANGED
@@ -55,7 +55,20 @@ def get_language_by_extension(file_extension):
 
     language = get_language_by_extension(file_extension)
 
-
+    st.write("""
+    ### Choose Chunk Size Target""")
+    token_chunk_size = st.number_input('Target Chunk Size Target', min_value=5, max_value=1000, value=25, help="The token limit guides the chunk size in tokens (tiktoken, gpt-4), aiming for readability without enforcing a strict upper limit.")
+
+    with st.expander("Learn more about the chunk size target"):
+        st.markdown("""
+        The `token_limit` parameter in the `chunk` function serves as a guideline to optimize the size of code chunks produced. It is not a hard limit but rather an ideal target, attempting to achieve a balance between chunk size and maintaining logical coherence within the code.
+
+        - **Adherence to Logical Breakpoints:** The chunking logic respects logical breakpoints in the code, ensuring that chunks are coherent and maintain readability.
+        - **Flexibility in Chunk Size:** Chunks might be slightly smaller or larger than the specified `token_limit` to avoid breaking the code in the middle of logical sections.
+        - **Handling Final Chunks:** The last chunk of code captures any remaining code, which may vary significantly in size depending on the remaining code's structure.
+
+        This approach allows for flexibility in how code is segmented into chunks, emphasizing the balance between readable, logical code segments and size constraints.
+        """)
 
     original_col, chunked_col = st.columns(2)
 
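The soft-limit behavior the expander describes can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual `chunk` implementation (which is not part of this diff): the real app counts tokens with tiktoken's gpt-4 encoding, while this sketch uses whitespace-split word counts and treats blank lines as the logical breakpoints.

```python
# Hypothetical sketch of a soft token_limit: chunks end only at logical
# breakpoints (here, blank lines), so a chunk may overshoot the target
# rather than split a block mid-way, and the final chunk captures any
# remaining code regardless of size.

def chunk(code: str, token_limit: int = 25) -> list[str]:
    blocks = code.split("\n\n")  # stand-in for real logical breakpoints
    chunks, current, current_tokens = [], [], 0
    for block in blocks:
        block_tokens = len(block.split())  # stand-in for a tiktoken count
        # Flush only between blocks, never inside one, so chunks stay coherent
        if current and current_tokens + block_tokens > token_limit:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(block)
        current_tokens += block_tokens
    if current:  # final chunk: whatever code remains, whatever its size
        chunks.append("\n\n".join(current))
    return chunks
```

Because flushing happens only at block boundaries, a single block larger than `token_limit` still becomes one oversized chunk, which matches the "not a hard limit" wording above.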