Joe Shamon committed on
Commit a4fd9a7 · unverified · 2 Parent(s): c72577f 81d8b21

Merge pull request #5 from CintraAI/enhancement/updated-header

Files changed (1)
  1. app.py +14 -1
app.py CHANGED
@@ -55,7 +55,20 @@ def get_language_by_extension(file_extension):
 
 language = get_language_by_extension(file_extension)
 
-token_chunk_size = st.number_input('Chunk Size Target Measured in Tokens (tiktoken, gpt-4)', min_value=5, max_value=1000, value=25)
+st.write("""
+### Choose Chunk Size Target""")
+token_chunk_size = st.number_input('Target Chunk Size Target', min_value=5, max_value=1000, value=25, help="The token limit guides the chunk size in tokens (tiktoken, gpt-4), aiming for readability without enforcing a strict upper limit.")
+
+with st.expander("Learn more about the chunk size target"):
+    st.markdown("""
+    The `token_limit` parameter in the `chunk` function serves as a guideline to optimize the size of code chunks produced. It is not a hard limit but rather an ideal target, attempting to achieve a balance between chunk size and maintaining logical coherence within the code.
+
+    - **Adherence to Logical Breakpoints:** The chunking logic respects logical breakpoints in the code, ensuring that chunks are coherent and maintain readability.
+    - **Flexibility in Chunk Size:** Chunks might be slightly smaller or larger than the specified `token_limit` to avoid breaking the code in the middle of logical sections.
+    - **Handling Final Chunks:** The last chunk of code captures any remaining code, which may vary significantly in size depending on the remaining code's structure.
+
+    This approach allows for flexibility in how code is segmented into chunks, emphasizing the balance between readable, logical code segments and size constraints.
+    """)
 
 original_col, chunked_col = st.columns(2)
 
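The help text added in this commit describes `token_limit` as a soft target: chunks end only at logical breakpoints, so they may overshoot or undershoot it, and the last chunk takes whatever remains. The sketch below illustrates that idea only and is not the repository's `chunk` function. Its assumptions: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words instead of tiktoken tokens, `chunk_with_soft_limit` is a hypothetical name, and a blank line is treated as the only kind of logical breakpoint.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.

    The app itself measures tiktoken (gpt-4) tokens; this is only an approximation.
    """
    return len(text.split())


def chunk_with_soft_limit(code: str, token_limit: int) -> list[str]:
    """Split code into chunks, cutting only at blank lines once the target is met."""
    chunks: list[str] = []
    current: list[str] = []
    tokens = 0
    for line in code.splitlines():
        if not line.strip():
            # Assumption: a blank line marks a logical breakpoint.
            if tokens >= token_limit:
                # Target reached: close the chunk here, never mid-section.
                chunks.append("\n".join(current).rstrip())
                current, tokens = [], 0
            elif current:
                # Below target: keep the blank line and continue the chunk.
                current.append(line)
            continue
        current.append(line)
        tokens += count_tokens(line)
    # The final chunk captures whatever is left, however small or large.
    if current:
        chunks.append("\n".join(current).rstrip())
    return chunks


sample = "a = 1\nb = 2\n\nc = a + b\n\nprint(c)\n"
for chunk in chunk_with_soft_limit(sample, token_limit=4):
    print(count_tokens(chunk), repr(chunk))
```

With a target of 4, the first chunk holds 6 word-tokens because the cut waits for the blank line, and the final chunk holds a single token. Both deviations match the "slightly smaller or larger" behaviour the expander text describes.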