sklearn-docs/text-feature-extraction-evaluation

Sleeping

App Files Files Community

dominguesm committed on May 18, 2023

Commit 3e3d92e (1 parent: 3a4d722)

Rework the example and update the descriptions

Files changed (6)

app.py +46 -60
descriptions/parameter_grid/alpha.md +1 -1
descriptions/parameter_grid/max_df.md +1 -1
descriptions/parameter_grid/min_df.md +1 -1
descriptions/parameter_grid/ngram_range.md +1 -1
descriptions/parameter_grid/norm.md +6 -1

app.py CHANGED

@@ -189,67 +189,53 @@ with gr.Blocks(theme=gr.themes.Soft()) as app:
                 interactive=True,
             )
     with gr.Row():
-        with gr.Column():
-            gr.Markdown("""## PARAMETERS GRID""")
             gr.Markdown(load_description("description_parameter_grid"))
-            with gr.Column():
-                gr.Markdown("""### Classifier Alpha""")
-                gr.Markdown(load_description("parameter_grid/alpha"))
-                clf__alpha = gr.Textbox(
-                    label="clf__alpha",
-                    value="1.e-06, 1.e-05, 1.e-04",
-                    info="Due to practical considerations, this parameter was kept constant.",
-                    interactive=False,
-                )
-            with gr.Column():
-                gr.Markdown("""### Vectorizer max_df""")
-                gr.Markdown(load_description("parameter_grid/max_df"))
-                vect__max_df = gr.Textbox(
-                    label="vect__max_df",
-                    value="0.2, 0.4, 0.6, 0.8, 1.0",
-                    info="Values ranging from 0 to 1.0, separated by a comma.",
-                    interactive=True,
-                )
-            with gr.Column():
-                gr.Markdown("""### Vectorizer min_df""")
-                gr.Markdown(load_description("parameter_grid/min_df"))
-                vect__min_df = gr.Textbox(
-                    label="vect__min_df",
-                    value="1, 3, 5, 10",
-                    info="Values ranging from 0 to 1.0, separated by a comma, or integers separated by a comma. If float, the parameter represents a proportion of documents, integer absolute counts.",
-                    interactive=True,
-                )
-            with gr.Column():
-                gr.Markdown("""### Vectorizer ngram_range""")
-                gr.Markdown(load_description("parameter_grid/ngram_range"))
-                vect__ngram_range = gr.Textbox(
-                    label="vect__ngram_range",
-                    value="(1, 1), (1, 2)",
-                    info="""Tuples of integer values separated by a comma. For example an ``ngram_range`` of ``(1, 1)`` means only unigrams, ``(1, 2)`` means unigrams and bigrams, and ``(2, 2)`` means only bigrams.""",
-                    interactive=True,
-                )
-            with gr.Column():
-                gr.Markdown("""### Vectorizer norm""")
-                gr.Markdown(load_description("parameter_grid/norm"))
-                gr.Markdown(
-                    """- 'l2': Sum of squares of vector elements is 1. The cosine
-                            similarity between two vectors is their dot product when l2 norm has
-                            been applied.
-                            - 'l1': Sum of absolute values of vector elements is 1."""
-                )
-                vect__norm = gr.Textbox(
-                    label="vect__norm",
-                    value="l1, l2",
-                    info="'l1' or 'l2', separated by a comma",
-                    interactive=True,
-                )
     with gr.Row():
         gr.Markdown(

                 interactive=True,
             )
     with gr.Row():
+        with gr.Tab("PARAMETERS GRID"):
             gr.Markdown(load_description("description_parameter_grid"))
+            with gr.Row():
+                with gr.Column():
+                    clf__alpha = gr.Textbox(
+                        label="Classifier Alpha (clf__alpha)",
+                        value="1.e-06, 1.e-05, 1.e-04",
+                        info="Due to practical considerations, this parameter was kept constant.",
+                        interactive=False,
+                    )
+                    vect__max_df = gr.Textbox(
+                        label="Vectorizer max_df (vect__max_df)",
+                        value="0.2, 0.4, 0.6, 0.8, 1.0",
+                        info="Values ranging from 0 to 1.0, separated by a comma.",
+                        interactive=True,
+                    )
+                    vect__min_df = gr.Textbox(
+                        label="Vectorizer min_df (vect__min_df)",
+                        value="1, 3, 5, 10",
+                        info="Values ranging from 0 to 1.0, separated by a comma, or integers separated by a comma. If float, the parameter represents a proportion of documents, integer absolute counts.",
+                        interactive=True,
+                    )
+                with gr.Column():
+                    vect__ngram_range = gr.Textbox(
+                        label="Vectorizer ngram_range (vect__ngram_range)",
+                        value="(1, 1), (1, 2)",
+                        info="""Tuples of integer values separated by a comma. For example an `ngram_range` of `(1, 1)` means only unigrams, `(1, 2)` means unigrams and bigrams, and `(2, 2)` means only bigrams.""",
+                        interactive=True,
+                    )
+                    vect__norm = gr.Textbox(
+                        label="Vectorizer norm (vect__norm)",
+                        value="l1, l2",
+                        info="'l1' or 'l2', separated by a comma",
+                        interactive=True,
+                    )
+        with gr.Tab("DESCRIPTION OF PARAMETERS"):
+            gr.Markdown("""### Classifier Alpha""")
+            gr.Markdown(load_description("parameter_grid/alpha"))
+            gr.Markdown("""### Vectorizer max_df""")
+            gr.Markdown(load_description("parameter_grid/max_df"))
+            gr.Markdown("""### Vectorizer min_df""")
+            gr.Markdown(load_description("parameter_grid/min_df"))
+            gr.Markdown("""### Vectorizer ngram_range""")
+            gr.Markdown(load_description("parameter_grid/ngram_range"))
+            gr.Markdown("""### Vectorizer norm""")
+            gr.Markdown(load_description("parameter_grid/norm"))
     with gr.Row():
         gr.Markdown(
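
The textboxes in the new layout hold comma-separated strings, so their values have to be converted before they can serve as a search grid. The app's own parsing code is not part of this diff; the sketch below is only one possible approach, and the helper names (`parse_numbers`, `parse_tuples`, `parse_strings`, `build_parameter_grid`) are hypothetical.

```python
import ast


def parse_numbers(text):
    """Turn "0.2, 0.4, 1.0" or "1, 3, 5" into a list of floats/ints."""
    return [ast.literal_eval(item.strip()) for item in text.split(",") if item.strip()]


def parse_tuples(text):
    """Turn "(1, 1), (1, 2)" into a list of tuples."""
    return [tuple(item) for item in ast.literal_eval(f"[{text}]")]


def parse_strings(text):
    """Turn "l1, l2" into a list of stripped strings."""
    return [item.strip() for item in text.split(",") if item.strip()]


def build_parameter_grid(alpha, max_df, min_df, ngram_range, norm):
    """Assemble a dict keyed by the pipeline parameter names shown in the textbox labels."""
    return {
        "clf__alpha": parse_numbers(alpha),
        "vect__max_df": parse_numbers(max_df),
        "vect__min_df": parse_numbers(min_df),
        "vect__ngram_range": parse_tuples(ngram_range),
        "vect__norm": parse_strings(norm),
    }


# Default textbox values taken from the diff above.
grid = build_parameter_grid(
    alpha="1.e-06, 1.e-05, 1.e-04",
    max_df="0.2, 0.4, 0.6, 0.8, 1.0",
    min_df="1, 3, 5, 10",
    ngram_range="(1, 1), (1, 2)",
    norm="l1, l2",
)
print(grid)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to Python literals, which matters when the strings come from user-editable fields.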

descriptions/parameter_grid/alpha.md CHANGED

@@ -1 +1 @@
-The value of "alpha" adds a constant amount to the occurrence counters of features, ensuring that even unobserved feature values have a non-zero probability. Smaller values of "alpha" result in weaker smoothing, while larger values increase the level of smoothing. The default value is 1.0, which applies Laplace smoothing, but it can be adjusted based on the model's requirements.
+The "alpha" parameter adds a constant value to the occurrence counters of features, ensuring that even unobserved feature values have a non-zero probability. Smaller values of "alpha" result in weaker smoothing, while larger values increase the level of smoothing. The default value is 1.0, which applies Laplace smoothing, but it can be adjusted based on the model's requirements.
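
A minimal sketch of the additive smoothing described in this file, assuming a multinomial-style estimate in which `alpha` is added to every feature count. The function name is illustrative, and the classifier used in the Space may compute its estimates differently.

```python
def smoothed_probabilities(counts, alpha=1.0):
    """Return (count + alpha) / (total + alpha * n_features) for each feature."""
    total = sum(counts) + alpha * len(counts)
    return [(count + alpha) / total for count in counts]


# The third feature was never observed in the training data.
counts = [3, 1, 0]

laplace = smoothed_probabilities(counts, alpha=1.0)    # strong smoothing
light = smoothed_probabilities(counts, alpha=1e-06)    # weak smoothing

print(laplace)
print(light)
```

With `alpha=1.0` the unseen feature receives a noticeable share of the probability mass, while with `alpha=1e-06` its probability stays above zero but is tiny.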

descriptions/parameter_grid/max_df.md CHANGED

@@ -1 +1 @@
-The "max_df" parameter of TfidfVectorizer in scikit-learn is used to set an upper limit on the term frequency within a document, where terms that occur more frequently than the specified value are ignored during the vectorization process.
+The "max_df" parameter of TfidfVectorizer in scikit-learn is used to set an upper limit on the term frequency within a document. Terms that occur more frequently than the specified value are ignored during the vectorization process.

descriptions/parameter_grid/min_df.md CHANGED

@@ -1 +1 @@
-The "min_df" parameter of TfidfVectorizer in scikit-learn is used to set a lower limit on the term frequency within a document, where terms that occur less frequently than the specified value are ignored during the vectorization process.
+The "min_df" parameter of TfidfVectorizer in scikit-learn is used to set a lower limit on the term frequency within a document. Terms that occur less frequently than the specified value are ignored during the vectorization process.
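
In scikit-learn, both `max_df` and `min_df` are thresholds on a term's document frequency, meaning the number (or proportion) of documents that contain the term, rather than on how often it occurs inside a single document. The sketch below is a simplified pure-Python illustration of that filtering rule, not scikit-learn's implementation; the function names and the whitespace tokenizer are assumptions made for the example.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set(): repeats within a doc count once
    return df


def filter_vocabulary(docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies within [min_df, max_df].

    Floats are read as a proportion of documents, integers as absolute counts.
    """
    n_docs = len(docs)
    min_count = min_df * n_docs if isinstance(min_df, float) else min_df
    max_count = max_df * n_docs if isinstance(max_df, float) else max_df
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if min_count <= count <= max_count)


docs = [
    "the cat sat",
    "the dog sat",
    "the bird flew",
    "the the the cat",
]

# "the" appears in all 4 documents (> 75%), so max_df=0.75 drops it;
# "dog", "bird" and "flew" appear in only 1 document, so min_df=2 drops them.
vocabulary = filter_vocabulary(docs, min_df=2, max_df=0.75)
print(vocabulary)
```

The repeated "the" in the last document does not change the result, because only the number of documents containing the term is counted.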

descriptions/parameter_grid/ngram_range.md CHANGED

@@ -1 +1 @@
-The "ngram_range" parameter of TfidfVectorizer in scikit-learn is used to specify the range of n-grams (contiguous sequences of n words) to consider during the vectorization process. It defines the lower and upper bounds for the n-gram sizes that will be included in the feature representation.
+The "ngram_range" parameter of TfidfVectorizer in scikit-learn is used to specify the range of n-grams (contiguous sequences of n words) considered during the vectorization process. It defines the lower and upper bounds for the n-gram sizes that will be included in the feature representation.
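
To make the lower and upper bounds concrete, here is a small pure-Python sketch that enumerates word n-grams for a given `ngram_range`. It is an illustration only: `word_ngrams` is a hypothetical helper, and scikit-learn's own tokenization and preprocessing are not reproduced.

```python
def word_ngrams(tokens, ngram_range=(1, 1)):
    """Return all contiguous word n-grams with min_n <= n <= max_n."""
    min_n, max_n = ngram_range
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


tokens = ["text", "feature", "extraction"]

unigrams = word_ngrams(tokens, (1, 1))
uni_and_bigrams = word_ngrams(tokens, (1, 2))
bigrams = word_ngrams(tokens, (2, 2))

print(unigrams)
print(uni_and_bigrams)
print(bigrams)
```

Widening the range increases the number of candidate features, which is one reason the grid in this Space only tries `(1, 1)` and `(1, 2)` by default.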

descriptions/parameter_grid/norm.md CHANGED

@@ -1 +1,6 @@
-The "norm" parameter of TfidfVectorizer in scikit-learn is used to specify the normalization method applied to the resulting TF-IDF vectors. It controls whether the vectors should be normalized to have unit norm (L2 normalization) or left unnormalized (None).
+The "norm" parameter of TfidfVectorizer in scikit-learn is used to specify the normalization method applied to the resulting TF-IDF vectors. It controls whether the vectors should be normalized to have unit norm (L2 normalization) or left unnormalized (None).
+```
+- 'l2': The sum of squares of vector elements is 1. The cosine similarity between two vectors is their dot product when the L2 norm has been applied.
+- 'l1': The sum of the absolute values of vector elements is 1.
+```
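
The two bullet points added to this file can be checked numerically. The following is a minimal pure-Python sketch, assuming plain lists as vectors; `normalize` and `dot` are illustrative helpers rather than scikit-learn functions.

```python
import math


def normalize(vector, norm="l2"):
    """Scale a vector so that its L1 or L2 norm equals 1."""
    if norm == "l1":
        scale = sum(abs(x) for x in vector)
    elif norm == "l2":
        scale = math.sqrt(sum(x * x for x in vector))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if scale == 0:
        return list(vector)  # an all-zero vector is left unchanged
    return [x / scale for x in vector]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


a = [3.0, 4.0]
b = [4.0, 3.0]

a_l1 = normalize(a, "l1")  # absolute values sum to 1
a_l2 = normalize(a, "l2")  # squares sum to 1
b_l2 = normalize(b, "l2")

# Cosine similarity computed directly from the unnormalized vectors.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

print(a_l1, a_l2, dot(a_l2, b_l2), cosine)
```

After L2 normalization the plain dot product of `a_l2` and `b_l2` matches the cosine similarity of the original vectors, which is the property the 'l2' bullet refers to.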