Commit 87880d1 by MilesCranmer
Parent(s): 8685680
Deprecate ncyclesperiteration -> ncycles_per_iteration
Files changed:
- README.md +1 -1
- docs/options.md +2 -2
- docs/tuning.md +2 -2
- pysr/deprecated.py +1 -0
- pysr/param_groupings.yml +1 -1
- pysr/sr.py +5 -5
- pysr/test/params.py +1 -1
- pysr/test/test.py +2 -2
- pysr/test/test_warm_start.py +1 -1
README.md
CHANGED
@@ -297,7 +297,7 @@ model = PySRRegressor(
     # ^ 2 populations per core, so one is always running.
     population_size=50,
     # ^ Slightly larger populations, for greater diversity.
-    ncyclesperiteration=500,
+    ncycles_per_iteration=500,
     # ^ Generations between migrations.
     niterations=10000000, # Run forever
     early_stop_condition=(
docs/options.md
CHANGED
@@ -78,11 +78,11 @@ with the equations.
 Each cycle considers every 10-equation subsample (re-sampled for each individual 10,
 unless `fast_cycle` is set in which case the subsamples are separate groups of equations)
 a single time, producing one mutated equation for each.
-The parameter `ncyclesperiteration` defines how many times this
+The parameter `ncycles_per_iteration` defines how many times this
 occurs before the equations are compared to the hall of fame,
 and new equations are migrated from the hall of fame, or from other populations.
 It also controls how slowly annealing occurs. You may find that increasing
-`ncyclesperiteration` results in a higher cycles-per-second, as the head
+`ncycles_per_iteration` results in a higher cycles-per-second, as the head
 worker needs to reduce and distribute new equations less often, and also increases
 diversity. But at the same
 time, a smaller number it might be that migrating equations from the hall of fame helps
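The docs text above describes `ncycles_per_iteration` as the number of evolve cycles a population runs before comparing against the hall of fame and migrating. A minimal toy sketch of that control flow (illustrative only — this is not PySR's implementation, and `run_search` is a hypothetical name):

```python
# Toy sketch of the iteration/cycle structure described in docs/options.md.
# NOT PySR's code; it only illustrates how ncycles_per_iteration separates
# cheap mutation cycles from the more expensive migration/hall-of-fame step.

def run_search(niterations, ncycles_per_iteration):
    events = []
    for _ in range(niterations):
        # Each iteration runs many independent mutation cycles...
        for _ in range(ncycles_per_iteration):
            events.append("cycle")
        # ...and only then syncs with the hall of fame and migrates equations.
        events.append("migrate")
    return events

events = run_search(niterations=2, ncycles_per_iteration=3)
```

A larger `ncycles_per_iteration` means proportionally fewer "migrate" events per cycle, which is why the head worker has less coordination work to do.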
docs/tuning.md
CHANGED
@@ -14,12 +14,12 @@ I run from IPython (Jupyter Notebooks don't work as well[^1]) on the head node o
 2. Use only the operators I think it needs and no more.
 3. Increase `populations` to `3*num_cores`.
 4. If my dataset is more than 1000 points, I either subsample it (low-dimensional and not much noise) or set `batching=True` (high-dimensional or very noisy, so it needs to evaluate on all the data).
-5. While on a laptop or single node machine, you might leave the default `ncyclesperiteration`, on a cluster with ~100 cores I like to set `ncyclesperiteration` to maybe `5000` or so, until the head node occupation is under `10%`. (A larger value means the workers talk less frequently to eachother, which is useful when you have many workers!)
+5. While on a laptop or single node machine, you might leave the default `ncycles_per_iteration`, on a cluster with ~100 cores I like to set `ncycles_per_iteration` to maybe `5000` or so, until the head node occupation is under `10%`. (A larger value means the workers talk less frequently to eachother, which is useful when you have many workers!)
 6. Set `constraints` and `nested_constraints` as strict as possible. These can help quite a bit with exploration. Typically, if I am using `pow`, I would set `constraints={"pow": (9, 1)}`, so that power laws can only have a variable or constant as their exponent. If I am using `sin` and `cos`, I also like to set `nested_constraints={"sin": {"sin": 0, "cos": 0}, "cos": {"sin": 0, "cos": 0}}`, so that sin and cos can't be nested, which seems to happen frequently. (Although in practice I would just use `sin`, since the search could always add a phase offset!)
 7. Set `maxsize` a bit larger than the final size you want. e.g., if you want a final equation of size `30`, you might set this to `35`, so that it has a bit of room to explore.
 8. I typically don't use `maxdepth`, but if I do, I set it strictly, while also leaving a bit of room for exploration. e.g., if you want a final equation limited to a depth of `5`, you might set this to `6` or `7`, so that it has a bit of room to explore.
 9. Set `parsimony` equal to about the minimum loss you would expect, divided by 5-10. e.g., if you expect the final equation to have a loss of `0.001`, you might set `parsimony=0.0001`.
-10. Set `weight_optimize` to some larger value, maybe `0.001`. This is very important if `ncyclesperiteration` is large, so that optimization happens more frequently.
+10. Set `weight_optimize` to some larger value, maybe `0.001`. This is very important if `ncycles_per_iteration` is large, so that optimization happens more frequently.
 11. Set `turbo` to `True`. This may or not work, if there's an error just turn it off (some operators are not SIMD-capable). If it does work, it should give you a nice 20% speedup.
 12. For final runs, after I have tuned everything, I typically set `niterations` to some very large value, and just let it run for a week until my job finishes (genetic algorithms tend not to converge, they can look like they settle down, but then find a new family of expression, and explore a new space). If I am satisfied with the current equations (which are visible either in the terminal or in the saved csv file), I quit the job early.
pysr/deprecated.py
CHANGED
@@ -79,6 +79,7 @@ def make_deprecated_kwargs_for_pysr_regressor():
     warmupMaxsizeBy => warmup_maxsize_by
     useFrequency => use_frequency
     useFrequencyInTournament => use_frequency_in_tournament
+    ncyclesperiteration => ncycles_per_iteration
     """
     # Turn this into a dict:
     deprecated_kwargs = {}
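The diff above appends the renamed kwarg to a plain-text `old => new` table that the function then parses into a dict (per its `# Turn this into a dict:` comment). A hedged sketch of that parsing step — the exact PySR helper may differ, and the second mapping line here is only a plausible example entry:

```python
# Sketch of turning an "old => new" mapping string into a dict,
# as make_deprecated_kwargs_for_pysr_regressor's comment suggests.
# Hypothetical reconstruction, not the exact PySR code.

deprecation_string = """
    ncyclesperiteration => ncycles_per_iteration
    fractionReplaced => fraction_replaced
"""

deprecated_kwargs = {}
for line in deprecation_string.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines around the table
    old, new = line.split(" => ")
    deprecated_kwargs[old] = new
```

Keeping the table as a docstring-style string makes adding a deprecation a one-line change, as this commit demonstrates.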
pysr/param_groupings.yml
CHANGED
@@ -8,7 +8,7 @@
   - niterations
   - populations
   - population_size
-  - ncyclesperiteration
+  - ncycles_per_iteration
   - The Objective:
   - loss
   - full_objective
pysr/sr.py
CHANGED
@@ -354,7 +354,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         takes a loss and complexity as input, for example:
         `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
         Default is `None`.
-    ncyclesperiteration : int
+    ncycles_per_iteration : int
         Number of total mutations to run, per 10 samples of the
         population, per iteration.
         Default is `550`.
@@ -398,7 +398,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         Constant optimization can also be performed as a mutation, in addition to
         the normal strategy controlled by `optimize_probability` which happens
         every iteration. Using it as a mutation is useful if you want to use
-        a large `ncyclesperiteration`, and may not optimize very often.
+        a large `ncycles_periteration`, and may not optimize very often.
         Default is `0.0`.
     crossover_probability : float
         Absolute probability of crossover-type genetic operation, instead of a mutation.
@@ -688,7 +688,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         alpha: float = 0.1,
         annealing: bool = False,
         early_stop_condition: Optional[Union[float, str]] = None,
-        ncyclesperiteration: int = 550,
+        ncycles_per_iteration: int = 550,
         fraction_replaced: float = 0.000364,
         fraction_replaced_hof: float = 0.035,
         weight_add_node: float = 0.79,
@@ -756,7 +756,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         self.niterations = niterations
         self.populations = populations
         self.population_size = population_size
-        self.ncyclesperiteration = ncyclesperiteration
+        self.ncycles_per_iteration = ncycles_per_iteration
         # - Equation Constraints
         self.maxsize = maxsize
         self.maxdepth = maxdepth
@@ -1652,7 +1652,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
             use_frequency_in_tournament=self.use_frequency_in_tournament,
             adaptive_parsimony_scaling=self.adaptive_parsimony_scaling,
             npop=self.population_size,
-            ncycles_per_iteration=self.ncyclesperiteration,
+            ncycles_per_iteration=self.ncycles_per_iteration,
             fraction_replaced=self.fraction_replaced,
             topn=self.topn,
             print_precision=self.print_precision,
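With the constructor now taking `ncycles_per_iteration`, the old spelling is routed through the deprecation table from `pysr/deprecated.py` so that existing user code keeps working with a warning. A toy sketch of that remapping pattern (illustrative only — `ToyRegressor` is a stand-in, not PySR's actual handling):

```python
import warnings

# Toy sketch of remapping a deprecated keyword argument to its new name
# at construction time. Illustrative only; PySR's real handling lives in
# PySRRegressor together with pysr/deprecated.py.
DEPRECATED_KWARGS = {"ncyclesperiteration": "ncycles_per_iteration"}

class ToyRegressor:
    def __init__(self, ncycles_per_iteration=550, **kwargs):
        for old, new in DEPRECATED_KWARGS.items():
            if old in kwargs:
                # Warn, then forward the old value to the new parameter.
                warnings.warn(f"{old} has been renamed to {new}", FutureWarning)
                ncycles_per_iteration = kwargs.pop(old)
        self.ncycles_per_iteration = ncycles_per_iteration

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = ToyRegressor(ncyclesperiteration=100)  # old spelling still works
```

This is why the commit is a deprecation rather than a hard break: both spellings produce the same configured estimator, but only the new one is silent.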
pysr/test/params.py
CHANGED
@@ -5,4 +5,4 @@ from .. import PySRRegressor
 DEFAULT_PARAMS = inspect.signature(PySRRegressor.__init__).parameters
 DEFAULT_NITERATIONS = DEFAULT_PARAMS["niterations"].default
 DEFAULT_POPULATIONS = DEFAULT_PARAMS["populations"].default
-DEFAULT_NCYCLES = DEFAULT_PARAMS["ncyclesperiteration"].default
+DEFAULT_NCYCLES = DEFAULT_PARAMS["ncycles_per_iteration"].default
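The test helpers read defaults directly from the constructor's signature, which is why this lookup key had to change with the rename. The same `inspect.signature` pattern works on any callable; a toy function is used below as a stand-in since PySR itself may not be installed:

```python
import inspect

# Same pattern as pysr/test/params.py: pull a parameter's default value
# out of a function signature. toy_init is a hypothetical stand-in for
# PySRRegressor.__init__.
def toy_init(self, niterations=40, ncycles_per_iteration=550):
    pass

params = inspect.signature(toy_init).parameters
DEFAULT_NCYCLES = params["ncycles_per_iteration"].default
```

Note the failure mode the rename guards against: looking up the old key raises a `KeyError`, so a stale test would fail loudly rather than silently use the wrong default.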
pysr/test/test.py
CHANGED
@@ -224,7 +224,7 @@ class TestPipeline(unittest.TestCase):
         # Test if repeated fit works:
         regressor.set_params(
             niterations=1,
-            ncyclesperiteration=2,
+            ncycles_per_iteration=2,
             warm_start=True,
             early_stop_condition=None,
         )
@@ -661,7 +661,7 @@ class TestMiscellaneous(unittest.TestCase):
         model = PySRRegressor(
             niterations=int(1 + DEFAULT_NITERATIONS / 10),
             populations=int(1 + DEFAULT_POPULATIONS / 3),
-            ncyclesperiteration=int(2 + DEFAULT_NCYCLES / 10),
+            ncycles_per_iteration=int(2 + DEFAULT_NCYCLES / 10),
             verbosity=0,
             progress=False,
             random_state=0,
pysr/test/test_warm_start.py
CHANGED
@@ -78,7 +78,7 @@ class TestWarmStart(unittest.TestCase):
         model.warm_start = True
         model.niterations = 0
         model.max_evals = 0
-        model.ncyclesperiteration = 0
+        model.ncycles_per_iteration = 0

         model.fit(X, y)