MilesCranmer/PySR

MilesCranmer committed on Sep 2, 2022

Commit b4fc2d4 (1 parent: f887ed1)

Revert "Remove newlines which break docs building"

This reverts commit bdf365a834456e1a372b1b47bbf8c496c7f4b406.

Files changed (1)

pysr/sr.py +72 -0

pysr/sr.py CHANGED

@@ -236,40 +236,52 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         - `"best"` selects the candidate model with the highest score
           among expressions whose loss is at most 1.5x that of the
           most accurate model.
     binary_operators : list[str], default=["+", "-", "*", "/"]
         List of strings giving the binary operators in Julia's Base.
     unary_operators : list[str], default=None
         Same as :param`binary_operators` but for operators taking a
         single scalar.
     niterations : int, default=40
         Number of iterations of the algorithm to run. The best
         equations are printed and migrate between populations at the
         end of each iteration.
     populations : int, default=15
         Number of populations running.
     population_size : int, default=33
         Number of individuals in each population.
     max_evals : int, default=None
         Limits the total number of evaluations of expressions to
         this number.
     maxsize : int, default=20
         Max complexity of an equation.
     maxdepth : int, default=None
         Max depth of an equation. You can use both :param`maxsize` and
         :param`maxdepth`. :param`maxdepth` is by default not used.
     warmup_maxsize_by : float, default=0.0
         If greater than 0, slowly increase the max size from a small
         number up to :param`maxsize`. The value gives the fraction of
         training time at which the current max size will reach the
         user-passed :param`maxsize`.
     timeout_in_seconds : float, default=None
         Make the search return early once this many seconds have passed.
     constraints : dict[str, int | tuple[int,int]], default=None
         Dictionary of int (unary) or 2-tuples (binary) that enforces
         maxsize constraints on the individual arguments of operators.
         E.g., `'pow': (-1, 1)` says that power laws can have a left
         argument of any complexity, but a right argument of complexity
         at most 1. Use this to force more interpretable solutions.
     nested_constraints : dict[str, dict], default=None
         Specifies how many times a combination of operators can be
         nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
@@ -286,6 +298,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         operators, you only need to provide a single number: both
         arguments are treated the same way, and the max of each
         argument is constrained.
     loss : str, default="L2DistLoss()"
         String of Julia code specifying the loss function. Can either
         be a loss from LossFunctions.jl, or your own loss written as a
@@ -301,6 +314,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
         `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
         `SigmoidLoss()`, `DWDMarginLoss(q)`.
     complexity_of_operators : dict[str, float], default=None
         If you would like to use a complexity other than 1 for an
         operator, specify the complexity here. For example,
@@ -309,156 +323,210 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         the `+` operator (which is the default). You may specify real
         numbers for a complexity, and the total complexity of a tree
         will be rounded to the nearest integer after computing.
     complexity_of_constants : float, default=1
         Complexity of constants.
     complexity_of_variables : float, default=1
         Complexity of variables.
     parsimony : float, default=0.0032
         Multiplicative factor for how much to punish complexity.
     use_frequency : bool, default=True
         Whether to measure the frequency of complexities, and use that
         instead of parsimony to explore equation space. Will naturally
         find equations of all complexities.
     use_frequency_in_tournament : bool, default=True
         Whether to use the frequency mentioned above in the tournament,
         rather than just the simulated annealing.
     alpha : float, default=0.1
         Initial temperature for simulated annealing
         (requires :param`annealing` to be `True`).
     annealing : bool, default=False
         Whether to use annealing.
     early_stop_condition : { float | str }, default=None
         Stop the search early if this loss is reached. You may also
         pass a string containing a Julia function which
         takes a loss and complexity as input, for example:
         `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
     ncyclesperiteration : int, default=550
         Number of total mutations to run, per 10 samples of the
         population, per iteration.
     fraction_replaced : float, default=0.000364
         Fraction of the population to replace with equations migrating
         from other populations.
     fraction_replaced_hof : float, default=0.035
         Fraction of the population to replace with equations migrating
         from the hall of fame.
     weight_add_node : float, default=0.79
         Relative likelihood for mutation to add a node.
     weight_insert_node : float, default=5.1
         Relative likelihood for mutation to insert a node.
     weight_delete_node : float, default=1.7
         Relative likelihood for mutation to delete a node.
     weight_do_nothing : float, default=0.21
         Relative likelihood for mutation to leave the individual unchanged.
     weight_mutate_constant : float, default=0.048
         Relative likelihood for mutation to change the constant slightly
         in a random direction.
     weight_mutate_operator : float, default=0.47
         Relative likelihood for mutation to swap an operator.
     weight_randomize : float, default=0.00023
         Relative likelihood for mutation to completely delete and then
         randomly generate the equation.
     weight_simplify : float, default=0.0020
         Relative likelihood for mutation to simplify constant parts by evaluation.
     crossover_probability : float, default=0.066
         Absolute probability of crossover-type genetic operation, instead of a mutation.
     skip_mutation_failures : bool, default=True
         Whether to skip mutation and crossover failures, rather than
         simply re-sampling the current member.
     migration : bool, default=True
         Whether to migrate.
     hof_migration : bool, default=True
         Whether to have the hall of fame migrate.
     topn : int, default=12
         How many top individuals migrate from each population.
     should_optimize_constants : bool, default=True
         Whether to numerically optimize constants (using :param`optimizer_algorithm`)
         at the end of each iteration.
     optimizer_algorithm : str, default="BFGS"
         Optimization scheme to use for optimizing constants. Can currently
         be `NelderMead` or `BFGS`.
     optimizer_nrestarts : int, default=2
         Number of times to restart the constants optimization process with
         different initial conditions.
     optimize_probability : float, default=0.14
         Probability of optimizing the constants during a single iteration of
         the evolutionary algorithm.
     optimizer_iterations : int, default=8
         Number of iterations that the constants optimizer can take.
     perturbation_factor : float, default=0.076
         Constants are perturbed by a max factor of
         (perturbation_factor*T + 1), where T is the current
         temperature: each constant is either multiplied or divided
         by this factor.
     tournament_selection_n : int, default=10
         Number of expressions to consider in each tournament.
     tournament_selection_p : float, default=0.86
         Probability of selecting the best expression in each
         tournament. Expressions are sorted by loss, and the one at
         rank k (k = 0 for the best) is selected with a probability
         that decays as p*(1-p)^k.
     procs : int, default=multiprocessing.cpu_count()
         Number of processes (=number of populations running).
     multithreading : bool, default=True
         Use multithreading instead of distributed backend.
         Using procs=0 will turn off both.
     cluster_manager : str, default=None
         For distributed computing, this sets the job queue system. Set
         to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
         "htc". If set to one of these, PySR will run in distributed
         mode, and use `procs` to figure out how many processes to launch.
     batching : bool, default=False
         Whether to compare population members on small batches during
         evolution. Still uses full dataset for comparing against hall
         of fame.
     batch_size : int, default=50
         The amount of data to use if doing batching.
     fast_cycle : bool, default=False (experimental)
         Batch over population subsamples. This is a slightly different
         algorithm than regularized evolution, but runs cycles 15%
         faster. May be algorithmically less efficient.
     precision : int, default=32
         What precision to use for the data. By default this is 32
         (float32), but you can select 64 or 16 as well.
     random_state : int, Numpy RandomState instance or None, default=None
         Pass an int for reproducible results across multiple function calls.
         See :term:`Glossary <random_state>`.
     deterministic : bool, default=False
         Make a PySR search give the same result every run.
         To use this, you must turn off parallelism
         (with :param`procs`=0, :param`multithreading`=False),
         and set :param`random_state` to a fixed seed.
     warm_start : bool, default=False
         Tells fit to continue from where the last call to fit finished.
         If false, each call to fit will be fresh, overwriting previous results.
     verbosity : int, default=1e9
         What verbosity level to use. 0 means minimal print statements.
     update_verbosity : int, default=None
         What verbosity level to use for package updates.
         Will take value of :param`verbosity` if not given.
     progress : bool, default=True
         Whether to use a progress bar instead of printing to stdout.
     equation_file : str, default=None
         Where to save the files (.csv extension).
     temp_equation_file : bool, default=False
         Whether to put the hall of fame file in the temp directory.
         Deletion is then controlled with the :param`delete_tempfiles`
         parameter.
     tempdir : str, default=None
         Directory for the temporary files.
     delete_tempfiles : bool, default=True
         Whether to delete the temporary files after finishing.
     julia_project : str, default=None
         A Julia environment location containing a Project.toml
         (and potentially the source code for SymbolicRegression.jl).
         Default gives the Python package directory, where a
         Project.toml file should be present from the install.
     update : bool, default=True
         Whether to automatically update Julia packages.
     output_jax_format : bool, default=False
         Whether to create a 'jax_format' column in the output,
         containing jax-callable functions and the default parameters in
         a jax array.
     output_torch_format : bool, default=False
         Whether to create a 'torch_format' column in the output,
         containing a torch module with trainable parameters.
     extra_sympy_mappings : dict[str, Callable], default=None
         Provides mappings between custom :param`binary_operators` or
         :param`unary_operators` defined in Julia strings, to those same
@@ -466,19 +534,23 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
         model to be exported to sympy, :param`extra_sympy_mappings`
         would be `{"inv": lambda x: 1/x}`.
     extra_jax_mappings : dict[Callable, str], default=None
         Similar to :param`extra_sympy_mappings` but for model export
         to jax. The dictionary maps sympy functions to jax functions.
         For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
         the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
     extra_torch_mappings : dict[Callable, Callable], default=None
         The same as :param`extra_jax_mappings` but for model export
         to pytorch. Note that the dictionary values should be callable
         pytorch expressions.
         For example: `extra_torch_mappings={sympy.sin: torch.sin}`.
     denoise : bool, default=False
         Whether to use a Gaussian Process to denoise the data before
         inputting to PySR. Can help PySR fit noisy data.
     select_k_features : int, default=None
         Whether to run feature selection in Python using random forests,
         before passing to the symbolic regression code. None means no

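The `"best"` model-selection rule can be sketched in plain Python. This is an illustration under assumptions, not PySR's own code: candidates are `(complexity, loss)` pairs from the hall of fame, sorted by increasing complexity, and each candidate's score is taken to be the drop in log-loss per unit of added complexity relative to the previous candidate. The helpers `scores` and `select_best` are hypothetical.

```python
import math


def scores(candidates):
    """Score (complexity, loss) pairs, assumed sorted by complexity.

    Hypothetical helper: score = -d(log loss) / d(complexity) relative to
    the previous candidate; the first candidate gets 0.
    """
    out = []
    prev = None
    for complexity, loss in candidates:
        if prev is None:
            out.append(0.0)
        elif loss <= 0.0:
            out.append(math.inf)  # a perfect fit gets an infinite score
        else:
            prev_complexity, prev_loss = prev
            out.append(-math.log(loss / prev_loss) / (complexity - prev_complexity))
        prev = (complexity, loss)
    return out


def select_best(candidates):
    """Index of the highest-scoring candidate with loss <= 1.5x the minimum."""
    s = scores(candidates)
    threshold = 1.5 * min(loss for _, loss in candidates)
    eligible = [i for i, (_, loss) in enumerate(candidates) if loss <= threshold]
    return max(eligible, key=lambda i: s[i])
```

For `[(1, 10.0), (3, 1.0), (5, 0.9), (7, 0.8)]`, the last three candidates are all within 1.5x of the best loss (0.8), and `select_best` returns index 1, because going from loss 10.0 to 1.0 gives by far the largest log-loss drop per unit of complexity.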
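One way to read `warmup_maxsize_by` is as a linear ramp on the allowed size. The sketch below assumes the ramp starts at a size of 3 and grows linearly with the elapsed fraction of training; the starting size, the linear shape, and the helper name `current_maxsize` are assumptions for illustration, not taken from the docstring.

```python
def current_maxsize(fraction_elapsed, maxsize=20, warmup_maxsize_by=0.0, start=3):
    """Max size allowed at a given point in training (hypothetical helper).

    fraction_elapsed: fraction of total training time completed, in [0, 1].
    Assumes a linear ramp from `start` that reaches `maxsize` once
    fraction_elapsed == warmup_maxsize_by; warmup_maxsize_by <= 0 disables it.
    """
    if warmup_maxsize_by <= 0 or fraction_elapsed >= warmup_maxsize_by:
        return maxsize
    ramp = fraction_elapsed / warmup_maxsize_by
    # int() floors the non-negative increment to a whole size.
    return start + int((maxsize - start) * ramp)
```

With `maxsize=20` and `warmup_maxsize_by=0.5`, this returns 3 at the start, 11 a quarter of the way through training, and 20 from the halfway point on.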
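To make the `constraints` semantics concrete, the following sketch checks an expression tree against a constraints dictionary. It assumes that the complexity of a subtree is its node count (every operator, constant, and variable costing 1, which matches the documented defaults) and that -1 means "unlimited". The `Node` class, `size`, and `satisfies_constraints` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical expression-tree node: an operator, variable, or constant."""
    op: str
    children: list = field(default_factory=list)


def size(node):
    """Complexity of a subtree, assuming every node costs 1."""
    return 1 + sum(size(child) for child in node.children)


def satisfies_constraints(node, constraints):
    """Check per-argument size limits such as {"pow": (-1, 1), "sin": 5}."""
    limits = constraints.get(node.op)
    if limits is not None:
        if isinstance(limits, int):
            # A single int (unary operator) limits every argument it has.
            limits = (limits,) * len(node.children)
        for child, limit in zip(node.children, limits):
            if limit != -1 and size(child) > limit:
                return False
    # Constraints apply at every level of the tree.
    return all(satisfies_constraints(child, constraints) for child in node.children)
```

With `{"pow": (-1, 1)}`, `x^2` passes, but `x^(x + 1)` fails because its right argument has complexity 3.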
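The rule that real-valued operator complexities are summed and then rounded can be illustrated with a small tree-complexity function. This is a sketch under assumptions: trees are nested tuples `("op", child, ...)`, strings are variables, numbers are constants, and Python's `round` (ties to even) stands in for rounding to the nearest integer. The helper `tree_complexity` is hypothetical.

```python
def tree_complexity(tree, complexity_of_operators=None,
                    complexity_of_constants=1.0, complexity_of_variables=1.0):
    """Total rounded complexity of a tree given as nested tuples (hypothetical).

    Operator nodes are tuples ("op", child, ...); strings are variables and
    numbers are constants. Operators missing from the dict cost 1.
    """
    weights = complexity_of_operators or {}

    def raw(node):
        if isinstance(node, tuple):
            op, *children = node
            return weights.get(op, 1.0) + sum(raw(c) for c in children)
        if isinstance(node, str):
            return complexity_of_variables
        return complexity_of_constants

    # Real-valued weights are summed first, then rounded once at the end.
    return round(raw(tree))
```

For `sin(x0) + 2.5`, written as `("+", ("sin", "x0"), 2.5)`, the default weights give 4; setting `{"sin": 2.4}` gives 5.4, which rounds to 5, while `{"sin": 2.6}` gives 5.6, which rounds to 6.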
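The `weight_*` parameters are relative, so only their ratios matter, whereas `crossover_probability` is absolute. A minimal sketch of how one genetic operation could be drawn under that reading, using the documented defaults; the dictionary `DEFAULT_WEIGHTS` and the functions `choose_operation` and `mutation_probabilities` are hypothetical, and this is not SymbolicRegression.jl's actual code.

```python
import random

# Documented default relative weights for each mutation type.
DEFAULT_WEIGHTS = {
    "add_node": 0.79,
    "insert_node": 5.1,
    "delete_node": 1.7,
    "do_nothing": 0.21,
    "mutate_constant": 0.048,
    "mutate_operator": 0.47,
    "randomize": 0.00023,
    "simplify": 0.0020,
}


def choose_operation(rng, weights=DEFAULT_WEIGHTS, crossover_probability=0.066):
    """Return "crossover" with the given absolute probability, else a mutation."""
    if rng.random() < crossover_probability:
        return "crossover"
    names = list(weights)
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def mutation_probabilities(weights=DEFAULT_WEIGHTS):
    """Normalized probability of each mutation, given that no crossover occurs."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

With the defaults, `insert_node` accounts for roughly 61% of mutations (5.1 out of a total weight of about 8.32), while `randomize` is almost never chosen.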
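The constant perturbation described by `perturbation_factor` can be sketched as follows, with `temperature` playing the role of T. Assumptions: the actual factor is drawn as `max_factor ** u` with `u` uniform on [0, 1), so it lies between 1 and the maximum, and whether to multiply or divide is a fair coin flip. The helper `perturb_constant` is hypothetical.

```python
import random


def perturb_constant(value, temperature, rng, perturbation_factor=0.076):
    """Scale a constant up or down by at most (perturbation_factor * T + 1)."""
    max_factor = perturbation_factor * temperature + 1
    factor = max_factor ** rng.random()  # between 1 and max_factor
    # Multiply or divide with equal probability.
    return value * factor if rng.random() < 0.5 else value / factor
```

With T = 1 and the default factor, a constant such as 2.0 stays within [2.0/1.076, 2.0*1.076]; with T = 0 the maximum factor is 1 and the constant is unchanged.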
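The tournament rule can be sketched as: sample `tournament_selection_n` distinct members, sort them by loss, and pick the one at rank k with weight p*(1-p)^k. This sketch normalizes the weights so they sum to 1 over the tournament; that normalization is an assumption, and the helpers `rank_probabilities` and `tournament_select` are hypothetical.

```python
import random


def rank_probabilities(n=10, p=0.86):
    """Normalized selection probability for each rank (0 = lowest loss)."""
    weights = [p * (1 - p) ** k for k in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def tournament_select(losses, rng, n=10, p=0.86):
    """Return the index of the member chosen by one tournament."""
    contestants = rng.sample(range(len(losses)), n)  # n distinct members
    contestants.sort(key=lambda i: losses[i])  # lowest loss first
    return rng.choices(contestants, weights=rank_probabilities(n, p), k=1)[0]
```

With the defaults (n=10, p=0.86), the best contestant is chosen about 86% of the time and the runner-up about 12%.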