MilesCranmer committed on
Commit
8b49600
1 Parent(s): beecd14

Proper pydoc markdown format

Files changed (1)
  1. pysr/sr.py +61 -61
pysr/sr.py CHANGED
@@ -132,122 +132,122 @@ def pysr(X, y, weights=None,

  # Arguments

- X (np.ndarray/pandas.DataFrame): 2D array. Rows are examples,
- columns are features. If pandas DataFrame, the columns are used
  for variable names (so make sure they don't contain spaces).
- y (np.ndarray): 1D array (rows are examples) or 2D array (rows
- are examples, columns are outputs). Putting in a 2D array will
  trigger a search for equations for each feature of y.
- weights (np.ndarray): same shape as y. Each element is how to
- weight the mean-square-error loss for that particular element
  of y.
- binary_operators (list): List of strings giving the binary operators
  in Julia's Base. Default is ["+", "-", "*", "/",].
- unary_operators (list): Same but for operators taking a single scalar.
  Default is [].
  procs (int): Number of processes (=number of populations running).
- loss (str): String of Julia code specifying the loss function.
- Can either be a loss from LossFunctions.jl, or your own
- loss written as a function. Examples of custom written losses
- include: `myloss(x, y) = abs(x-y)` for non-weighted, or
- `myloss(x, y, w) = w*abs(x-y)` for weighted.
- Among the included losses, these are as follows. Regression:
- `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
- `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
- `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
- Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
- `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
  `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
  populations (int): Number of populations running.
- niterations (int): Number of iterations of the algorithm to run. The best
- equations are printed, and migrate between populations, at the
  end of each.
- ncyclesperiteration (int): Number of total mutations to run, per 10
  samples of the population, per iteration.
  alpha (float): Initial temperature.
  annealing (bool): Whether to use annealing. You should (and it is default).
- fractionReplaced (float): How much of population to replace with migrating
  equations from other populations.
- fractionReplacedHof (float): How much of population to replace with migrating
  equations from hall of fame.
  npop (int): Number of individuals in each population
  parsimony (float): Multiplicative factor for how much to punish complexity.
  migration (bool): Whether to migrate.
  hofMigration (bool): Whether to have the hall of fame migrate.
- shouldOptimizeConstants (bool): Whether to numerically optimize
  constants (Nelder-Mead/Newton) at the end of each iteration.
  topn (int): How many top individuals migrate from each population.
- perturbationFactor (float): Constants are perturbed by a max
- factor of (perturbationFactor*T + 1). Either multiplied by this
  or divided by this.
  weightAddNode (float): Relative likelihood for mutation to add a node
  weightInsertNode (float): Relative likelihood for mutation to insert a node
  weightDeleteNode (float): Relative likelihood for mutation to delete a node
  weightDoNothing (float): Relative likelihood for mutation to leave the individual
- weightMutateConstant (float): Relative likelihood for mutation to change
  the constant slightly in a random direction.
- weightMutateOperator (float): Relative likelihood for mutation to swap
  an operator.
- weightRandomize (float): Relative likelihood for mutation to completely
  delete and then randomly generate the equation
- weightSimplify (float): Relative likelihood for mutation to simplify
  constant parts by evaluation
  timeout (float): Time in seconds to timeout search
  equation_file (str): Where to save the files (.csv separated by |)
  verbosity (int): What verbosity level to use. 0 means minimal print statements.
  progress (bool): Whether to use a progress bar instead of printing to stdout.
  maxsize (int): Max size of an equation.
- maxdepth (int): Max depth of an equation. You can use both maxsize and maxdepth.
  maxdepth is by default set to = maxsize, which means that it is redundant.
- fast_cycle (bool): (experimental) - batch over population subsamples. This
- is a slightly different algorithm than regularized evolution, but does cycles
  15% faster. May be algorithmically less efficient.
- variable_names (list): a list of names for the variables, other
  than "x0", "x1", etc.
- batching (bool): whether to compare population members on small batches
- during evolution. Still uses full dataset for comparing against
  hall of fame.
  batchSize (int): the amount of data to use if doing batching.
- select_k_features (None/int), whether to run feature selection in
- Python using random forests, before passing to the symbolic regression
- code. None means no feature selection; an int means select that many
  features.
- warmupMaxsizeBy (float): whether to slowly increase max size from
- a small number up to the maxsize (if greater than 0).
- If greater than 0, says the fraction of training time at which
  the current maxsize will reach the user-passed maxsize.
- constraints (dict): Dictionary of `int` (unary operators)
- or tuples of two `int`s (binary),
- this enforces maxsize constraints on the individual
- arguments of operators. e.g., `'pow': (-1, 1)`
- says that power laws can have any complexity left argument, but only
  1 complexity exponent. Use this to force more interpretable solutions.
- useFrequency (bool): whether to measure the frequency of complexities,
- and use that instead of parsimony to explore equation space. Will
  naturally find equations of all complexities.
  julia_optimization (int): Optimization level (0, 1, 2, 3)
  tempdir (str/None): directory for the temporary files
  delete_tempfiles (bool): whether to delete the temporary files after finishing
- julia_project (str/None): a Julia environment location containing
- a Project.toml (and potentially the source code for SymbolicRegression.jl).
- Default gives the Python package directory, where a Project.toml file
  should be present from the install.
- user_input (bool): Whether to ask for user input or not for installing (to
  be used for automated scripts). Will choose to install when asked.
  update (bool): Whether to automatically update Julia packages.
- temp_equation_file (bool): Whether to put the hall of fame file in
- the temp directory. Deletion is then controlled with the
  delete_tempfiles argument.
- output_jax_format (bool): Whether to create a 'jax_format' column in the output,
  containing jax-callable functions and the default parameters in a jax array.
- output_torch_format (bool): Whether to create a 'torch_format' column in the output,
  containing a torch module with trainable parameters.

  # Returns

- equations (pd.DataFrame/list): Results dataframe,
- giving complexity, MSE, and equations (as strings), as well as functional
- forms. If list, each element corresponds to a dataframe of equations
  for each output.
  """
  if binary_operators is None:
 

  # Arguments

+ X (np.ndarray/pandas.DataFrame): 2D array. Rows are examples, \
+ columns are features. If pandas DataFrame, the columns are used \
  for variable names (so make sure they don't contain spaces).
+ y (np.ndarray): 1D array (rows are examples) or 2D array (rows \
+ are examples, columns are outputs). Putting in a 2D array will \
  trigger a search for equations for each feature of y.
+ weights (np.ndarray): same shape as y. Each element is how to \
+ weight the mean-square-error loss for that particular element \
  of y.
+ binary_operators (list): List of strings giving the binary operators \
  in Julia's Base. Default is ["+", "-", "*", "/",].
+ unary_operators (list): Same but for operators taking a single scalar. \
  Default is [].
  procs (int): Number of processes (=number of populations running).
+ loss (str): String of Julia code specifying the loss function. \
+ Can either be a loss from LossFunctions.jl, or your own \
+ loss written as a function. Examples of custom written losses \
+ include: `myloss(x, y) = abs(x-y)` for non-weighted, or \
+ `myloss(x, y, w) = w*abs(x-y)` for weighted. \
+ Among the included losses, these are as follows. Regression: \
+ `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square), \
+ `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`, \
+ `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`. \
+ Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`, \
+ `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`, \
  `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
  populations (int): Number of populations running.
+ niterations (int): Number of iterations of the algorithm to run. The best \
+ equations are printed, and migrate between populations, at the \
  end of each.
+ ncyclesperiteration (int): Number of total mutations to run, per 10 \
  samples of the population, per iteration.
  alpha (float): Initial temperature.
  annealing (bool): Whether to use annealing. You should (and it is default).
+ fractionReplaced (float): How much of population to replace with migrating \
  equations from other populations.
+ fractionReplacedHof (float): How much of population to replace with migrating \
  equations from hall of fame.
  npop (int): Number of individuals in each population
  parsimony (float): Multiplicative factor for how much to punish complexity.
  migration (bool): Whether to migrate.
  hofMigration (bool): Whether to have the hall of fame migrate.
+ shouldOptimizeConstants (bool): Whether to numerically optimize \
  constants (Nelder-Mead/Newton) at the end of each iteration.
  topn (int): How many top individuals migrate from each population.
+ perturbationFactor (float): Constants are perturbed by a max \
+ factor of (perturbationFactor*T + 1). Either multiplied by this \
  or divided by this.
  weightAddNode (float): Relative likelihood for mutation to add a node
  weightInsertNode (float): Relative likelihood for mutation to insert a node
  weightDeleteNode (float): Relative likelihood for mutation to delete a node
  weightDoNothing (float): Relative likelihood for mutation to leave the individual
+ weightMutateConstant (float): Relative likelihood for mutation to change \
  the constant slightly in a random direction.
+ weightMutateOperator (float): Relative likelihood for mutation to swap \
  an operator.
+ weightRandomize (float): Relative likelihood for mutation to completely \
  delete and then randomly generate the equation
+ weightSimplify (float): Relative likelihood for mutation to simplify \
  constant parts by evaluation
  timeout (float): Time in seconds to timeout search
  equation_file (str): Where to save the files (.csv separated by |)
  verbosity (int): What verbosity level to use. 0 means minimal print statements.
  progress (bool): Whether to use a progress bar instead of printing to stdout.
  maxsize (int): Max size of an equation.
+ maxdepth (int): Max depth of an equation. You can use both maxsize and maxdepth. \
  maxdepth is by default set to = maxsize, which means that it is redundant.
+ fast_cycle (bool): (experimental) - batch over population subsamples. This \
+ is a slightly different algorithm than regularized evolution, but does cycles \
  15% faster. May be algorithmically less efficient.
+ variable_names (list): a list of names for the variables, other \
  than "x0", "x1", etc.
+ batching (bool): whether to compare population members on small batches \
+ during evolution. Still uses full dataset for comparing against \
  hall of fame.
  batchSize (int): the amount of data to use if doing batching.
+ select_k_features (None/int), whether to run feature selection in \
+ Python using random forests, before passing to the symbolic regression \
+ code. None means no feature selection; an int means select that many \
  features.
+ warmupMaxsizeBy (float): whether to slowly increase max size from \
+ a small number up to the maxsize (if greater than 0). \
+ If greater than 0, says the fraction of training time at which \
  the current maxsize will reach the user-passed maxsize.
+ constraints (dict): Dictionary of `int` (unary operators) \
+ or tuples of two `int`s (binary), \
+ this enforces maxsize constraints on the individual \
+ arguments of operators. e.g., `'pow': (-1, 1)` \
+ says that power laws can have any complexity left argument, but only \
  1 complexity exponent. Use this to force more interpretable solutions.
+ useFrequency (bool): whether to measure the frequency of complexities, \
+ and use that instead of parsimony to explore equation space. Will \
  naturally find equations of all complexities.
  julia_optimization (int): Optimization level (0, 1, 2, 3)
  tempdir (str/None): directory for the temporary files
  delete_tempfiles (bool): whether to delete the temporary files after finishing
+ julia_project (str/None): a Julia environment location containing \
+ a Project.toml (and potentially the source code for SymbolicRegression.jl). \
+ Default gives the Python package directory, where a Project.toml file \
  should be present from the install.
+ user_input (bool): Whether to ask for user input or not for installing (to \
  be used for automated scripts). Will choose to install when asked.
  update (bool): Whether to automatically update Julia packages.
+ temp_equation_file (bool): Whether to put the hall of fame file in \
+ the temp directory. Deletion is then controlled with the \
  delete_tempfiles argument.
+ output_jax_format (bool): Whether to create a 'jax_format' column in the output, \
  containing jax-callable functions and the default parameters in a jax array.
+ output_torch_format (bool): Whether to create a 'torch_format' column in the output, \
  containing a torch module with trainable parameters.

  # Returns

+ equations (pd.DataFrame/list): Results dataframe, \
+ giving complexity, MSE, and equations (as strings), as well as functional \
+ forms. If list, each element corresponds to a dataframe of equations \
  for each output.
  """
  if binary_operators is None:
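
The only change in this diff is the trailing backslash added to wrapped docstring lines. In a regular (non-raw) Python string literal, a backslash immediately followed by a newline is a line continuation, so each multi-line argument description becomes a single logical line in the stored docstring. Presumably this is what lets the Markdown documentation generator keep one argument per line. The sketch below illustrates the effect on two hypothetical functions (`with_continuation` and `without_continuation` are made-up names, not part of `pysr`):

```python
import inspect


def with_continuation():
    """Summary.

    # Arguments

    X (np.ndarray): 2D array. Rows are examples, \
        columns are features.
    y (np.ndarray): 1D array.
    """


def without_continuation():
    """Summary.

    # Arguments

    X (np.ndarray): 2D array. Rows are examples,
        columns are features.
    y (np.ndarray): 1D array.
    """


# inspect.getdoc returns the docstring with common indentation removed.
joined_lines = inspect.getdoc(with_continuation).splitlines()
split_lines = inspect.getdoc(without_continuation).splitlines()

# With the backslash, the description of X is one line (the indentation of
# the continued source line survives as a run of spaces inside it).
x_joined = [line for line in joined_lines if line.startswith("X ")][0]

# Collapsing the whitespace run gives the text a renderer would display.
x_clean = " ".join(x_joined.split())

if __name__ == "__main__":
    print(len(joined_lines), len(split_lines))
    print(x_clean)
```

Note that the continuation removes only the newline, not the leading spaces of the next source line, so the joined line contains a run of spaces; Markdown renderers normally collapse such runs when displaying the text.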