MilesCranmer committed
Commit 012bfcc
1 Parent(s): 7fb9d91
Greatly improve readme
README.md
CHANGED
@@ -11,74 +11,17 @@ For python, you need to have Python 3, numpy, and pandas installed.
|
|
11 |
|
12 |
## Running:
|
13 |
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
Here is the full list of arguments:
|
21 |
-
```
|
22 |
-
usage: eureqa.py [-h] [--threads THREADS] [--parsimony PARSIMONY]
|
23 |
-
[--alpha ALPHA] [--maxsize MAXSIZE]
|
24 |
-
[--niterations NITERATIONS] [--npop NPOP]
|
25 |
-
[--ncyclesperiteration NCYCLESPERITERATION] [--topn TOPN]
|
26 |
-
[--fractionReplacedHof FRACTIONREPLACEDHOF]
|
27 |
-
[--fractionReplaced FRACTIONREPLACED] [--migration MIGRATION]
|
28 |
-
[--hofMigration HOFMIGRATION]
|
29 |
-
[--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS]
|
30 |
-
[--annealing ANNEALING] [--equation_file EQUATION_FILE]
|
31 |
-
[--test TEST]
|
32 |
-
[--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]]
|
33 |
-
[--unary-operators UNARY_OPERATORS]
|
34 |
-
|
35 |
-
optional arguments:
|
36 |
-
-h, --help show this help message and exit
|
37 |
-
--threads THREADS Number of threads (default: 4)
|
38 |
-
--parsimony PARSIMONY
|
39 |
-
How much to punish complexity (default: 0.001)
|
40 |
-
--alpha ALPHA Scaling of temperature (default: 10)
|
41 |
-
--maxsize MAXSIZE Max size of equation (default: 20)
|
42 |
-
--niterations NITERATIONS
|
43 |
-
Number of total migration periods (default: 20)
|
44 |
-
--npop NPOP Number of members per population (default: 100)
|
45 |
-
--ncyclesperiteration NCYCLESPERITERATION
|
46 |
-
Number of evolutionary cycles per migration (default:
|
47 |
-
5000)
|
48 |
-
--topn TOPN How many best species to distribute from each
|
49 |
-
population (default: 10)
|
50 |
-
--fractionReplacedHof FRACTIONREPLACEDHOF
|
51 |
-
Fraction of population to replace with hall of fame
|
52 |
-
(default: 0.1)
|
53 |
-
--fractionReplaced FRACTIONREPLACED
|
54 |
-
Fraction of population to replace with best from other
|
55 |
-
populations (default: 0.1)
|
56 |
-
--migration MIGRATION
|
57 |
-
Whether to migrate (default: True)
|
58 |
-
--hofMigration HOFMIGRATION
|
59 |
-
Whether to have hall of fame migration (default: True)
|
60 |
-
--shouldOptimizeConstants SHOULDOPTIMIZECONSTANTS
|
61 |
-
Whether to use classical optimization on constants
|
62 |
-
before every migration (doesn't impact performance
|
63 |
-
that much) (default: True)
|
64 |
-
--annealing ANNEALING
|
65 |
-
Whether to use simulated annealing (default: True)
|
66 |
-
--equation_file EQUATION_FILE
|
67 |
-
File to dump best equations to (default:
|
68 |
-
hall_of_fame.csv)
|
69 |
-
--test TEST Which test to run (default: simple1)
|
70 |
-
--binary-operators BINARY_OPERATORS [BINARY_OPERATORS ...]
|
71 |
-
Binary operators. Make sure they are defined in
|
72 |
-
operators.jl (default: ['plus', 'mult'])
|
73 |
-
--unary-operators UNARY_OPERATORS
|
74 |
-
Unary operators. Make sure they are defined in
|
75 |
-
operators.jl (default: ['exp', 'sin', 'cos'])
|
76 |
-
```
|
77 |
-
|
78 |
-
|
79 |
-
|
80 |
|
81 |
-
|
82 |
|
83 |
You can add more operators in `operators.jl`, or use default
|
84 |
Julia ones. Make sure all operators are defined for scalar `Float32`.
|
@@ -86,9 +29,61 @@ Then just specify the operator names in your call, as above.
|
|
86 |
You can also change the dataset learned on by passing in `X` and `y` as
|
87 |
numpy arrays to `eureqa(...)`.
|
88 |
|
89 |
-
|
90 |
-
|
91 |
-
|
92 |
|
93 |
# TODO
|
94 |
|
|
|
11 |
|
12 |
## Running:
|
13 |
|
14 |
+
What follows is the API reference for running the numpy interface.
|
15 |
+
Note that nearly all parameters here
|
16 |
+
have been tuned with ~1000 trials over several example
|
17 |
+
equations. However, you should adjust `threads`, `niterations`,
|
18 |
+
`binary_operators`, `unary_operators` to your requirements.
|
19 |
|
20 |
+
The program will output a pandas DataFrame containing the equations,
|
21 |
+
mean square error, and complexity. It will also dump to a csv
|
22 |
+
at the end of every iteration,
|
23 |
+
which is `hall_of_fame.csv` by default. It also prints the
|
24 |
+
equations to stdout.
|
25 |
|
26 |
You can add more operators in `operators.jl`, or use default
|
27 |
Julia ones. Make sure all operators are defined for scalar `Float32`.
|
|
|
29 |
You can also change the dataset learned on by passing in `X` and `y` as
|
30 |
numpy arrays to `eureqa(...)`.
|
31 |
|
32 |
+
```python
|
33 |
+
eureqa(X=None, y=None, threads=4, niterations=20, ncyclesperiteration=int(default_ncyclesperiteration), binary_operators=["plus", "mult"], unary_operators=["cos", "exp", "sin"], alpha=default_alpha, annealing=True, fractionReplaced=default_fractionReplaced, fractionReplacedHof=default_fractionReplacedHof, npop=int(default_npop), parsimony=default_parsimony, migration=True, hofMigration=True, shouldOptimizeConstants=True, topn=int(default_topn), weightAddNode=default_weightAddNode, weightDeleteNode=default_weightDeleteNode, weightDoNothing=default_weightDoNothing, weightMutateConstant=default_weightMutateConstant, weightMutateOperator=default_weightMutateOperator, weightRandomize=default_weightRandomize, weightSimplify=default_weightSimplify, timeout=None, equation_file='hall_of_fame.csv', test='simple1', maxsize=20)
|
34 |
+
```
|
35 |
+
|
36 |
+
Run symbolic regression to fit f(X[i, :]) ~ y[i] for all i.
|
37 |
+
|
38 |
+
**Arguments**:
|
39 |
+
|
40 |
+
- `X`: np.ndarray, 2D array. Rows are examples, columns are features.
|
41 |
+
- `y`: np.ndarray, 1D array. Rows are examples.
|
42 |
+
- `threads`: int, Number of threads (=number of populations running).
|
43 |
+
You can have more threads than cores - it actually makes it more
|
44 |
+
efficient.
|
45 |
+
- `niterations`: int, Number of iterations of the algorithm to run. The best
|
46 |
+
equations are printed, and migrate between populations, at the
|
47 |
+
end of each.
|
48 |
+
- `ncyclesperiteration`: int, Number of total mutations to run, per 10
|
49 |
+
samples of the population, per iteration.
|
50 |
+
- `binary_operators`: list, List of strings giving the binary operators
|
51 |
+
in Julia's Base, or in `operators.jl`.
|
52 |
+
- `unary_operators`: list, Same but for operators taking a single `Float32`.
|
53 |
+
- `alpha`: float, Initial temperature.
|
54 |
+
- `annealing`: bool, Whether to use annealing. You should (and it is default).
|
55 |
+
- `fractionReplaced`: float, How much of population to replace with migrating
|
56 |
+
equations from other populations.
|
57 |
+
- `fractionReplacedHof`: float, How much of population to replace with migrating
|
58 |
+
equations from hall of fame.
|
59 |
+
- `npop`: int, Number of individuals in each population
|
60 |
+
- `parsimony`: float, Multiplicative factor for how much to punish complexity.
|
61 |
+
- `migration`: bool, Whether to migrate.
|
62 |
+
- `hofMigration`: bool, Whether to have the hall of fame migrate.
|
63 |
+
- `shouldOptimizeConstants`: bool, Whether to numerically optimize
|
64 |
+
constants (Nelder-Mead/Newton) at the end of each iteration.
|
65 |
+
- `topn`: int, How many top individuals migrate from each population.
|
66 |
+
- `weightAddNode`: float, Relative likelihood for mutation to add a node
|
67 |
+
- `weightDeleteNode`: float, Relative likelihood for mutation to delete a node
|
68 |
+
- `weightDoNothing`: float, Relative likelihood for mutation to leave the individual unchanged.
|
69 |
+
- `weightMutateConstant`: float, Relative likelihood for mutation to change
|
70 |
+
the constant slightly in a random direction.
|
71 |
+
- `weightMutateOperator`: float, Relative likelihood for mutation to swap
|
72 |
+
an operator.
|
73 |
+
- `weightRandomize`: float, Relative likelihood for mutation to completely
|
74 |
+
delete and then randomly generate the equation
|
75 |
+
- `weightSimplify`: float, Relative likelihood for mutation to simplify
|
76 |
+
constant parts by evaluation
|
77 |
+
- `timeout`: float, Time in seconds after which the search times out.
|
78 |
+
- `equation_file`: str, Where to save the equations (a .csv file that uses `|` as the separator).
|
79 |
+
- `test`: str, What test to run, if X,y not passed.
|
80 |
+
- `maxsize`: int, Max size of an equation.
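The `weight*` arguments above are described as relative likelihoods. As a rough illustration of what that means (not the package's actual implementation, which lives in the Julia backend), the sketch below normalizes a set of hypothetical weights into probabilities and samples a mutation type with the standard library's `random.choices`. The numeric weights, the dictionary keys, and the helper names are all made up for this example.

```python
import random

# Hypothetical weights; the real defaults (default_weightAddNode, etc.)
# are not shown in this diff.
MUTATION_WEIGHTS = {
    "mutate_constant": 10.0,
    "mutate_operator": 1.0,
    "add_node": 1.0,
    "delete_node": 3.0,
    "simplify": 0.1,
    "randomize": 2.0,
    "do_nothing": 1.0,
}


def mutation_probabilities(weights):
    """Normalize relative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_mutation(weights, rng=random):
    """Pick one mutation name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    for name, p in mutation_probabilities(MUTATION_WEIGHTS).items():
        print(f"{name}: {p:.3f}")
    print("sampled:", sample_mutation(MUTATION_WEIGHTS, random.Random(0)))
```

Because only the ratios matter, doubling every weight leaves the sampling distribution unchanged; setting a weight to 0 disables that mutation in this sketch.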
|
81 |
+
|
82 |
+
**Returns**:
|
83 |
+
|
84 |
+
pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
|
85 |
+
(as strings).
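Since `equation_file` is documented above as a `.csv` separated by `|`, the saved hall of fame can also be inspected without pandas. A minimal sketch using only the standard library follows. The header names (`Complexity`, `MSE`, `Equation`) and the sample rows are assumptions for illustration, so check the first line of your own file before relying on them.

```python
import csv
import io

# Stand-in for the contents of hall_of_fame.csv. The column names and
# rows are assumed for illustration; only the "|" separator is documented.
SAMPLE = """Complexity|MSE|Equation
1|2.5|x0
3|0.5|plus(x0, x1)
5|0.25|plus(x0, mult(x1, 1.5))
"""


def read_equations(handle):
    """Parse a pipe-separated equation file into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="|"):
        rows.append({
            "complexity": int(row["Complexity"]),
            "mse": float(row["MSE"]),
            "equation": row["Equation"],
        })
    return rows


def best_equation(rows):
    """Return the row with the lowest mean square error."""
    return min(rows, key=lambda r: r["mse"])


if __name__ == "__main__":
    equations = read_equations(io.StringIO(SAMPLE))
    print(best_equation(equations)["equation"])
```

To read a real file instead of the in-memory sample, pass `open("hall_of_fame.csv", newline="")` to `read_equations`.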
|
86 |
+
|
87 |
|
88 |
# TODO
|
89 |
|
eureqa.py
CHANGED
@@ -56,78 +56,49 @@ def eureqa(X=None, y=None, threads=4,
|
|
56 |
equations, but you should adjust `threads`, `niterations`,
|
57 |
`binary_operators`, `unary_operators` to your requirements.
|
58 |
|
59 |
-
:param X: 2D array. Rows are examples, columns are features.
|
60 |
-
:
|
61 |
-
:param
|
62 |
-
:type y: np.ndarray, optional
|
63 |
-
:param threads: Number of threads (=number of populations running).
|
64 |
You can have more threads than cores - it actually makes it more
|
65 |
efficient.
|
66 |
-
:
|
67 |
-
:param niterations: Number of iterations of the algorithm to run. The best
|
68 |
equations are printed, and migrate between populations, at the
|
69 |
end of each.
|
70 |
-
:
|
71 |
-
:param ncyclesperiteration: Number of total mutations to run, per 10
|
72 |
samples of the population, per iteration.
|
73 |
-
:
|
74 |
-
:param binary_operators: List of strings giving the binary operators
|
75 |
in Julia's Base, or in `operators.jl`.
|
76 |
-
:
|
77 |
-
:param
|
78 |
-
:
|
79 |
-
:param
|
80 |
-
:type alpha: float, optional
|
81 |
-
:param annealing: Whether to use annealing. You should (and it is default).
|
82 |
-
:type annealing: bool, optional
|
83 |
-
:param fractionReplaced: How much of population to replace with migrating
|
84 |
equations from other populations.
|
85 |
-
:
|
86 |
-
:param fractionReplacedHof: How much of population to replace with migrating
|
87 |
equations from hall of fame.
|
88 |
-
:
|
89 |
-
:param
|
90 |
-
:
|
91 |
-
:param
|
92 |
-
:
|
93 |
-
:param migration: Whether to migrate.
|
94 |
-
:type migration: bool, optional
|
95 |
-
:param hofMigration: Whether to have the hall of fame migrate.
|
96 |
-
:type hofMigration: bool, optional
|
97 |
-
:param shouldOptimizeConstants: Whether to numerically optimize
|
98 |
constants (Nelder-Mead/Newton) at the end of each iteration.
|
99 |
-
:
|
100 |
-
:param
|
101 |
-
:
|
102 |
-
:param
|
103 |
-
:
|
104 |
-
:param weightDeleteNode: Relative likelihood for mutation to delete a node
|
105 |
-
:type weightDeleteNode: float, optional
|
106 |
-
:param weightDoNothing: Relative likelihood for mutation to leave the individual
|
107 |
-
:type weightDoNothing: float, optional
|
108 |
-
:param weightMutateConstant: Relative likelihood for mutation to change
|
109 |
the constant slightly in a random direction.
|
110 |
-
:
|
111 |
-
:param weightMutateOperator: Relative likelihood for mutation to swap
|
112 |
an operator.
|
113 |
-
:
|
114 |
-
:param weightRandomize: Relative likelihood for mutation to completely
|
115 |
delete and then randomly generate the equation
|
116 |
-
:
|
117 |
-
:param weightSimplify: Relative likelihood for mutation to simplify
|
118 |
constant parts by evaluation
|
119 |
-
:
|
120 |
-
:param
|
121 |
-
:
|
122 |
-
:param
|
123 |
-
:
|
124 |
-
:param test: What test to run, if X,y not passed.
|
125 |
-
:type test: str, optional
|
126 |
-
:param maxsize: Max size of an equation.
|
127 |
-
:type maxsize: int, optional
|
128 |
-
:returns: Results dataframe, giving complexity, MSE, and equations
|
129 |
(as strings).
|
130 |
-
:rtype: pd.DataFrame
|
131 |
|
132 |
"""
|
133 |
|
|
|
56 |
equations, but you should adjust `threads`, `niterations`,
|
57 |
`binary_operators`, `unary_operators` to your requirements.
|
58 |
|
59 |
+
:param X: np.ndarray, 2D array. Rows are examples, columns are features.
|
60 |
+
:param y: np.ndarray, 1D array. Rows are examples.
|
61 |
+
:param threads: int, Number of threads (=number of populations running).
|
|
|
|
|
62 |
You can have more threads than cores - it actually makes it more
|
63 |
efficient.
|
64 |
+
:param niterations: int, Number of iterations of the algorithm to run. The best
|
|
|
65 |
equations are printed, and migrate between populations, at the
|
66 |
end of each.
|
67 |
+
:param ncyclesperiteration: int, Number of total mutations to run, per 10
|
|
|
68 |
samples of the population, per iteration.
|
69 |
+
:param binary_operators: list, List of strings giving the binary operators
|
|
|
70 |
in Julia's Base, or in `operators.jl`.
|
71 |
+
:param unary_operators: list, Same but for operators taking a single `Float32`.
|
72 |
+
:param alpha: float, Initial temperature.
|
73 |
+
:param annealing: bool, Whether to use annealing. You should (and it is default).
|
74 |
+
:param fractionReplaced: float, How much of population to replace with migrating
|
75 |
equations from other populations.
|
76 |
+
:param fractionReplacedHof: float, How much of population to replace with migrating
|
|
|
77 |
equations from hall of fame.
|
78 |
+
:param npop: int, Number of individuals in each population
|
79 |
+
:param parsimony: float, Multiplicative factor for how much to punish complexity.
|
80 |
+
:param migration: bool, Whether to migrate.
|
81 |
+
:param hofMigration: bool, Whether to have the hall of fame migrate.
|
82 |
+
:param shouldOptimizeConstants: bool, Whether to numerically optimize
|
83 |
constants (Nelder-Mead/Newton) at the end of each iteration.
|
84 |
+
:param topn: int, How many top individuals migrate from each population.
|
85 |
+
:param weightAddNode: float, Relative likelihood for mutation to add a node
|
86 |
+
:param weightDeleteNode: float, Relative likelihood for mutation to delete a node
|
87 |
+
:param weightDoNothing: float, Relative likelihood for mutation to leave the individual unchanged.
|
88 |
+
:param weightMutateConstant: float, Relative likelihood for mutation to change
|
89 |
the constant slightly in a random direction.
|
90 |
+
:param weightMutateOperator: float, Relative likelihood for mutation to swap
|
|
|
91 |
an operator.
|
92 |
+
:param weightRandomize: float, Relative likelihood for mutation to completely
|
|
|
93 |
delete and then randomly generate the equation
|
94 |
+
:param weightSimplify: float, Relative likelihood for mutation to simplify
|
|
|
95 |
constant parts by evaluation
|
96 |
+
:param timeout: float, Time in seconds to timeout search
|
97 |
+
:param equation_file: str, Where to save the files (.csv separated by |)
|
98 |
+
:param test: str, What test to run, if X,y not passed.
|
99 |
+
:param maxsize: int, Max size of an equation.
|
100 |
+
:returns: pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
|
101 |
(as strings).
|
|
|
102 |
|
103 |
"""
|
104 |
|