Commit 1344be6
Parent(s): 40270bf

Organize todo list

README.md CHANGED
@@ -301,40 +301,49 @@ pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
 - [x] Test suite
 - [x] Performance: - Use an enum for functions instead of storing them?
     - Gets ~40% speedup on small test.
-- [
-
-
-
-
-- [ ]
+- [x] Use @fastmath
+- [ ] Sort these todo lists by priority
+
+## Feature ideas
+
+- [ ] Sympy printing
+- [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
+- [ ] Call function to read from csv after running
+- [ ] Add function to plot equations
 - [ ] Refresh screen rather than dumping to stdout?
 - [ ] Add ability to save state from python
-- [ ]
-- [ ]
-
+- [ ] Additional degree operators?
+- [ ] Multi targets (vector ops)
+- [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
+- [ ] Consider printing output sorted by score, not by complexity.
+- [ ] Dump scores alongside MSE to .csv (and return with Pandas).
+- [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
+- [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
+- [ ] Consider returning only the equation of interest, rather than all equations.
+
+## Algorithmic performance ideas:
+
+- [ ] Idea: use gradient of equation with respect to each operator (perhaps simply add to each operator) to tell which part is the most "sensitive" to changes. Then, perhaps insert/delete/mutate on that part of the tree?
+- [ ] Consider adding mutation for constant<->variable
 - [ ] Implement more parts of the original Eureqa algorithms: https://www.creativemachineslab.com/eureqa.html
 - [ ] Experiment with freezing parts of model; then we only append/delete at end of tree.
-- [ ] Sympy printing
-- [ ] Consider adding mutation for constant<->variable
-- [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
 - [ ] Use NN to generate weights over all probability distribution conditional on error and existing equation, and train on some randomly-generated equations
-- [ ] Add GPU capability?
-    - Not sure if possible, as binary trees are the real bottleneck.
-    - Could generate on CPU, evaluate score on GPU?
-- [ ] Idea: use gradient of equation with respect to each operator (perhaps simply add to each operator) to tell which part is the most "sensitive" to changes. Then, perhaps insert/delete/mutate on that part of the tree?
 - [ ] For hierarchical idea: after running some number of iterations, do a search for "most common pattern". Then, turn that subtree into its own operator.
-- [ ]
-- [ ]
-
-- [ ] Can we cache calculations, or does the compiler do that? E.g., I should only have to run exp(x0) once; after that it should be read from memory.
-    - Maybe I could store the result of calculations in a tree (or an index to a massive array that does this). And only when something in the subtree updates, does the rest of the tree update!
-- [ ] Try Memoize.jl instead of manually caching.
-- [ ] Try threading over population. Do random sort, compute mutation for each, then replace 10% oldest.
-- [ ] Call function to read from csv after running
-- [ ] Add function to plot equations
-- [ ] Sort this todo list by priority
-- [ ] Consider printing output sorted by score, not by complexity.
-- [ ] Performance: try inlining things?
-- [ ] Multi targets (vector ops)
+- [ ] Calculate feature importances based on features we've already seen, then weight those features up in all random generations.
+- [ ] Calculate feature importances of future mutations, by looking at correlation between residual of model, and the features.
+    - Store feature importances of future, and periodically update it.
 
 
+## Code performance ideas:
+
+- [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
+    - Ongoing in cluster branch
+- [ ] Try @spawn over each sub-population. Do random sort, compute mutation for each, then replace 10% oldest.
+- [ ] Performance: try inlining things?
+- [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
+
+- [ ] Can we cache calculations, or does the compiler do that? E.g., I should only have to run exp(x0) once; after that it should be read from memory.
+    - Done on caching branch. Currently am finding that this is quite slow (presumably because memory allocation is the main issue).
+- [ ] Add GPU capability?
+    - Not sure if possible, as binary trees are the real bottleneck.
+    - Could generate on CPU, evaluate score on GPU?
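The item about defining the binary tree as an array rather than a linked list (see the linked StackOverflow answer) refers to the heap-style layout, where a node's children live at fixed index offsets and no per-node pointer allocations are needed. A minimal sketch of the indexing scheme, in Python purely for illustration (the package's tree type itself is not shown here):

```python
# Heap-style array layout for a binary tree (0-based indices):
# the children of the node at index i sit at 2*i + 1 and 2*i + 2.

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

# The complete expression tree
#        (+)
#       /   \
#     (*)   (x0)
#     / \
#   3.0  x1
# stored level by level in one flat array:
tree = ["+", "*", "x0", 3.0, "x1"]

assert tree[left(0)] == "*"    # left child of root
assert tree[right(0)] == "x0"  # right child of root
assert tree[left(1)] == 3.0    # left child of the "*" node
assert parent(4) == 1          # "x1" hangs off the "*" node
```

The trade-off is that sparse (non-complete) trees waste slots, but traversal touches contiguous memory and avoids allocation, which is exactly the bottleneck the todo item worries about.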
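The "simplification recipes" item gives one rule, plus(plus(T, C), C) => plus(T, +(C, C)). Interpreting +(C, C) as folding the two constants into a single one, such a recipe could look like the following sketch; the tuple representation and function names here are hypothetical, not the package's actual tree type:

```python
# Hypothetical rewrite rule: plus(plus(T, C1), C2) -> plus(T, C1 + C2),
# where C1, C2 are constants and T is any subtree.
# Nodes are ("plus", left, right) tuples; bare floats are constants.

def is_const(node):
    return isinstance(node, float)

def fold_plus(node):
    if not (isinstance(node, tuple) and node[0] == "plus"):
        return node  # variable or constant leaf: nothing to rewrite
    op, l, r = node
    l = fold_plus(l)  # simplify bottom-up so nested sums collapse
    if is_const(r) and isinstance(l, tuple) and l[0] == "plus" and is_const(l[2]):
        # plus(plus(T, C1), C2) => plus(T, C1 + C2)
        return ("plus", l[1], l[2] + r)
    return (op, l, r)

expr = ("plus", ("plus", "x0", 1.5), 2.5)
assert fold_plus(expr) == ("plus", "x0", 4.0)
```

A user-supplied recipe would then just be a pattern plus a rewrite, applied the same way during simplification.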
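For the subtree-caching item (run exp(x0) once, then read the result from memory), one way to sketch the idea is to key a cache on each subtree's printed form, so a repeated subexpression is evaluated only once. This is only an illustration of the concept, not the caching branch's implementation, and it handles a single unary operator:

```python
import math

# Illustrative subtree-evaluation cache, keyed by the subtree's repr.
# Keys ignore the variable bindings, so use one cache per dataset.
calls = {"exp": 0}  # count how often exp is actually computed

def evaluate(node, env, cache):
    key = repr(node)
    if key in cache:
        return cache[key]          # repeated subtree: read from memory
    if isinstance(node, str):      # variable leaf, e.g. "x0"
        val = env[node]
    elif isinstance(node, float):  # constant leaf
        val = node
    else:                          # unary node ("exp", child); only exp here
        op, child = node
        arg = evaluate(child, env, cache)
        calls[op] += 1
        val = math.exp(arg)
    cache[key] = val
    return val

cache = {}
env = {"x0": 0.0}
a = evaluate(("exp", "x0"), env, cache)
b = evaluate(("exp", "x0"), env, cache)  # second call served from the cache
assert a == b == 1.0
assert calls["exp"] == 1
```

The todo item's concern about allocation shows up here too: the win only materializes if the cache lookup is cheaper than recomputing the subtree, which is why the note on the caching branch reports it being slow in practice.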