MilesCranmer committed
Commit 1344be6 · 1 Parent(s): 40270bf

Organize todo list

Files changed (1):
  1. README.md +38 -29
README.md CHANGED
@@ -301,40 +301,49 @@ pd.DataFrame, Results dataframe, giving complexity, MSE, and equations
  - [x] Test suite
  - [x] Performance: - Use an enum for functions instead of storing them?
  - Gets ~40% speedup on small test.
- - [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
- - Ongoing in cluster branch
- - [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
- - [ ] Dump scores alongside MSE to .csv (and return with Pandas).
- - [ ] Consider returning only the equation of interest; rather than all equations.
- - [ ] Use @fastmath
+ - [x] Use @fastmath
+ - [ ] Sort these todo lists by priority
+
+ ## Feature ideas
+
+ - [ ] Sympy printing
+ - [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
+ - [ ] Call function to read from csv after running
+ - [ ] Add function to plot equations
  - [ ] Refresh screen rather than dumping to stdout?
  - [ ] Add ability to save state from python
- - [ ] Calculate feature importances based on features we've already seen, then weight those features up in all random generations.
- - [ ] Calculate feature importances of future mutations, by looking at correlation between residual of model, and the features.
- - Store feature importances of future, and periodically update it.
+ - [ ] Additional degree operators?
+ - [ ] Multi targets (vector ops)
+ - [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
+ - [ ] Consider printing output sorted by score, not by complexity.
+ - [ ] Dump scores alongside MSE to .csv (and return with Pandas).
+ - [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
+ - [ ] Consider allowing multi-threading turned off, for faster testing (cache issue on travis). Or could simply fix the caching issue there.
+ - [ ] Consider returning only the equation of interest; rather than all equations.
+
+ ## Algorithmic performance ideas:
+
+ - [ ] Idea: use gradient of equation with respect to each operator (perhaps simply add to each operator) to tell which part is the most "sensitive" to changes. Then, perhaps insert/delete/mutate on that part of the tree?
+ - [ ] Consider adding mutation for constant<->variable
  - [ ] Implement more parts of the original Eureqa algorithms: https://www.creativemachineslab.com/eureqa.html
  - [ ] Experiment with freezing parts of model; then we only append/delete at end of tree.
- - [ ] Sympy printing
- - [ ] Consider adding mutation for constant<->variable
- - [ ] Hierarchical model, so can re-use functional forms. Output of one equation goes into second equation?
  - [ ] Use NN to generate weights over all probability distribution conditional on error and existing equation, and train on some randomly-generated equations
- - [ ] Add GPU capability?
- - Not sure if possible, as binary trees are the real bottleneck.
- - Could generate on CPU, evaluate score on GPU?
- - [ ] Idea: use gradient of equation with respect to each operator (perhaps simply add to each operator) to tell which part is the most "sensitive" to changes. Then, perhaps insert/delete/mutate on that part of the tree?
  - [ ] For hierarchical idea: after running some number of iterations, do a search for "most common pattern". Then, turn that subtree into its own operator.
- - [ ] Additional degree operators?
- - [ ] Tree crossover? I.e., can take as input a part of the same equation, so long as it is the same level or below?
- - [ ] Create flexible way of providing "simplification recipes." I.e., plus(plus(T, C), C) => plus(T, +(C, C)). The user could pass these.
- - [ ] Can we cache calculations, or does the compiler do that? E.g., I should only have to run exp(x0) once; after that it should be read from memory.
- - Maybe I could store the result of calculations in a tree (or an index to a massive array that does this). And only when something in the subtree updates, does the rest of the tree update!
- - [ ] Try Memoize.jl instead of manually caching.
- - [ ] Try threading over population. Do random sort, compute mutation for each, then replace 10% oldest.
- - [ ] Call function to read from csv after running
- - [ ] Add function to plot equations
- - [ ] Sort this todo list by priority
- - [ ] Consider printing output sorted by score, not by complexity.
- - [ ] Performance: try inlining things?
- - [ ] Multi targets (vector ops)
+ - [ ] Calculate feature importances based on features we've already seen, then weight those features up in all random generations.
+ - [ ] Calculate feature importances of future mutations, by looking at correlation between residual of model, and the features.
+ - Store feature importances of future, and periodically update it.
 
 
+ ## Code performance ideas:
+
+ - [ ] Add true multi-node processing, with MPI, or just file sharing. Multiple populations per core.
+ - Ongoing in cluster branch
+ - [ ] Try @spawn over each sub-population. Do random sort, compute mutation for each, then replace 10% oldest.
+ - [ ] Performance: try inlining things?
+ - [ ] Try defining a binary tree as an array, rather than a linked list. See https://stackoverflow.com/a/6384714/2689923
+
+ - [ ] Can we cache calculations, or does the compiler do that? E.g., I should only have to run exp(x0) once; after that it should be read from memory.
+ - Done on caching branch. Currently am finding that this is quite slow (presumably because memory allocation is the main issue).
+ - [ ] Add GPU capability?
+ - Not sure if possible, as binary trees are the real bottleneck.
+ - Could generate on CPU, evaluate score on GPU?
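The "binary tree as an array" item above refers to the classic flat layout in which the children of node `i` live at indices `2*i + 1` and `2*i + 2`, avoiding pointer-chasing through linked nodes. A minimal Python sketch of the idea, assuming a toy expression encoding (the names `evaluate`, `OPS`, and the `(op, value)` tuples are illustrative, not the project's actual data structures):

```python
# Array-encoded expression tree: children of node i sit at 2*i+1 and 2*i+2.
# A node is a tuple (op, value): op is "x" (the input variable),
# "const" (a literal), or a key into OPS (a binary operator).

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, i, x):
    """Recursively evaluate the array-encoded tree rooted at index i."""
    op, value = tree[i]
    if op == "x":
        return x
    if op == "const":
        return value
    left = evaluate(tree, 2 * i + 1, x)
    right = evaluate(tree, 2 * i + 2, x)
    return OPS[op](left, right)

# Encodes (x + 2.0) * x: root "*" at 0, children "+" at 1 and "x" at 2,
# and the "+" node's children "x" and 2.0 at 3 and 4.
tree = [
    ("*", None),     # 0: root
    ("+", None),     # 1: left child of root
    ("x", None),     # 2: right child of root
    ("x", None),     # 3: left child of "+"
    ("const", 2.0),  # 4: right child of "+"
]

print(evaluate(tree, 0, x=3.0))  # → 15.0, i.e. (3 + 2) * 3
```

Whether this layout actually helps would depend on how mutation reshapes trees (a mutated subtree of a different depth no longer fits its slots), which may be why the item is phrased as an experiment.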