Sure. I mean, part of it will be finding and exploiting symmetries, improvements to or alternatives to backprop, and parallelism as well.


> finding and exploiting symmetries

I'm having trouble finding the papers I've skimmed on this (my original search phrase was something about succinct neural network encodings, entropy, yada yada yada), but it's already being done here and there, at least at the high level of looking at graph symmetries. The results were okay-ish: the theoretical space bounds aren't substantial improvements, and the compressed representations didn't perform especially well. I haven't yet seen anything interesting that explicitly deals with the fact that neural networks represent the real world in some fashion, which imposes a lot of bias on the values the weights can take.
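
For the flavor of symmetry involved, here's a toy sketch of the classic permutation symmetry of hidden units -- my own illustration, not from any of those papers: permuting an MLP's hidden units, together with the matching rows/columns of its weights, gives a different parameter vector that computes exactly the same function.

    # Permutation symmetry of a 2-layer MLP: a permuted copy of the
    # parameters computes the same function (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # hidden -> output

    def mlp(x, W1, b1, W2, b2):
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    P = rng.permutation(16)       # random reordering of the 16 hidden units
    W1p, b1p = W1[P], b1[P]       # permute the hidden-unit rows
    W2p = W2[:, P]                # permute the matching output columns

    x = rng.normal(size=8)
    assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

Any compression scheme that ignores that equivalence class is leaving n! redundant parameterizations per hidden layer on the table, which is the sort of redundancy the graph-symmetry work is after.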

> improvements to or alternatives to backprop

I love seeing these come out. There's the no free lunch theorem and all that jazz, but in practice for real networks this can be a huge win.

> parallelism as well

Not really, at least if I'm understanding the chain of ancestor comments correctly. The arguments are more about the total cost of a given network than the total time to train it. Those are loosely entangled since with low levels of parallelism we're inclined to operate at higher clock speeds or take other energy-inefficient actions, but generally we would expect a parallel algorithm to be no more energy efficient (with respect to total conceptual work performed -- e.g., training a fixed neural network) than an equivalent serial implementation.


> generally we would expect a parallel algorithm to be no more energy efficient (with respect to total conceptual work performed -- e.g., training a fixed neural network) than an equivalent serial implementation

I think it depends on how you conceptualize the topology in both cases. A serial implementation requires threading all the data through a single point, whereas a parallel implementation can leave data where it is going to be used. Moving data around requires energy, so implementations that maximize locality of data should be more energy efficient. Such implementations would naturally synchronize as little as possible, so they would be highly parallel.

Basically, serial implementations of neural networks require a clock and a form of RAM, including the energy overhead of dispatch, synchronization and data transport, whereas parallel implementations don't: each neuron could just contain whatever little data it needs and nothing more.
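
To put rough numbers on that, here's a back-of-envelope sketch. The per-operation energies are ballpark 45nm figures of the kind usually attributed to Horowitz's ISSCC 2014 talk; treat them as illustrative assumptions, not measurements of any particular chip.

    # Rough energy model for one dense-layer forward pass: each weight costs
    # one MAC plus one fetch, and where the fetch comes from dominates.
    PJ_FP32_MAC  = 4.6      # ~3.7 pJ fp32 multiply + ~0.9 pJ fp32 add
    PJ_SRAM_READ = 5.0      # 32-bit read from a small on-chip SRAM
    PJ_DRAM_READ = 640.0    # 32-bit read from off-chip DRAM

    def layer_energy_uj(n_in, n_out, fetch_pj):
        n_weights = n_in * n_out
        return n_weights * (PJ_FP32_MAC + fetch_pj) / 1e6

    streamed = layer_energy_uj(1024, 1024, PJ_DRAM_READ)   # weights hauled in from DRAM
    local    = layer_energy_uj(1024, 1024, PJ_SRAM_READ)   # weights parked next to the compute

    print(f"DRAM-streamed: {streamed:.0f} uJ, local: {local:.0f} uJ, "
          f"ratio: {streamed / local:.0f}x")

Even with generous error bars on those numbers, hauling every weight across the chip (or off it) swamps the arithmetic, which is the whole case for leaving data next to the neurons that use it.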


There are alternatives to backprop like direct feedback alignment (1). They specifically focus on making the computation more parallel, and thus more scalable, for the reasons you mention.

(1) http://papers.nips.cc/paper/6441-direct-feedback-alignment-p...
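
A toy sketch of the idea (my own numpy illustration, not the code from the paper): the output error is projected to each hidden layer through a fixed random matrix, rather than being propagated back through the transposed forward weights, so the per-layer updates no longer depend on each other sequentially.

    # Direct feedback alignment on a one-hidden-layer network (illustrative only).
    import numpy as np

    rng = np.random.default_rng(1)
    n_in, n_hid, n_out = 8, 32, 4
    W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
    W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
    B1 = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback matrix, never trained

    x = rng.normal(size=n_in)
    y = rng.normal(size=n_out)
    lr = 0.05

    for _ in range(200):
        h1 = np.tanh(W1 @ x)                 # forward pass
        e = W2 @ h1 - y                      # output error (squared-error loss)

        # DFA step: push the output error to the hidden layer through the fixed
        # random B1 instead of through W2.T, so this update needs nothing from
        # the layer above beyond the (globally broadcast) output error.
        d1 = (B1 @ e) * (1.0 - h1 ** 2)      # tanh'(a) = 1 - tanh(a)^2

        W2 -= lr * np.outer(e, h1)
        W1 -= lr * np.outer(d1, x)

    print("final loss:", 0.5 * float(np.sum((W2 @ np.tanh(W1 @ x) - y) ** 2)))

With more hidden layers, each one gets its own fixed random projection of the same output error, which is what makes the weight updates independent enough to compute in parallel.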


An interesting backprop alternative here:

https://arxiv.org/abs/1901.09049

Essentially inspired by spiking neurons, it can be implemented in neuromorphic hardware.



