Sure. I mean, part of it will be finding and exploiting symmetries, improvements to or alternatives to backprop, and parallelism as well.


> finding and exploiting symmetries

I'm having trouble finding the papers I've skimmed on this (my original search phrase was something about succinct neural network encodings, entropy, yada yada yada), but it's already being done here and there, at least at the high level of looking at graph symmetries. The results were okay-ish: the theoretical space bounds aren't substantial improvements, and the compressed representations didn't perform especially well. I haven't yet seen anything interesting that explicitly deals with the fact that neural networks represent the real world in some fashion, which imposes a lot of bias on the values the weights can take.
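
For the flavor of symmetry involved, here's a toy sketch of the classic permutation symmetry of hidden units -- my own illustration, not from any of those papers: permuting an MLP's hidden units, together with the matching rows/columns of its weights, gives a different parameter vector that computes exactly the same function.

    # Permutation symmetry of a 2-layer MLP: a permuted copy of the
    # parameters computes the same function (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input -> hidden
    W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # hidden -> output

    def mlp(x, W1, b1, W2, b2):
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    P = rng.permutation(16)       # random reordering of the 16 hidden units
    W1p, b1p = W1[P], b1[P]       # permute the hidden-unit rows
    W2p = W2[:, P]                # permute the matching output columns

    x = rng.normal(size=8)
    assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

Any compression scheme that ignores that equivalence class is leaving n! redundant parameterizations per hidden layer on the table, which is the sort of redundancy the graph-symmetry work is after.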

> improvements to or alternatives to backprop

I love seeing these come out. There's the no free lunch theorem and all that jazz, but in practice for real networks this can be a huge win.

> parallelism as well

Not really, at least if I'm understanding the chain of ancestor comments correctly. The arguments are more about the total cost of a given network than the total time to train it. Those are loosely entangled since with low levels of parallelism we're inclined to operate at higher clock speeds or take other energy-inefficient actions, but generally we would expect a parallel algorithm to be no more energy efficient (with respect to total conceptual work performed -- e.g., training a fixed neural network) than an equivalent serial implementation.


> generally we would expect a parallel algorithm to be no more energy efficient (with respect to total conceptual work performed -- e.g., training a fixed neural network) than an equivalent serial implementation

I think it depends on how you conceptualize the topology in both cases. A serial implementation requires threading all the data through a single point, whereas a parallel implementation can leave data where it is going to be used. Moving data around requires energy, so implementations that maximize locality of data should be more energy efficient. Such implementations would naturally synchronize as little as possible, so they would be highly parallel.

Basically, serial implementations of neural networks require a clock and a form of RAM, including the energy overhead of dispatch, synchronization and data transport, whereas parallel implementations don't: each neuron could just contain whatever little data it needs and nothing more.
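
To put rough numbers on that, here's a back-of-envelope sketch. The per-operation energies are ballpark 45nm figures of the kind usually attributed to Horowitz's ISSCC 2014 talk; treat them as illustrative assumptions, not measurements of any particular chip.

    # Rough energy model for one dense-layer forward pass: each weight costs
    # one MAC plus one fetch, and where the fetch comes from dominates.
    PJ_FP32_MAC  = 4.6      # ~3.7 pJ fp32 multiply + ~0.9 pJ fp32 add
    PJ_SRAM_READ = 5.0      # 32-bit read from a small on-chip SRAM
    PJ_DRAM_READ = 640.0    # 32-bit read from off-chip DRAM

    def layer_energy_uj(n_in, n_out, fetch_pj):
        n_weights = n_in * n_out
        return n_weights * (PJ_FP32_MAC + fetch_pj) / 1e6

    streamed = layer_energy_uj(1024, 1024, PJ_DRAM_READ)   # weights hauled in from DRAM
    local    = layer_energy_uj(1024, 1024, PJ_SRAM_READ)   # weights parked next to the compute

    print(f"DRAM-streamed: {streamed:.0f} uJ, local: {local:.0f} uJ, "
          f"ratio: {streamed / local:.0f}x")

Even with generous error bars on those numbers, hauling every weight across the chip (or off it) swamps the arithmetic, which is the whole case for leaving data next to the neurons that use it.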


There are alternatives to backprop like direct feedback alignment (1). They specifically focus on making the computation more parallel, and thus more scalable, for the reasons you mention.

(1) http://papers.nips.cc/paper/6441-direct-feedback-alignment-p...
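
A toy sketch of the idea (my own numpy illustration, not the code from the paper): the output error is projected to each hidden layer through a fixed random matrix, rather than being propagated back through the transposed forward weights, so the per-layer updates no longer depend on each other sequentially.

    # Direct feedback alignment on a one-hidden-layer network (illustrative only).
    import numpy as np

    rng = np.random.default_rng(1)
    n_in, n_hid, n_out = 8, 32, 4
    W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
    W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
    B1 = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback matrix, never trained

    x = rng.normal(size=n_in)
    y = rng.normal(size=n_out)
    lr = 0.05

    for _ in range(200):
        h1 = np.tanh(W1 @ x)                 # forward pass
        e = W2 @ h1 - y                      # output error (squared-error loss)

        # DFA step: push the output error to the hidden layer through the fixed
        # random B1 instead of through W2.T, so this update needs nothing from
        # the layer above beyond the (globally broadcast) output error.
        d1 = (B1 @ e) * (1.0 - h1 ** 2)      # tanh'(a) = 1 - tanh(a)^2

        W2 -= lr * np.outer(e, h1)
        W1 -= lr * np.outer(d1, x)

    print("final loss:", 0.5 * float(np.sum((W2 @ np.tanh(W1 @ x) - y) ** 2)))

With more hidden layers, each one gets its own fixed random projection of the same output error, which is what makes the weight updates independent enough to compute in parallel.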


An interesting backprop alternative here:

https://arxiv.org/abs/1901.09049

Essentially inspired by spiking neurons, it can be implemented in neuromorphic hardware.



