
I encourage reading this, not as self-promotion, but as a first-person history of what it feels like to be too early with a technology.

Someone out there is probably experimenting with something world-changing, and has all the ingredients except for a few more iterations of Moore's Law. It would feel a lot like working on deep learning in 1990. If you think you might be on this path, it's worth studying the history.



Definitely don't read it as a history. It's just a lie. Schmidhuber is laying claim to a lot of things he didn't do, and he takes anything that even vaguely resembles a modern technique in wording and claims that he invented the technique, even though in practice his papers have nothing to do with what those words mean today and had no influence on the field.

These posts are basically the only outliers claiming that automatic differentiation was invented by Linnainmaa alone. Many people invented AD around the same time, and Linnainmaa was not the first. Naming a single person is a huge disservice to the community and shows that this is just propaganda, as is much of Schmidhuber's output.

1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

2. Compressing / Distilling one NN into Another. Lots of people did this before 1991.

3. The Fundamental Deep Learning Problem: Vanishing / Exploding Gradients. They did publish an analysis of this, that's true. (A toy sketch of the effect follows after this list.)

4. Long Short-Term Memory (LSTM) Recurrent Networks. No, this was 1997.

5. Artificial Curiosity Through Adversarial Generative NNs. Absolutely not: Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Trans. on Systems, Man, and Cybernetics, (5):834-846, 1983.

6. Artificial Curiosity Through NNs That Maximize Learning Progress (1991). There is little to say about this: it isn't something that worked back in 1991, and it isn't something that works today.

7. Adversarial Networks for Unsupervised Data Modeling (1991). This isn't the same idea as GANs, and the idea as presented in the paper doesn't work.

8. End-To-End-Differentiable Fast Weights: NNs Learn to Program NNs (1991). The idea already existed, and as presented in the original paper it doesn't work.

9. Learning Sequential Attention with NNs (1990). He uses the word "attention", but it's not the same mechanism as the one we use today, which dates to 2010. This did not invent attention in any way.

10. Hierarchical Reinforcement Learning (1990). Their 1990 paper does not do hierarchical RL; their 1991 paper does something like it, and it is at best contemporary with "Learning to Select a Model in a Changing World" by Mieczyslaw M. Kokar and Spiridon A. Reveliotis.
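
To make point 3 concrete, here is a minimal toy sketch of the vanishing/exploding-gradient effect, assuming NumPy; the depth, width, and per-layer scales are arbitrary illustrative values, not taken from any of the papers under discussion:

    import numpy as np

    # Backpropagating through a stack of random linear layers: the gradient is a
    # product of many Jacobians, so its norm shrinks or grows roughly geometrically.
    rng = np.random.default_rng(0)
    depth, dim = 50, 32

    for scale in (0.5, 1.0, 1.5):          # rough "gain" of each layer's Jacobian
        grad = np.ones(dim)                # gradient arriving at the top layer
        for _ in range(depth):
            W = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
            grad = W.T @ grad              # one step of backprop through a linear layer
        print(f"scale={scale}: |grad| after {depth} layers = {np.linalg.norm(grad):.2e}")

With the small gain the gradient norm collapses toward zero after 50 layers, and with the large gain it blows up; that is the problem the vanishing/exploding-gradient analysis in point 3 is about.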

Please stop posting the ravings of a person who is trying to steal other people's work.


> 1. First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991. This dates back at least to Fukushima for the theory and LeCun in 1989 practically.

Quoting the blog post:

"Of course, Deep Learning in feedforward NNs started much earlier, with Ivakhnenko & Lapa, who published the first general, working learning algorithms for deep multilayer perceptrons with arbitrarily many layers back in 1965 [DEEP1]. For example, Ivakhnenko's paper from 1971 [DEEP2] already described a Deep Learning net with 8 layers, trained by a highly cited method still popular in the new millennium [DL2]."

Let's try to be fair and objective here. You may have an axe to grind with Schmidhuber, but that does not give you the right to take things out of context.


"First Very Deep NNs -> This is false. Schmidhuber did not create the first deep network in 1991."

Well, now it seems that you are the one lying here, since you ignore the well-known fact that Schmidhuber attributes the first deep NNs to the '60s and '70s. The same goes for many of your other points.


Do we have a few more iterations of Moore's law?


Even if we don't, progress is not going to stop, for example on:

- lowering the price of each chip - you can get that through more automation.

- lowering the cost of the energy used by a chip - you can get that through the rise of renewable energy generation and its decentralisation (and, again, more automation).

The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.


>The point is that automation caused by AI will start a reinforcing feedback loop where more and more work can be done more cheaply, speeding up automation itself too.

There isn't much evidence that AI has accelerated the rate of automation, and people have been saying this about information technology for the last four decades already. By any account, the automation and growth contributions of these technologies are low by historical standards.

The primary mechanism that has kept Moore's law alive up until now is the miniaturization of transistors, and we're going to run into a wall on that front pretty soon.


I won't counter your main point, because it is indeed a matter of debate from a 'technical' standpoint.

However, in broader economic terms, I think the way AI may 'accelerate' the world in general is largely indirect: for instance, by saving time and money in other areas of life (better tools, cheaper means, infrastructure, etc.), people become better able to perform their jobs. There are obviously diminishing returns to such optimization, as with any natural/economic process.


Automation also often means that useful jobs get turned into bullshit jobs that stick around, e.g. for political reasons, sometimes even leading to decreased efficiency.


Yeah. Who knows how fast the growth is gonna be or what it's gonna look like, but people are already working on, e.g., communication-avoiding algorithms for matrix and tensor operations that work best in the new regime. I'm not an expert in this area, but if you'll allow me to paraphrase someone who is: one of the reasons algorithms people have employment is that all of these things get redone over and over to exploit advances in hardware.
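
For a rough sense of what "communication-avoiding" means here, this is a toy NumPy sketch of the blocking/tiling idea for matrix multiplication: keep a tile of C resident while streaming blocks of A and B, so each block is moved far fewer times than in a naive element-by-element loop. The block and matrix sizes are arbitrary, and this only illustrates the access pattern, not a tuned or distributed implementation:

    import numpy as np

    def blocked_matmul(A, B, b=64):
        # Assumes square n x n matrices with n divisible by the block size b.
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(0, n, b):
            for j in range(0, n, b):
                # The C tile stays resident while blocks of A and B are streamed in;
                # each block of A or B is touched n/b times instead of n times.
                for k in range(0, n, b):
                    C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
        return C

    A = np.random.rand(256, 256)
    B = np.random.rand(256, 256)
    assert np.allclose(blocked_matmul(A, B), A @ B)

The same result is computed either way; the point is that the blocked order reduces how often data has to cross the memory hierarchy, which is the kind of trade-off that gets revisited whenever the hardware changes.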


Not for clock speed, but yes for parallelism. It might look like the Cerebras [0] wafer-scale monster becoming a commodity you could fire up 1000 of in the cloud.

[0] https://www.cerebras.net/


We've got a few more iterations of Moore's law for sure. After that, progress will likely happen in jumps and address non-transistor bottlenecks like memory access, e.g. wafer-scale integration, 3D systems, photonics, etc.


For what it's worth, I interpret the GP's statement as referring to general foundational progress in whatever field, not Moore's law specifically.



