>especially as we won’t work on product marketing for AI stuff, from a moral standpoint, but the vast majority of enquiries have been for exactly that.
I intentionally ignored the biggest invention of the 21st century out of strange personal beliefs and now my business is going bankrupt
Yes, I find this a bit odd. AI is a tool; what specific part of it do you find so objectionable, OP? For me, I know they are never going to put the genie back in the bottle, and we will never get back the electricity spent on it, so I might as well use it. We finally got a pretty good Multivac we can talk to, and for me it usually gives the right answers back. It is a once in a lifetime type of invention we get to enjoy and use. I was king of the AI haters, but around Gemini 2.5 it just became so good that if you are hating it or criticizing it you aren’t looking at it objectively anymore.
He coined the concept 'singularity' in the sense of machines becoming smarter than humans. What a time for him to die, with all the advancements we're seeing in artificial intelligence. I wonder what he thought about it all.
>The concept and the term "singularity" were popularized by Vernor Vinge first in 1983 in an article that claimed that once humans create intelligences greater than their own, there will be a technological and social transition similar in some sense to "the knotted space-time at the center of a black hole",[8] and later in his 1993 essay The Coming Technological Singularity,[4][7] in which he wrote that it would signal the end of the human era, as the new superintelligence would continue to upgrade itself and would advance technologically at an incomprehensible rate. He wrote that he would be surprised if it occurred before 2005 or after 2030.
Just to clarify, the “singularity” conjectures a slightly different and more interesting phenomenon: one driven by technological advances, true, but not defined by those advances.
It was more the second derivative of future shock: technologies and culture that enabled and encouraged faster and faster change until the curve bent essentially vertical… asymptoting to a mathematical singularity.
An example he spoke of was that, close to the singularity, someone might found a corporation, develop a technology, make a profit from it, and then have it be obsolete by noon.
And because you can’t see the shape of the curve on the other side of such a singularity, people living on the other side of it would be incomprehensible to people on this side.
Ray Lafferty’s 1965 story “Slow Tuesday Night” explored this phenomenon years before Toffler wrote “Future Shock”.
where people can use a "Bobble" to freeze themselves in a stasis field and travel in time... forward. The singularity is some mysterious event that causes all of unbobbled humanity to disappear, leaving the survivors wondering, even tens of millions of years later, what happened. As such it is one of the best premises ever in sci-fi. (I am left wondering, though, if the best cultural comparison is "The Rapture" some Christians believe in, making this more of a religiously motivated concept as opposed to sound futurism.)
I've long been fascinated by this differential equation
dx
-- = x^2
dt
which has solutions that look like
x = 1/(t₀-t)
which notably blows up at time t₀. It's a model of an "intelligence explosion" where improving technology speeds up the rate of technological progress, but the very low growth when t ≪ t₀ could also be a model for why it is hard to bootstrap a two-sided market, why some settlements fail, etc. About 20 years ago I was very interested in ecological accounting, wondering if we could outrace resource depletion and related problems, and did a literature search for people developing models like this further. I was pretty disappointed not to find much, although it did appear as a footnote in the ecology literature here and there. Even papers like
seem to miss it. (Surprised the lesswrong folks haven't picked it up but they don't seem too mathematically inclined)
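For anyone who wants to poke at it numerically, here is a minimal sketch (my own illustration, not something from those papers): with x(0) = 1 the exact solution is x = 1/(1 - t), and even a naive Euler integration visibly explodes as t approaches 1.

    # dx/dt = x^2 with x(0) = 1; exact solution x = 1/(1 - t) blows up at t0 = 1.
    def euler_blowup(x0=1.0, dt=1e-4, t_max=1.5, cap=1e9):
        t, x = 0.0, x0
        while t < t_max and x < cap:   # stop once x has clearly exploded
            x += dt * x * x            # Euler step for dx/dt = x^2
            t += dt
        return t, x

    t, x = euler_blowup()
    print(f"numerical blow-up near t = {t:.3f} (exact singularity at t0 = 1)")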
---
Note I don't believe in the intelligence explosion because what we've seen in "Moore's law" recently is that each generation of chips is getting much more difficult and expensive to develop, whereas the benefits of shrinks are shrinking, and in fact we might be rudely surprised that the state of the art chips of the near future (and possibly 2024) burn up pretty quickly. It's not so clear that chipmakers would have continued to invest in a new generation if governments weren't piling huge money into a "great powers" competition... That is, already we might be past the point of economic returns.
IMHO Marooned in Realtime is the best Vinge book. Besides being a dual mystery novel, it really explores the implications of bobble technology and how just a few hours of technology development near the singularity can be extreme.
Yep. I like it better than Fire Upon the Deep but I do like both of them. I didn’t like A Deepness in the Sky as it was feeling kinda grindy, like Dune. (I wish we could just erase Dune so people could enjoy all of Frank Herbert’s other novels, of which I love even the bad ones)
The first time I read A Deepness In The Sky, I was a bit annoyed, because I was excited for the A plot to progress, and it felt like we were spending an awful lot of time on B & C.
On a second read, when I knew where the story was going and didn't need the frisson of resolution, I enjoyed it much more. It's good B & C plot, and it all does tie in. But arguably the pacing is off.
Can you recommend a non-Dune Herbert book? I recall seeing Dosadi when I was a kid in the sci fi section of the library and just never picked it up. I generally like hard sci-fi and my main issue with Dune was that it went off into the weeds too many times.
I like the Dosadi books, Whipping Star, the short stories in Eye, Eyes of Heisenberg, Destination: Void, The Santaroga Barrier (which my wife hates), Under Pressure and Hellstrom's Hive. If I had to pick just one it might be Whipping Star but maybe Under Pressure is the hardest sci-fi.
Whipping Star has some amazing alien vs human discourse (at least, that's my memory from ~20 years ago!). It was the first time I found alien dialog that didn't sound like repackaged English.
I loved 'The Jesus Incident' [0], which he co-authored with Bill Ransom - when I read it as a teenager in the 80s, it felt so 'adult' compared to a lot of the other science fiction I had read to that point.
I later read the prequel and did not like it. I never read the third book in the trilogy.
I _hated_ The Green Brain, but that was mostly because he had all the characters say everything in Portuguese, then repeat themselves in English. It was as if there was an echo in the room.
I'm also a bit sceptical of an intelligence explosion, but compute per dollar has increased in a steady exponential way since long before Moore's law and will probably continue after it. There are ways to progress other than shrinking transistors.
Even though we understand a lot more about how LLMs work and have cut resource consumption dramatically in the last year we still know hardly anything so it seems quite likely there is a better way to do it.
For one thing, dense vectors for language seem kinda insane to me. Change one pixel in a picture and it makes no difference to the meaning. Change one letter in a sentence and you can change the meaning completely, so a continuous representation seems fundamentally wrong.
The bobble is a speculative technology that originated in Vernor Vinge's
science fiction. It allows spherical volumes to be enclosed in complete
stasis for controllable periods of time. It was used in _The Peace War_ as
a weapon, and in _Marooned in Realtime_ as a way for humans to tunnel
through the Singularity unchanged.
As far as I know, the bobble is physically impossible. However it may be
possible to simulate its effects with other technologies. Here I am
especially interested in the possibility of tunneling through the
Singularity.
Why would anyone want to do that, you ask? Some people may have long term
goals that might be disrupted by the Singularity, for example maintaining
Danny Hillis's clock or keeping a record of humanity. Others may want to
do it if the Singularity is approaching in an unacceptable manner and they
are powerless to stop or alter it. For example an anarchist may want to
escape a Singularity that is dominated by a single consciousness. A
pacifist may want to escape a Singularity that is highly adversarial.
Perhaps just the possibility of tunneling through the Singularity can ease
people's fears about advanced technology in general.
Singularity tunneling seems to require a technology that can defend its
comparatively powerless users against extremely, perhaps even
unimaginably, powerful adversaries. The bobble of course is one such
technology, but it is not practical. The only realistic technology that I
am aware of that is even close to meeting this requirement is
cryptography. In particular, given some complexity theoretic assumptions
it is possible to achieve exponential security in certain restricted
security models. Unfortunately these security models are not suitable for
my purpose. While adversaries are allowed to have computational power that
is exponential in the amount of computational power of the users, they can
only interact with the users in very restricted ways, such as reading or
modifying the messages they send to each other. It is unclear how to use
cryptography to protect the users themselves instead of just their
messages. Perhaps some sort of encrypted computation can hide their
thought processes and internal states from passive monitors. But how does
one protect against active physical attacks?
The reason I bring up cryptography, however, is to show that it IS
possible to defend against adversaries with enormous resources at
comparatively little cost, at least in certain situations. The Singularity
tunneling problem should not be dismissed out of hand as being unsolvable,
but rather deserves to be studied seriously. There is a very realistic
chance that the Singularity may turn out to be undesirable to many of us.
Perhaps it will be unstable and destroy all closely-coupled intelligence.
Or maybe the only entity that emerges from it will have the "personality"
of the Blight. It is important to be able to try again if the first
Singularity turns out badly.
"I do have some early role models. I recall wanting to be a real-life version of the fictional "Sandor Arbitration Intelligence at the Zoo" (from Vernor Vinge's novel A Fire Upon the Deep) who in the story is known for consistently writing the clearest and most insightful posts on the Net. And then there was Hal Finney who probably came closest to an actual real-life version of Sandor at the Zoo, and Tim May who besides inspiring me with his vision of cryptoanarchy was also a role model for doing early retirement from the tech industry and working on his own interests/causes."
A few people have pointed out that Sandor at the Zoo was more likely a reference to someone else, of course: ""The Zoo" etc. was a reference to Henry Spencer, who was known on Usenet for his especially clear posts. He posted from utzoo (University of Toronto Zoology.)"
With respect, we don’t know if he was spot on. Companies shoehorning language models into their products is a far cry from the transformative societal change he describes. Nothing like a singularity has yet happened at the scale he describes, and it might not happen without more fundamental shifts/breakthroughs in AI research.
What we're seeing right now with LLMs is like music in the late 30s after the invention of the electric guitar. At that point people still had no idea how to use it, so they were treating it like an amplified acoustic guitar. It took almost 40 years for people to come up with the idea of harnessing feedback and distortion to use the guitar to create otherworldly soundscapes, and another 30 beyond that before people even approached the limit of the guitar's range with pedals and such.
LLMs are a game changer that are going to enable a new programming paradigm as models get faster and better at producing structured output. There are entire classes of app that couldn't exist before because there was a non-trivial "fuzzy" language problem in the loop. Furthermore I don't think people have a conception of how good these models are going to get within 5-10 years.
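To make the "fuzzy language problem in the loop" idea concrete, here is a rough sketch of the pattern I mean. call_llm is a hypothetical stand-in for whatever model API you use; the validate-and-retry loop around it is the part that only recently became practical.

    import json

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around whatever model you actually call."""
        raise NotImplementedError

    def extract_structured(text: str, retries: int = 3) -> dict:
        # Ask for machine-readable output and validate it, retrying on garbage.
        prompt = (
            "Return ONLY a JSON object with keys 'sentiment' (pos/neg/neutral) "
            "and 'topics' (a list of strings) for this text:\n" + text
        )
        for _ in range(retries):
            try:
                result = json.loads(call_llm(prompt))
                if isinstance(result, dict) and "sentiment" in result and "topics" in result:
                    return result
            except json.JSONDecodeError:
                pass  # model produced non-JSON; try again
        raise ValueError("model never produced valid structured output")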
> Furthermore I don't think people have a conception of how good these models are going to get within 5-10 years.
Pretty sure it's quite the opposite of what you're implying: People see those LLMs who closely resemble actual intelligence on the surface, but have some shortcomings. Now they extrapolate this and think it's just a small step to perfection and/or AGI, which is completely wrong.
One problem is that converging to an ideal is obviously non-linear, so getting the first 90% right is relatively easy, and closer to 100% it gets exponentially harder. Another problem is that LLMs are not really designed in a way to contain actual intelligence in the way humans would expect them to, so any apparent reasoning is very superficial as it's just language-based and statistical.
In a similar spirit, science fiction stories playing in the near future often tend to have spectacular technology, like flying personal cars, in-eye displays, beam travel, or mind reading devices. In the 1960s it was predicted for the 80s, in the 80s it was predicted for the 2000s etc.
tells (among other things) a harrowing tale of a common mistake in technology development that blindsides people every time: the project that reaches an asymptote instead of completion. It can get you to keep spending resources and spending resources because you think you have only 5% to go, except the approach you've chosen means you'll never get the last 4%. It's a seductive situation that tends to turn the team away from the Cassandras who have a clear view.
Happens a lot in machine learning projects where you don’t have the right features. (Right now I am chewing on the problem of “what kind of shoes is the person in this picture wearing?” and how many image classification models would not at all get that they are supposed to look at a small part of the image and how easy it would be to conclude that “this person is on a basketball court so they are wearing sneakers” or “this is a dude so they aren’t wearing heels” or “this lady has a fancy updo and fancy makeup so she must be wearing fancy shoes”. Trouble is all those biases make the model perform better up to a point but to get past that point you really need to segment out the person’s feet.)
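A hedged sketch of the "segment out the feet" idea: the pose model and shoe classifier below are hypothetical placeholders, but the two-stage structure is the point, since the tight crop removes the basketball court, the updo, and the other shortcut cues before classification ever happens.

    def crop_around(image, x, y, half=80):
        # Square crop around a point; image assumed to be HxWxC, numpy-style.
        h, w = image.shape[:2]
        x0, x1 = max(0, int(x) - half), min(w, int(x) + half)
        y0, y1 = max(0, int(y) - half), min(h, int(y) + half)
        return image[y0:y1, x0:x1]

    def classify_shoes(image, keypoint_model, shoe_classifier):
        # Stage 1: a (hypothetical) pose model returns pixel coords of joints.
        kp = keypoint_model(image)
        # Stage 2: classify tight crops around the ankles, so the classifier
        # never sees the court, the fancy updo, or the person's apparent gender.
        crops = [crop_around(image, *kp["left_ankle"]),
                 crop_around(image, *kp["right_ankle"])]
        return [shoe_classifier(c) for c in crops]   # e.g. sneaker / heel / boot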
You are looking at things like the failure of full self driving due to massive long tail complexity, and extrapolating that to LLMs. The difference is that full self driving isn't viable unless it's near perfect, whereas LLMs and text to image models are very useful even when imperfect. In any field there is a sigmoidal progress curve where things seem to move slowly at first when getting set up, accelerate quickly once a framework is in place, then start to run out of low hanging fruit and have to start working hard for incremental progress, until the field is basically mined out. Given the rate that we're seeing new stuff come out related to LLMs and image/video models, I think it's safe to say we're still in the low hanging fruit stage. We might not achieve better than human performance or AGI across a variety of fields right away, but we'll build a lot of very powerful tools that will accelerate our technological progress in the near term, and those goals are closer than many would like to admit.
AGI (human level intelligence) is not really an end goal but a point that will be surpassed. So looking at it as something asymptotically approaching an ideal 100% is fundamentally wrong. That 100% mark is going to be in the rear view mirror at some point. And it's a bit of an arbitrary mark as well.
Of course it doesn't help that people are a bit hand wavy about what that mark exactly is to begin with. We're very good at moving the goal posts. So that 100% mark has the problem that it's poorly defined and in any case just a brief moment in time given exponential improvements in capabilities. In the eyes of most we're not quite there yet for whatever there is. I would agree with that.
At some point we'll be debating whether we are actually there, and then things move on from there. A lot of that debate is going to be a bit emotional and irrational of course. People are very sensitive about these things and they get a bit defensive when you portray them as clearly inferior to something else. Arguably, most people I deal with don't actually know a lot, their reasoning is primitive/irrational, and if you'd benchmark them against an LLM it wouldn't be that great. Or that fair.
The singularity is kind of the point where most of the improvements to AI are going to come from ideas and suggestions generated by AI rather than by humans. Whether that's this decade or the next is a bit hard to predict obviously.
Human brains are quite complicated but there's only a finite number of neurons in there; a bit under 100 billion. We can waffle a bit about the complexity of their connections. But at some point it becomes a simple matter of throwing more hardware at the problem. With LLMs already pushing tens to hundreds of billions of parameters, you could legitimately ask what a few more doublings in numbers here enable.
I think you're falling for the exact same fallacy that I was describing. Also note that the human level of intelligence is not arbitrary at all: Most LLMs are trained on human-generated data, and since they are statistical models, they won't suddenly come up with truly novel reasoning. They're generally just faster at generating stuff than humans, because they're computers.
In 5 to 10 years we will likely have moved on to the next big model architecture, just as 5 to 10 years ago it was all about convolutional networks even though the pivotal transformer paper had already been published in 2017.
Singularity doesn't necessarily rely on LLMs by any means. It's just that communication is improving and the number of people doing research is increasing. Weak AI is icing on top, let alone LLMs, which are being shoe-horned into everything now. VV clearly adds these two other paths:
o Computer/human interfaces may become so intimate that users
may reasonably be considered superhumanly intelligent.
o Biological science may find ways to improve upon the natural
human intellect.
Yeah this is the angle I look at the most, the Humans+Internet combo.
I don't believe LLMs will really get us much of anywhere, Singularity-wise. They're just ridiculously inefficient in terms of compute (and thus power) needs to even do the basic pattern-prediction they do today. They're neat tools for human augmentation in some cases, but that's about all they contribute.
I think, even prior to the recent explosion of LLM stuff, that the aggregate of Humans and the depth of their interconnections on the Internet is already starting to form at least the beginnings of a sort of Singularity, without any AI-related topics needing to be introduced. The way memes (real memes, not silly jokes) spread around the Internet and shape thoughts across all the users, the way the users bounce ideas off each other and refine them, the way viral advocacy and information sharing works, etc. Basically the Singularity is just going to be the emergent group consciousness and capabilities of the collective Internet-connected set of Humans.
> Within thirty years, we will have the technological means to create superhuman intelligence.
Blackwell.
> o Develop human/computer symbiosis in art: Combine the graphic generation capability of modern machines and the esthetic sensibility of humans. Of course, there has been an enormous amount of research in designing computer aids for artists, as labor saving tools. I'm suggesting that we explicitly aim for a greater merging of competence, that we explicitly recognize the cooperative approach that is possible. Karl Sims [22] has done wonderful work in this direction.
Stable Diffusion.
> o Develop interfaces that allow computer and network access without requiring the human to be tied to one spot, sitting in front of a computer. (This is an aspect of IA that fits so well with known economic advantages that lots of effort is already being spent on it.)
iPhone and Android.
> o Develop more symmetrical decision support systems. A popular research/product area in recent years has been decision support systems. This is a form of IA, but may be too focussed on systems that are oracular. As much as the program giving the user information, there must be the idea of the user giving the program guidance.
Cicero.
> Another symptom of progress toward the Singularity: ideas themselves should spread ever faster, and even the most radical will quickly become commonplace.
Trump.
> o Use local area nets to make human teams that really work (ie, are more effective than their component members). This is generally the area of "groupware", already a very popular commercial pursuit. The change in viewpoint here would be to regard the group activity as a combination organism. In one sense, this suggestion might be regarded as the goal of inventing a "Rules of Order" for such combination operations. For instance, group focus might be more easily maintained than in classical meetings. Expertise of individual human members could be isolated from ego issues such that the contribution of different members is focussed on the team project. And of course shared data bases could be used much more conveniently than in conventional committee operations. (Note that this suggestion is aimed at team operations rather than political meetings. In a political setting, the automation described above would simply enforce the power of the persons making the rules!)
Ingress.
> o Exploit the worldwide Internet as a combination human/machine tool. Of all the items on the list, progress in this is proceeding the fastest and may run us into the Singularity before anything else. The power and influence of even the present-day Internet is vastly underestimated. For instance, I think our contemporary computer systems would break under the weight of their own complexity if it weren't for the edge that the USENET "group mind" gives the system administration and support people!) The very anarchy of the worldwide net development is evidence of its potential. As connectivity and bandwidth and archive size and computer speed all increase, we are seeing something like Lynn Margulis' [14] vision of the biosphere as data processor recapitulated, but at a million times greater speed and with millions of humanly intelligent agents (ourselves).
Twitter.
> o Limb prosthetics is a topic of direct commercial applicability. Nerve to silicon transducers can be made [13]. This is an exciting, near-term step toward direct communication.
Atom Limbs.
> o Similar direct links into brains may be feasible, if the bit rate is low: given human learning flexibility, the actual brain neuron targets might not have to be precisely selected. Even 100 bits per second would be of great use to stroke victims who would otherwise be confined to menu-driven interfaces.
>> > Within thirty years, we will have the technological means to create superhuman intelligence.
> Blackwell.
I'm fucking sorry but there is no LLM or "AI" platform that is even real intelligence, today, easily demonstrated by the fact that an LLM cannot be used to create a better LLM. Go on, ask ChatGPT to output a novel model that performs better than any other model. Oh, it doesn't work? That's because IT'S NOT INTELLIGENT. And it's DEFINITELY not "superhuman intelligence." Not even close.
Sometimes accurately regurgitating facts is NOT intelligence. God it's so depressing to see commenters on this hell-site listing current-day tech as ANYTHING approaching AGI.
You didn't read him correctly; he's not saying Blackwell is AGI. I believe that he's saying that perhaps Blackwell could be computationally sufficient for AGI if "used correctly."
I don't know where that "computationally sufficient" line is. It'll always be fuzzy (because you could have a very slow, but smart entity). And before we have a working AGI, thinking about how much computation we need always comes down to back of the envelope estimations with radically different assumptions of how much computational work brains do.
But I can't rule out the idea that current architectures have enough processing to do it.
I don't use the A word, because it's one of those words that popular culture has poisoned with fear, anger, and magical thinking. I can at least respect Kurzweil though and he says the human brain has 10 petaflops. Blackwell has 20 petaflops. That would seem to make it capable of superhuman intelligence to me. Especially if we consider that it can focus purely on thinking and doesn't have to regulate a body. Imagine having your own video card that does ChatGPT but 40x smarter.
I think there's a big focus on petaflops and that it may have been a good measure to think about initially, but now we're missing the mark.
If a human brain does its magic with 10 petaflops, and you have 1 petaflop, you should be able to make an equivalent to the human brain that runs at 1/10th of the speed but never sleeps. In other words, once you've reached the same order of magnitude it doesn't matter.
On the other hand, Kurzweil's math really comes down to an argument that the brain is using about 10 petaflops for inference, but it also is changing weights and doing a lot more math and optimization for training (which we don't completely understand). It may (or may not) take considerably more than 10 petaflops to train at the rate humans learn. And remember, humans take years to do anything useful.
Further, 10 petaflops may be enough math, but it doesn't mean you can store enough information or flow enough state between the different parts "of the model."
These are the big questions. If we knew the answers, IMO, we would already have really slow AGI.
Yes I agree there's a lot of interesting problems to solve and things to learn when it comes to modeling intelligence. Vernor Vinge was smart in choosing the wording that we'd have the means to create superhuman intelligence by now, since no one's ever going to agree if we've actually achieved it.
Probably just a question of time constant / zoom on your time axis. When zoomed in up close, an exponential looks a lot like a bunch of piecewise linear components, where big breakthroughs are just discontinuous changes in slope...
OK. I'm imagining a correlation engine that looks through code as a series of prompts that are used to generate more code from the corpus that is statistically likely to follow.
And now I'm transforming that through the concept of taking a photograph and applying the clone tool via a light airbrush.
Repeat enough times, and you get uncompilable mud.
Saying they definitely won't or they definitely will are equally over-broad and premature.
I currently expect we'll need another architectural breakthrough; but also, back in 2009 I expected no-steering-wheel-included self driving cars no later than 2018, and that the LLM output we actually saw in 2023 would be the final problem to be solved in the path to AGI.
GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops. NVIDIA just unveiled their latest Blackwell chip yesterday which goes 20,000 teraflops. If you buy an NVL72 rack of the things, it goes 1,400,000 teraflops. That's what Jensen Huang's GPT runs on I bet.
> GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops
AFAICT, both are guesses. The estimates I've seen for human brains range from ~162 GFLOPS[0] on the low end up to 10^28 FLOPS[1]; even just the model size for GPT-4 isn't confirmed, merely a combination of human inference of public information with a rumour widely described as a "leak", likewise the compute requirements.
They're not guesses. We know they use A100s and we know how fast an A100 goes. You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me. That would explain why GPT is so much smarter and faster than people. The other estimates like 1e28 are measuring different things.
> They're not guesses. We know they use A100s and we know how fast an A100 goes.
And we don't know how many GPT-4 instances run on any single A100, or if it's the other way around and how many A100s are needed to run a single GPT-4 instance. We also don't know how many tokens/second any given instance produces, so multiple users may be (my guess is they are) queued on any given instance. We have a rough idea how many machines they have, but not how intensively they're being used.
> You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me.
You're double-counting. "If a neuron only fires 5-50 times a second" = maximum synapse firing rate * fraction of cells active at any given moment, and the 200 is what you get from assuming it could go at 1000/second (they can) but only 20% are active at any given moment (a bit on the high side, but not by much).
Total = neurons * synapses/neuron * maximum synapse firing rate * fraction of cells active at any given moment * operations per synapse firing
1e11 * 1e3 * 1e3 Hz * 10% (of your brain in use at any given moment, where the similarly phrased misconception comes from) * 1 floating point operation = 1e16/second = 10 PFLOP
It currently looks like we need more than 1 floating point operation to simulate a synapse firing.
> The other estimates like 1e28 are measuring different things.
Things which may turn out to be important for e.g. Hebbian learning. We don't know what we don't know. Our brains are much more sample-efficient than our ANNs.
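To make the sensitivity of these back-of-envelope numbers obvious, here is the same formula with a few of the assumption sets quoted in this subthread plugged in (every input is a guess, which is rather the point):

    def brain_flops(neurons=1e11, synapses_per_neuron=1e3, rate_hz=1e3,
                    active_fraction=0.1, ops_per_firing=1):
        return neurons * synapses_per_neuron * rate_hz * active_fraction * ops_per_firing

    print(brain_flops())                       # 1e16 -> ~10 PFLOPS (the estimate above)
    print(brain_flops(synapses_per_neuron=1,   # count neurons rather than synapses,
                      rate_hz=50,              # firing ~50 times a second,
                      active_fraction=1))      # 5e12 -> ~5 TFLOPS (the low guess above)
    print(brain_flops(ops_per_firing=1e3))     # 1e19 -> if a synapse firing needs ~1e3 ops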
Synapses might be akin to transistor count, which is only roughly correlated with FLOPs on modern architectures.
I've also heard in a recent talk that the optic nerve carries about 20 Mbps of visual information. If we imagine a saturated task such as the famous gorilla walking through the people passing around a basketball, then we can arrive at some limits on the conscious brain. This does not count the autonomic, sympathetic, and parasympathetic processes, of course, but those could in theory be fairly low bandwidth.
There is also the matter of the "slow" computation in the brain that happens through neurotransmitter release. It is analog and complex, but with a slow clock speed.
My hunch is that the brain is fairly low FLOPs but highly specialized, closer to an FPGA than a million GPUs running an LLM.
> I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me.
That assumes that you can represent all of the useful parts of the decision about whether to fire or not to fire in the equivalent of one floating point operation, which seems to be an optimistic assumption. It also assumes there's no useful information encoded into e.g. phase of firing.
Imagine that there's a little computer inside each neuron that decides when it needs to do work. Those computers are an implementation detail of the flops being provided by neurons, and would not increase the overall flop count, since that'd be counting them twice. For example, how would you measure the speed of a game boy emulator? Would you take into consideration all the instructions the emulator itself needs to run in order to simulate the game boy instructions?
> Imagine that there's a little computer inside each neuron that decides when it needs to do work
Yah, there's -bajillions- of floating point operation equivalents happening in a neuron deciding what to do. They're probably not all functional.
BUT, that's why I said the "useful parts" of the decision:
It may take more than the equivalent of one floating point operation to decide whether to fire. For instance, if you are weighting multiple inputs to the neuron differently to decide whether to fire now, that would require multiple multiplications of those inputs. If you consider whether you have fired recently, that's more work too.
Neurons do all of these things, and more, and these things are known to be functional-- not mere implementation details. A computer cannot make an equivalent choice in one floating point operation.
Of course, this doesn't mean that the brain is optimal-- perhaps you can do far less work. But if we're going to use it as a model to estimate scale, we have to consider what actual equivalent work is.
Yes, but it probably doesn't tell the whole story.
There's basically a few axes you can view this on:
- Number of connections and complexity of connection structure: how much information is encoded about how to do the calculations.
- Mutability of those connections: these things are growing and changing -while doing the math on whether to fire-.
- How much calculation is really needed to do the computation encoded in the connection structure.
Basically, brains are doing a whole lot of math and working on a dense structure of information, but not very precisely because they're made out of meat. There's almost certainly different tradeoffs in how you'd build the system based on the precision, speed, energy, and storage that you have to work with.
That's based on old assumptions about neuron function.
Firstly, Kurzweil underestimates the number of connections by an order of magnitude.
Secondly, dendritic computation changes things. Individual dendrites and the dendritic tree as a whole can do multiple individual computations: logical operations, low-pass filtering, coincidence detection, ... One neuronal activation is potentially thousands of operations per neuron.
A single human neuron can be the equivalent of thousands of ANN units.
They might generate improvements, but I’m not sure why people think those improvements would be unbounded. Think of it like improvements to jet engines or internal combustion engines - rapid improvements followed by decades of very tiny improvements. We’ve gone from 32-bit LLM weights down to 16, then 8, then 4 bit weights, and then a lot of messy diminishing returns below that. Moore’s Law is running on fumes for process improvements, so each new generation of chips that’s twice as fast manages to get there by nearly doubling the silicon area and nearly doubling the power consumption. There’s a lot of active research into pruning models down now, but mostly better models == bigger models, which is also hitting all kinds of practical limits. Really good engineering might get to the same endpoint a little faster than mediocre engineering, but they’ll both probably wind up at the same point eventually. A super smart LLM isn’t going to make sub-atomic transistors, or sub-bit weights, or eliminate power and cooling constraints, or eliminate any of the dozen other things that eventually limit you.
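As an illustration of those diminishing returns (a sketch of my own, not anything from the thread): with plain uniform quantization the error roughly doubles for every bit you drop, while each drop only buys the same constant factor of two in memory.

    import numpy as np

    def fake_quantize(w, bits):
        # Symmetric uniform quantization: quantize, then dequantize.
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / levels
        return np.round(w / scale) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal(100_000).astype(np.float32)
    for bits in (16, 8, 4, 3, 2):
        err = np.abs(w - fake_quantize(w, bits)).mean()
        print(f"{bits:2d}-bit weights: mean abs error {err:.5f}")

Real LLM quantization schemes are cleverer than this (per-group scales, non-uniform codebooks), but the basic error-versus-bits tradeoff has the same shape.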
Saying that AI hardware is near a dead end because Moore's law is running out of steam is silly. Even GPUs are very general purpose, we can make a lot of progress in the hardware space via extreme specialization, approximate computing and analog computing.
I'm mostly saying that unless a chip-designing AI model is an actual magical wizard, it's not going to have a lot of advantage over teams of even mediocre human engineers. All of the stuff you're talking about is Moore's Law limited after 1-2 generations of wacky architectural improvements.
Bro, Jensen Huang just unveiled a chip yesterday that goes 20 petaflops. Intel's latest raptorlake cpu goes 800 gigaflops. Can you really explain 25000x progress by the 2x larger die size? I'm sure reactionary America wanted Moore's law to run out of steam but the Taiwanese betrayal made up for all the lost Moore's law progress and then some.
That speedup compared to Nvidia's previous generation came nearly entirely from: 1) a small process technology improvement from TSMC, 2) more silicon area, 3) more power consumption, and 4) moving to FP4 from FP8 (halving the precision). They aren't delivering the 'free lunch' between generations that we had for decades in terms of "the same operations faster and using less power." They're delivering increasingly exotic chips for increasingly crazy amounts of money.
Pro tip: If you want to know who is the king of AI chips, compare FLOPS (or TOPS) per chip area, not FLOPS/chip.
As long as the bottleneck is fab capacity in wafers per hour, the number of operations per second per chip area determines who will produce more compute at the best price. It's a good measure even between different technology nodes and superchips.
Nvidia is leader for a reason.
If manufacturing capacity increases to match the demand in the future, FLOPS or TOPS per Watt may become relevant, but now it's fab capacity.
LLMs are so much more than you are assuming… text, images, code are merely abstractions to represent reality. Accurate prediction requires no less than usefully generalizable models and deep understanding of the actual processes in the world that produced those representations.
I know they can provide creative new solutions to totally novel problems from firsthand experience… instead of assuming what they should be able to do, I experimented to see what they can actually do.
Focusing on the simple mechanics of training and prediction is to miss the forest for the trees. It’s as absurd as saying how can living things have any intelligence? They’re just bags of chemicals oxidizing carbon. True but irrelevant- it misses the deeper fact that solving almost any problem deeply requires understanding and modeling all of the connected problems, and so on, until you’ve pretty much encompassed everything.
Ultimately it doesn’t even matter what problem you’re training for- all predictive systems will converge on general intelligence as you keep improving predictive accuracy.
An LLM is not going to suggest a reasonable improvement to itself, except by sheerest luck.
But then next generation, where the LLM is just the language comprehension and generation model that feeds into something else yet to be invented, I have no guarantees about whether that will be able to improve itself. Depends on what it is.
Yes, eventually one gets a series of software improvements which result in the best possible performance on currently available hardware --- if one can consistently get an LLM to suggest improvements to itself.
Until we get to a point where an AI has the wherewithal to create a fab to make its own chips and then do assembly w/o human intervention (something along the lines of Steve Jobs' vision of a computer factory where sand goes in at one end and finished product rolls out the other) it doesn't seem likely to amount to much.
That may happen more easily than you're suggesting. LLMs are masters at generating plausible sounding ideas with no regard to their factual underpinnings. So some of those computational bong hits might come up with dozens of plausible looking suggestions (maybe featuring made up literature references as well).
It would be left to human researchers to investigate them and find out if any work. If they succeed, the LLM will get all the credit for the idea, if they fail, it's them who will have wasted their time.
The first known person to present the idea was the mathematician and philosopher Nicolas de Condorcet in the late 1700s. Not surprising, because he also laid out most of the ideals and values of modern liberal democracy as they are now. Amazing philosopher.
He basically invented the idea of ensemble learning (known as boosting in machine learning).
That essay is written by a political scientist. His arguments aren't very persuasive. Even if they were, he doesn't actually cite the person he's writing about, so I have no way to check the primary materials. It's not like this is uncommon either. Everyone who's smart since 1760 has extrapolated the industrial revolution and imagined something similar to the singularity. Malthus would be a bad example and Nietzsche would be a good example. But John von Neumann was a million times smarter than all of them, he named it the singularity, and that's why he gets the credit.
Check out "Sketch for a Historical Picture of the Progress of the Human Mind", by Marquis de Condorcet, 1794. The last chapter, The Tenth epoch/The future progress of the human mind. There he lays out unlimited advance of knowledge, unlimited lifespan for humans, improvement of physical faculties, and then finally improvement of the intellectual and moral faculties.
And this was not some obscure author, but leading figure in the French Enlightenment. Thomas Malthus wrote his essay on population as counterargument.
Butler also expanded this idea in his 1872 novel Erewhon, where he described a seemingly primitive island civilization that turned out to have once had greater technology than the West, including mechanical AI, but they abandoned it when they began to fear its consequences. A lot of 20th century SF tropes were already there in the Victorian period.