Run out of training data? They’re going to put these things in humanoids (which are weirdly cheap now), record high-resolution video and other sensor data of real-world tasks, and train huge multimodal Vision-Language-Action models, etc.
The world is more than just text. We can never run out of pixels if we point cameras at the real world and move them around.
I work in robotics and I don’t think people talking about this stuff appreciate that text and internet pictures are just the beginning. Robotics is poised to generate and consume TONS of data from the real world, not just the internet.
While we may run out of human-written text of value, we won't run out of symbolic sequences of tokens: we can trivially start with axioms and do random forward chaining (or random backward chaining from postulates), and then train models on 2-step, 4-step, 8-step, ... correct forward or backward chains.
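As a toy illustration of what such a generator could look like (a minimal sketch in Python; the arithmetic "rules" below are made-up stand-ins for real axioms and inference rules):

    import random

    # Toy stand-ins for axioms and inference rules: each rule derives a
    # new fact from the current one. A real system would use logical
    # axioms; the arithmetic here is purely illustrative.
    RULES = {
        "double":    lambda x: x * 2,
        "increment": lambda x: x + 1,
        "square":    lambda x: x * x,
    }

    def random_forward_chain(axiom, steps, rng):
        """Apply randomly chosen rules `steps` times, recording each
        step, yielding a correct derivation to train a model on."""
        fact, lines = axiom, [f"axiom: {axiom}"]
        for _ in range(steps):
            name = rng.choice(list(RULES))
            fact = RULES[name](fact)
            lines.append(f"{name} -> {fact}")
        return "\n".join(lines)

    rng = random.Random(0)
    # The curriculum described above: 2-step, then 4-step, then 8-step chains.
    for steps in (2, 4, 8):
        print(random_forward_chain(rng.randint(1, 9), steps, rng))

Every chain generated this way is correct by construction, which is the whole appeal: the labels are free.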
Nobody talks about it, but ultimately the strongest driver for terascale compute will be mathematical breakthroughs in cryptography (not brute-forcing keys, but brute-forcing mathematical reasoning).
Yeah, another source of "unlimited data" is genetics. The human reference genome is about 6.5 GB, but these days they're moving to pangenomes, aiming to map not just the genome of one reference individual but all the genetic variation in a clade. Depending on how ambitious that "all" is, pangenomes can be humongous. And unlike, say, video data, this is arguably a language. We're completely swimming in unmapped, uninterpreted language data.
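To make the "this is arguably a language" point concrete: turning raw genome text into model tokens can be as simple as overlapping k-mers, one common convention. A minimal sketch (the sequence and k here are made up):

    def kmer_tokenize(seq, k=6):
        """Split a DNA sequence into overlapping k-mers, one common way
        to turn raw genome text into tokens for a language model."""
        seq = seq.upper()
        return [seq[i:i + k] for i in range(len(seq) - k + 1)]

    print(kmer_tokenize("ACGTACGTGA", k=4))
    # ['ACGT', 'CGTA', 'GTAC', 'TACG', 'ACGT', 'CGTG', 'GTGA']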
There are plenty on eBay? At the end of your comment, though, you say “a rate cheaper than new,” so maybe you mean you’d like to buy a discounted one. But they do seem to be available used.
Right, but it is widely acknowledged that, despite acceptance (we lack other options), this process eventually degrades the quality of the tool as successive waves of product managers decide “just a little bit more advertising.”
I wonder how much of their trouble comes from other failures in their plan (avoiding pre-made maps and single-city taxi service in favor of a system intended to drive in unseen cities) vs. how much comes from vision. There are concerning failure modes from vision alone, but it’s not clear that’s actually the reason for the failure. Waymo built an expensive, safe system that is a taxi first and can operate only in certain areas, and then ran reps in those areas for a decade.
Tesla specifically decided not to take the taxi-first approach, which does make sense, since they want to sell cars. One of the first major failures of their approach was to start selling pre-orders for self-driving. If they hadn’t, they would not have needed to promise it would work everywhere, and could have pivoted to a single-city taxi service like the other companies, or added lidar.
But certainly it all came from Musk’s hubris: first setting out to solve self-driving in all conditions using only vision, and then starting to sell it before it was done, making it difficult to change paths once so much had been promised.
My favourite part of ChatGPT voice is that I have something in my settings that says something along the lines of "be succinct, get straight to the point," or whatever.
So every single time I (forget and) voice-prompt ChatGPT, it starts by saying "OK, I'll get straight to the point and answer your question without fluff" or something similar. I.e., it wastes my time even more than it normally would.
I agree on the voice mode... it's really unusable now.
I feel like it's been trained only on TikTok content and YouTube cooking or makeup podcasts, in the sense that it tries to be super casual and easy-going to the point where it's completely unable to give you actual information.
built something to fix exactly this. skips the realtime chattiness entirely - you speak, it waits until you're done, responds via TTS with actual text-quality answers (no dumbing down). also has claude/gemini if you want different models.
still early but happy to share: tla[at]lexander[dot]com if interested (saw your email in bio)
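for a sense of the shape, the core loop is roughly this (a sketch; the function names are placeholders, not the actual implementation):

    # rough shape of the non-realtime loop: wait for the whole utterance,
    # answer with a normal text model, then speak the answer. the three
    # helpers are placeholders for whatever STT/LLM/TTS stack is plugged in.
    def voice_turn(record_until_silence, transcribe, ask_llm, speak):
        audio = record_until_silence()   # no streaming, no barge-in
        question = transcribe(audio)     # speech-to-text
        answer = ask_llm(question)       # same quality as typed chat
        speak(answer)                    # text-to-speech playback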
I’ve come to appreciate that there is a new, totally valid (imo) kind of software development one can do now where you simply do not read the code at all. I do this when prototyping things with vibe coding, for example for personal use, and I’ve posted at least one such project on GitHub for others who may want to run the code.
Of course, as a developer you still have to take responsibility for your code, minimally by including a disclaimer and not dumping this code into someone else’s code base. For example, at work when submitting MRs I do generally read the code and keep the MRs concise.
I’ve found that there is a certain kind of coder who hears of someone not reading the code and to whom this sounds like some kind of moral violation. It’s not. It’s a weird new kind of coding where I’m really creating a detailed description of the functionality I want, then incrementally refining and iterating on it by describing in text how I want it to change. For example, I use it to write GUI programs for Ubuntu using GTK and Python. I’m not familiar with the python-gtk library syntax or GTK GUI methods, so there’s not really much point in reading the code; I ask the machine to write it precisely because I’m unfamiliar with it. When I need to verify things, I have to come up with ways for the machine to test the code on its own.
The point is, I think it’s honestly one new legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe-coded something and didn’t read it, and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a Docker container. I treat the code the same way the author does: as a slightly unknown pile of functions that seems to do its job but may need further verification.
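Concretely, by “run it in a Docker container” I mean something like the sketch below; the flags are real Docker options, the paths are made up, and a container narrows the blast radius rather than guaranteeing safety:

    import subprocess

    # One way to run an untrusted vibe-coded script in a throwaway
    # container: no network, dropped capabilities, read-only code mount.
    # This limits the blast radius; it is not a hard security boundary.
    def run_sandboxed(script_dir, entrypoint):
        cmd = [
            "docker", "run", "--rm",
            "--network", "none",            # no outbound access
            "--cap-drop", "ALL",            # drop Linux capabilities
            "-v", f"{script_dir}:/app:ro",  # mount the code read-only
            "python:3.12-slim",
            "python", f"/app/{entrypoint}",
        ]
        return subprocess.run(cmd).returncode

    run_sandboxed("/tmp/vibe-project", "main.py")  # paths illustrative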
I’m not sure what this means for the software world. On the face of it, it seems like it’s probably some kind of problem, but at the same time I think we will find durable use cases for this new mode of interacting with code, much the same as when compilers abstracted away assembly.
> I’ve come to appreciate that there is a new, totally valid (imo) kind of software development one can do now where you simply do not read the code at all
No. If nobody actually reads the code, nobody knows what the app does.
> If someone vibe-coded something and didn’t read it, and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a Docker container
And asking an LLM to "analyze it" is worthless. It will miss things here and make up things there. Running it in Docker does not mean it can't mess you up.
They have been serving enterprise markets for a long time. Back in 2020-2021, during the chip shortage, Raspberry Pi cut consumer availability short to make sure enterprise customers could still get Compute Modules. The fuse bits on the RP2350 are very much an enterprise feature.