There are many approaches being discussed, and the right one depends on the size of the task. You could just review a plan and assume the output is correct, but you need at least behavioural tests to verify that what was built fulfils the requirements. You can split the plan further and further until the changes are small enough to be reviewable. Where I don't see the benefit is in asking an agent to generate tests, as it tends to generate many useless unit tests that make reviewing more cumbersome. Writing the tests yourself (or defining them and letting an agent write the code), and not letting implementation agents change the tests, is also worth trying.
The truth is we’re all still experimenting and shovels of all sizes and forms are being built.
That matches my experience too - tests and plans are still the backbone.
What I keep running into is the step before reading tests or code: when a change is large or mechanical, I'm mostly trying to answer "did behavior or API actually change, or is this mostly reshaping?" so I know how deep to go.
Would you say you could draw a diagram of the application architecture from memory, or do you treat it as a black box? Do you need an AI to debug issues, or not? In my experience with spec-driven development, even if you review every single PR, it is hard to develop a mental model of the codebase structure unless you invest in it. It might be fine to treat it as a black box, and I'm not arguing otherwise, but will all software be a black box in the future?
For a completely new project it is high risk. While the AI is fantastic at brainstorming and writing detailed architecture, it is difficult to get the "big picture", and even more difficult to verify that it is being done correctly or to see which things can be improved or reused, because in this situation you don't look into the code.
I don't believe people will spend time looking at the code beyond the small blurbs they can read from the command line while talking with the AI, so I agree with you that it ends up being treated as a black box.
I did an experiment with a server implementation in Java (my strongest language): I gave the usual instructions and built up the server. When I went to look at the code, it was a far smaller and more concise codebase than what I would write myself. The AI treats the programming language the way a compiler treats its target, like a compiler emitting JavaScript: it makes the instructions super efficient and uses techniques that, even with my 30 years of experience, I'm not able to pair-review, because we tend to have our own patterns of programming while these tools use everything, no matter how exotic, to their advantage.
After that experience I no longer look at generated source code. For me, it is becoming the same as trying to read compiled binary data.
ElevenLabs and Gemelo.AI are services that both support text input streaming for exactly this use-case. I am not aware of any open-source Incremental TTS model (that's the term used in research, afaik), but you can already achieve something similar by buffering tokens and sending them to the TTS model on punctuation characters.
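Here's a minimal sketch of that buffering approach in Python, under some assumptions: token_stream and synthesize_to_audio are hypothetical placeholders for your LLM token source and your TTS client, and the punctuation set is something you'd tune per language and voice.

    # Sketch: punctuation-based buffering for incremental TTS.
    # `token_stream` (an iterable of strings) and `synthesize_to_audio`
    # (a callback that sends text to a TTS service) are placeholders.

    SENTENCE_BREAKS = {".", "!", "?", ";", ":"}

    def speak_incrementally(token_stream, synthesize_to_audio):
        """Buffer streamed tokens and flush a chunk to the TTS model
        whenever a sentence-ending punctuation character arrives."""
        buffer = []
        for token in token_stream:
            buffer.append(token)
            stripped = token.rstrip()
            # Flush on punctuation so the TTS model gets a natural
            # prosodic unit rather than a mid-sentence fragment.
            if stripped and stripped[-1] in SENTENCE_BREAKS:
                chunk = "".join(buffer).strip()
                if chunk:
                    synthesize_to_audio(chunk)
                buffer = []
        # Flush whatever remains when the stream ends.
        remainder = "".join(buffer).strip()
        if remainder:
            synthesize_to_audio(remainder)

Flushing on sentence boundaries keeps each TTS request a complete prosodic unit, which also helps with the evenness problem mentioned below.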
ElevenLabs only has streaming output available. I've had a look at both recently and ElevenLabs doesn't have streaming input listed as a feature. Would be cool if it had it, though. You could probably approximate this on a sentence level, but you would need to do some normalisation to make the speech sound even.
I've just been looking for SOTA TTS. I found coqui.ai and elevenlabs.io (and a bunch of others). They're good (and better than older TTS), but I am not fooled by any of them. Do you have recommendations?