Hacker News | ccoreilly's comments

There are many approaches being discussed, and it will depend on the size of the task. You could just review a plan and assume the output is correct, but you need at least behavioural tests to verify that what was built fulfils the requirements. You can split the plan further and further until the changes are small enough to be reviewable. Where I don't see the benefit is in asking an agent to generate tests, as it tends to generate many useless unit tests that make reviewing more cumbersome. Writing the tests yourself (or defining them and letting an agent write the code) and not letting implementation agents change the tests is also worth trying.

The truth is we’re all still experimenting and shovels of all sizes and forms are being built.


That matches my experience too - tests and plans are still the backbone.

What I keep running into is the step before reading tests or code: when a change is large or mechanical, I'm mostly trying to answer "did behaviour or API actually change, or is this mostly reshaping?" so I know how deep to go.

Agree we’re all still experimenting here.


Would you say you're able to draw a diagram of the application architecture from memory, or do you treat it as a black box? Do you need an AI to debug issues, or not? In my experience with spec-driven development, even when reviewing every single PR, it is hard to develop a mental model of the codebase structure unless you invest in it. It might be fine to treat it as a black box, I'm not arguing the opposite, but will all software be a black box in the future?


For a completely new project it is high risk. While the AI is fantastic at brainstorming and writing detailed architecture, it is difficult to get the "big picture", and even more difficult to verify that it is being done correctly or to see which things could be improved or reused, because in this situation you don't look into the code.

I don't believe people will spend time looking at the code beyond the small blurbs they can read from the command line while talking to the AI, so I agree with you that it ends up being treated as a black box.

I did an experiment with a server implementation in Java (my strongest language): gave the usual instructions and built up the server. When I went to look at the code, it was a far smaller and more concise codebase than what I would write myself. The AI treats the programming language the way a compiler treats JavaScript: it makes the instructions super efficient and uses techniques that, even with my 30 years of experience, I'm not able to peer-review, because we tend to have our own patterns of programming while these tools draw on everything; no matter how exotic a technique is, they will use it to their advantage.

After that experience I don't look at generated source code any longer. For me it is becoming the same as trying to look at compiled binary data.


ElevenLabs and Gemelo.AI are services that both support text input streaming for exactly this use case. I am not aware of any open-source incremental TTS model (that is the term used in research, afaik), but you can already achieve something similar by buffering tokens and sending them to the TTS model on punctuation characters.
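The buffering idea above can be sketched in a few lines of plain Python. `send_to_tts` is a hypothetical stand-in for whatever TTS call you use; the only real logic is flushing the buffer whenever a sentence-ending punctuation mark arrives:

```python
# Minimal sketch of punctuation-based buffering for incremental TTS.
# send_to_tts is a hypothetical callback, not a real library API.

SENTENCE_END = {".", "!", "?", ":", ";"}

def stream_to_tts(tokens, send_to_tts):
    """Accumulate streamed LLM tokens and flush a chunk to the TTS
    engine whenever a sentence-ending punctuation mark arrives."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        stripped = token.rstrip()
        if stripped and stripped[-1] in SENTENCE_END:
            send_to_tts("".join(buffer).strip())
            buffer = []
    if buffer:  # flush any trailing partial sentence
        send_to_tts("".join(buffer).strip())

# Example with a fake token stream:
chunks = []
stream_to_tts(["Hello", " world", ".", " How", " are", " you", "?"],
              chunks.append)
print(chunks)  # ['Hello world.', 'How are you?']
```

In practice you would run each flushed chunk through a real TTS request as it becomes available, so speech starts well before the full LLM response has finished streaming.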


ElevenLabs only has streaming output available; I've had a look at both recently and ElevenLabs doesn't list streaming input as a feature. Would be cool if it had it, though. You could probably approximate this at the sentence level, but you would need to do some normalisation to make the speech sound even.


Whisper is an STT model; you can use whisperx to transcribe audio files locally via the CLI, or whisper-turbo.com, which runs in the browser.

For TTS, Coqui has the best UX and models for a lot of languages, although quality is not on par with commercial TTS providers.


I've just been looking for SOTA TTS. I found coqui.ai and elevenlabs.io (and a bunch of others). They're good (and better than older TTS), but I am not fooled by any of them. Do you have recommendations?


Gemelo was the other one listed. I doubt you'll get anything sounding more natural than ElevenLabs with the following settings:

* Model: Multilingual v2

* All options and sliders to boost similarity: set to max/yes

* Stability slider: experimentally set to a value where the model sounds natural enough without destabilising sound output


"Gat" is catalan for "cat"


What events make you conclude that industry in the EU will grind to a halt? Is it gas scarcity, or is there something else?


Yeah, energy prices overall and specifically the lack of methane ("natural gas"), which is needed by many industries (e.g. the German chemical industry).

