
This is the first time I've heard sentiment against "AI" hype referred to as hype itself. Yes, there are people ignoring this technology altogether, possibly to their own detriment, but at the stage we're at now, it is perfectly reasonable to want to avoid the actual hype.

What I would really urge people to avoid is listening to what any tech influencer has to say, including antirez. I really don't care what famous developers think about this technology, and it doesn't influence my own experience of it. People should try out whatever they're comfortable with and form their own opinions, instead of listening to what anyone else has to say about it. This applies to anything, of course, but it's particularly important for the technology bubble we're currently in.

It's unfortunate that some voices are louder than others in this parasocial web we've built. Those with larger loudspeakers should be conscious of this fact, and moderate their output responsibly. It starts by not telling people what to do.


The author has a point, but I object to this mischaracterization:

> XHTML, being based on XML as opposed to SGML, is notorious for being author-unfriendly due to its strictness

This strictness is a non-issue in practice. Most editors will autocomplete the closing tag for you, so it's hardly "unfriendly". Besides, if anything, closing tags are reader-friendly (and the reader includes the author), since they make it clear where an element ends. In languages that don't have this, authors often add a comment like `// end of ...` to clarify it. The article's author even acknowledges this in some of their examples ("explicit end tags added for clarity").

But there were other potential benefits of XHTML that never came to pass. A strict markup language would have made documents easier to parse, and we wouldn't have ended up with the insanity of modern HTML parsing, which has since been enshrined in the standard. This, in turn, would have made it easier to extend the language and to integrate different processors into the pipeline. Technologies like XSLT would have been adopted and improved, and perhaps we would already have proper HTML modules, instead of the half-baked Web Components we have today. All lost because browser vendors were reluctant to force website authors to fix their broken markup. It was a terrible tradeoff, if you ask me.

So, sure, feel free to not close HTML tags if you prefer not to, and to "educate" everyone that they shouldn't either. Just keep it away from any codebases I maintain, thank you very much.

To be fair, I don't mind not closing empty elements, such as `<img>` or `<br>`. But not closing `<p>` or `<div>` is hostile behavior, for no actual gain.
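
A quick hand-rolled illustration (not from the article) of why the implied closing is reader-hostile:

    <!-- void elements: nothing to close, nothing lost -->
    <img src="logo.png" alt="Logo">
    <br>

    <!-- the parser silently closes this <p> the moment the <div> starts -->
    <p>Some introductory text
    <div>Is this inside the paragraph or after it? You need to know the parser rules to tell.</div>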


Img and br are not allowed to be closed.

> Worse, due to the aforementioned permissive error handling in HTML parsers, a closing </br> tag will end up inserting a second line break.

You close them in the same tag:

    <br/>

This syntax is ignored in HTML. The / is thrown away and has no effect.

This non-closing talisman means that <div/> or <script/> are not closed, and will mess up nesting of elements.
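
For example (hand-written, but any spec-compliant parser will nest it this way):

    <!-- the slash is discarded, so this div stays open -->
    <div/>
    <p>This paragraph ends up nested inside the div, not after it.</p>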


In HTML, yes. But I thought the OP was talking about XHTML?

Wrong.[1]

> if the element is one of the void elements, or if the element is a foreign element, then there may be a single U+002F SOLIDUS character (/)

If you're going to be pedantic, at least be correct about it.

[1]: https://html.spec.whatwg.org/multipage/syntax.html#start-tag...


You left out the rest of the spec:

> On void elements, it does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.

(The void elements are listed here: https://developer.mozilla.org/en-US/docs/Glossary/Void_eleme... )
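
Concretely (my own example, not the spec's):

    <!-- unquoted value: src becomes "logo.png/", slash and all -->
    <img src=logo.png/>
    <!-- quoted value on a void element: the slash is simply discarded -->
    <img src="logo.png"/>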


All of that is a far cry from elements not being allowed to be closed. Please point to where the spec mentions that. MDN is not the spec.

There can't, however, be a separate closing tag, which is a reasonable interpretation of the post you are replying to.

For ebook production you need to use XHTML; the EPUB standard is defined that way. And it is indeed useful to be able to treat the files as XML and process them with XSLT, XQuery, etc.
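
For instance, a minimal XSLT sketch (the heading elements are just placeholders for whatever your chapters actually use) that pulls a plain-text table of contents out of an XHTML chapter:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:h="http://www.w3.org/1999/xhtml">
      <!-- emit one line of plain text per chapter heading -->
      <xsl:output method="text"/>
      <xsl:template match="/">
        <xsl:for-each select="//h:h1 | //h:h2">
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

Run it with `xsltproc toc.xsl chapter.xhtml` or any other XSLT processor; it only works because XHTML is well-formed XML to begin with.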

It's not about building a wall. It's about ensuring that the terms of the license chosen by the author are respected.

This is why I think permissive licenses are a mistake for most projects. Unlike copyleft licenses, they allow users to deny downstream users of derivative works the very freedoms they themselves enjoyed. It's no surprise that dishonest actors take advantage of this for their own gain. This is the paradox of tolerance.

"AI" companies take this a step further, and completely disregard the original license. Whereas copyleft would somewhat be a deterrent for potential abusers, it's not for this new wave of companies. They can hide behind the already loosely defined legal frameworks, and claim that the data is derivative enough, or impossible to trace back, or what have you. It's dishonest at best, and corrupts the last remnants of public good will we still enjoy on the internet.

We need new legal frameworks for this technology, but since that is a glacial process, companies can get rich in the meantime. Especially shovel salespeople.


It can. The problem is the practice of using open source as a marketing funnel.

There are many projects that love to brag about being open source (it's "free"!), only to lock useful features behind a paywall, or do the inevitable license rug pull after other companies start profiting from the freedoms they've provided them. This is the same tactic used by drug dealers to get you hooked on the product.

Instead, the primary incentive to release a project as open source should be the desire to contribute to the corpus of human knowledge. That doesn't mean that you have to abandon any business model around the project, but that shouldn't be your main goal. There are many successful companies built around OSS that balance this correctly.

"AI" tools and services corrupt this intention. They leech off the public good will, and concentrate the data under the control of a single company. This forces well-intentioned actors to abandon open source, since instead of contributing to human knowledge, their work contributes to "AI" companies. I'm frankly not upset when this affects projects who were abusing open source to begin with.

So GP has a point. Forcing "AI" tools, and even more crucially, the data they collect and use, to be free/libre would restore the incentive for people to provide a public good.

The narrative that "AI" will bring world prosperity is a fantasy promoted by the people who will profit the most. The opposite is true: it will concentrate wealth and power in the hands of a few even more than it is today. It will corrupt the last vestiges of digital freedoms we still enjoy today.

I hope we can pass regulation that prevents this from happening, but I'm not holding my breath. These people are already in power, and governments are increasingly in symbiotic relationships with them.


> The narrative that "AI" will bring world prosperity is a fantasy promoted by the people who will profit the most. The opposite is true: it will concentrate wealth and power in the hands of a few even more than it is today. It will corrupt the last vestiges of digital freedoms we still enjoy today.

This is on point.


Changing modes is a single keystroke away. That's hardly a reason to be slow.

Readline settings depend on what you're already used to. If you're comfortable with vi key bindings, then being in normal mode, navigating with `w`/`b`, deleting a word with `dw`, deleting up to a quote with `dt"`, etc., are all done with muscle memory, and should be much faster than learning the equivalent Emacs bindings, pressing unintuitive key chords, or opening the command in an editor. I don't like opening an editor since it's an interruption, and it hides the output of the previous command.

I wish I could have the full power of Vim in my shells. For example, I miss the text-object bindings: `di"` or `di'` are great for modifying argument values.
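
For anyone who wants to try the readline vi mode mentioned above, the relevant settings (standard readline/bash options, nothing exotic):

    # ~/.inputrc: vi bindings for every readline-based program
    set editing-mode vi
    # optional: show whether you're in insert or normal mode
    set show-mode-in-prompt on

    # or, for bash only, in ~/.bashrc:
    set -o vi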


In an Org document that contains Org examples (e.g. if this article had been written in Org), even Emacs gets confused about rendering it. So you might find that sections in example text are evaluated as being part of top-level sections and collapsing is wonky, etc.

I run into this a lot with gptel. I use a single Org file for all my daily notes, and since gptel streams LLM output as Org (which is good), its headings conflict with my main file's outline. I have a post-processing function that converts the headings into `#` to avoid this, but it's a hack I'd rather not need.
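
The hack looks roughly like this (an approximate sketch from memory; the hook name and its beg/end arguments may not match gptel's current API exactly):

    ;; demote Org headings in the streamed response to plain "#" lines,
    ;; so they don't get spliced into my notes file's outline
    (defun my/gptel-flatten-headings (beg end)
      (save-excursion
        (save-restriction
          (narrow-to-region beg end)
          (goto-char (point-min))
          (while (re-search-forward "^\\*+ " nil t)
            (replace-match "# ")))))

    ;; gptel's post-response functions receive the response bounds
    (add-hook 'gptel-post-response-functions #'my/gptel-flatten-headings)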


Hmm, I'm still not seeing the issue. Why aren't the examples just lists under, say, an examples header? Or the LLM output? Maybe gptel is expecting output to be in a fresh file or at the top level? It should be a trivial fix to indent a level before inserting.

Karl Voit writes their articles in Org, though; it even says so in the footer.

Ah, yes. You mean technology like "AI" that creates a positive impact on people?

Yes.

Hello, my name is George C. Parker and I have a bridge to sell you.

Before I buy, can you confirm the bridge is GDPR-compliant, AI-Act-ready, has a digital product passport, and passed its environmental impact assessment? Otherwise the local compliance officer will fine us before it even collapses.

> Before I buy, can you confirm the bridge is GDPR-compliant, AI-Act-ready, has a digital product passport, and passed its environmental impact assessment?

Great comment! We've added double-plus-good to your Palantir-Trumport account and 2% off your next Amazon purchase!


> The absurd value of LLMs is that they can somehow manage to extract the signal from that noise.

Say what? LLMs absolutely cannot do that.

They rely on armies of humans to tirelessly filter, clean, and label the data used for training. The entire "AI" industry relies on data-labeling companies and outsourced sweatshops to do this work. It is humans who extract the signal from the noise. The machine simply outputs the most probable chain of tokens.

So hallucinations definitely matter, especially at scale. It makes the job of humans much, much harder, which in turn will inevitably produce lower quality models. Garbage in, garbage out.


I think you're confused about the training steps for LLMs. What the industry generally calls pre-training is when the LLM learns the job of predicting the most probable next token given a huge volume of data. A large percentage of that data has not been cleaned at all, because it comes directly from web crawling. It's not uncommon to open up a web crawl dataset used for pre-training and immediately read something sexual, nonsensical, or both.

LLMs really do find the signal in this noise, because even pre-training alone reveals incredible language capabilities, but that's about it. They don't have any of the other skills you would expect, and they most certainly aren't "safe". You can't even really talk to a pre-trained model, because it hasn't been refined into the chat-like interface that we're so used to.

The hard part after that for AI labs was getting together high quality data that transforms them from raw language machines into conversational agents. That's post-training and it's where the armies of humans have worked tirelessly to generate the refinement for the model. That's still valuable signal, sure, but it's not the signal that's found in the pre-training noise. The model doesn't learn much, if any, of its knowledge during post-training. It just learns how to wield it.

To be fair, some of the pre-training data is more curated. Like collections of math or code.


No, I think you're confused, and doubling down on it, for some reason.

Base models (after pre-training) have zero practical value. They're absolutely useless when it comes to separating signal from noise, using any practical definition of those terms. As you said yourself, their output can be nonsensical, based solely on token probability in the original raw data.

The actual value of LLMs comes after the post-training phase, where the signal is injected into the model from relatively smaller amounts of high quality data. This is the data processed by armies of humans, without which LLMs would be completely worthless.

So whatever capability you think LLMs have to separate signal from noise is exclusively the product of humans. When that job becomes harder, the quality of LLMs will go down. Unless we figure out a way to automate data cleaning/labeling, which seems like an unsolvable problem, or for models to filter it during inference, which is what you're wrongly implying they already do. LLMs could assist humans with cleaning/labeling tasks, but that in itself has many challenges, and is not a solution to the model collapse problem.


I'm not saying that pre-trained-only models are useless. They've clearly extracted a ton of knowledge from the corpus. The interface may seem strange because it's not what we're accustomed to, but they still prove valuable. Code completion models, for example, are just LLMs that have been pre-trained exclusively on code. They work very well despite their simplicity because... the model has extracted the signal from the noise.

You have a strange definition of "signal" and "noise".

Code completion models can be useful because they output the most probable chain of tokens given a specific input, same as any LLM. There is no "signal" there besides probability. Besides, even those models are fine-tuned to follow best practices, specific language idioms, etc.

When we talk about "signal" in the context of general knowledge, we refer to information that is meaningful and accurate for a specific context and input. So that if the user asks for proof that the Earth is flat, the model doesn't give them false information from a random blog. Of course, LLMs still fall short at this, but post-training is crucial to boost the signal away from the noise. There's nothing inherent in the way LLMs work that makes them do this. It is entirely based on the quality of the training data.


That's flat out wrong. They have produced several web series, and their videos feature a lot of visual effects. Just because their career focuses on producing web content doesn't mean they're any less talented than someone working on feature films.

I can't comment on whether they're "well respected" in the VFX industry, but you're being misleadingly hostile.


> I can't comment on whether they're "well respected" in the VFX industry,

They aren't, because they aren't in the vfx industry.

> you're being misleadingly hostile.

No, this is honesty. People who only know vfx through fake youtubers want to defend them, but it's the blind leading the blind for clicks and views.

> Just because their career focuses on producing web content doesn't mean they're any less talented than someone working on feature films.

They built their channel criticizing people who work on feature films. Their work is good according to them and the acolytes who buy into it, but people who think they represent vfx don't realize this, and suddenly it isn't fair to point out the truth.


> No, this is honesty.

No, it's bullshit.

From Wikipedia[1]:

> Corridor Digital LLC is an American independent production studio based in Los Angeles, known for creating pop-culture-related viral online short-form videos since 2010, as well as producing and directing the Battlefield-inspired web series Rush and the YouTube Premium series Lifeline. It has also created television commercials for various companies, including Machine Zone and Google.

You clearly have some bone to pick with them, but they're accomplished VFX artists. Whether they're good or not is a separate matter, but they're not "fake youtubers" or misleading anyone, unlike yourself.

[1]: https://en.wikipedia.org/wiki/Corridor_Digital


Making web videos doesn't mean that they are able to do the visual effects that they criticize.

They don't make "vfx artists react" videos to low grade web series.

They call themselves vfx artists when they have never done that. They make web videos, they criticize professional work, and you are completely ignoring that.


You truly do not need to go to bat for these grifters lol. What are you doing.

That is a losing battle.

Even if you manage to make bot usage more expensive, which is all a captcha can do, the content posted by humans in discussions and shared links is increasingly generated by machines.

It's ironic having a community of people object to the same technology they helped build. Enjoy the show, and learn to live with it. It's going to get much worse before it gets any better, if at all.


> "they helped build"

The overwhelming majority of developers have never worked anywhere close to LLM tech. AI is a very small field requiring specialized expertise.


I agree, having never worked on AI or anything privacy invasive for that matter. HN is not a monolith.
