The manufacturer, obviously, but they can only sell the car in the first place because that defect risk is quantifiable for their liability insurance provider, who can evaluate how risky said car company's manufacturing is and how likely it is they'll need to pay out a claim, etc.
For self-driving, that evaluation is almost impossible. Sure, the numbers can look good statistically, but things like brake lines, brake pad material, and brake boosters are governed by the laws of physics, which are far more understandable than any self-driving algorithm.
I think with Waymo we're probably at the point where an insurer could have a decent stab at what their liability would be if asked to cover AI-related accidents. In fact, given that these cars are on the road and have reportedly been in accidents, I would imagine this is past being a hypothetical concern and well into the territory of "solved problem".
The way I always remember leftmost and rightmost binary search (the C++ equivalents-ish of lower_bound and upper_bound) is to always have a "prior best" and then move the bounds according to the algo
while (l <= r) // l and r are inclusive indices into nums; prior starts at -1
{
    // find the midpoint (written this way to avoid overflow)
    auto mp = l + (r - l) / 2;
    if (nums[mp] == target)
    {
        prior = mp; // remember the best match found so far
#ifdef upper_bound
        l = mp + 1; // move the left bound up, maybe there's more up there we can look for!
#else
        // lower bound: we found an instance of target, but let's look in the exclusive left half for an earlier one
        r = mp - 1;
#endif
    }
    else if (nums[mp] < target)
    {
        l = mp + 1; // target can only be to the right
    }
    else
    {
        r = mp - 1; // target can only be to the left
    }
}
excuse the terrible formatting, it's been a long day grinding leetcode after getting laid off...
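For what it's worth, the std:: calls that this hand-rolled loop approximates look something like the sketch below (the sorted vector and names are made up for illustration, not from the code above):

```c++
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> nums{1, 3, 3, 3, 7}; // must already be sorted
    int target = 3;

    // std::lower_bound: first element that is NOT less than target,
    // i.e. the leftmost occurrence if target is present.
    auto lo = std::lower_bound(nums.begin(), nums.end(), target);

    // std::upper_bound: first element strictly greater than target,
    // i.e. one past the rightmost occurrence.
    auto hi = std::upper_bound(nums.begin(), nums.end(), target);

    // Here lo - nums.begin() == 1 and hi - lo == 3 (three 3s in the vector).
    return static_cast<int>(hi - lo);
}
```

The "prior best" version returns the index of an actual match (or stays at -1), while these return iterators to boundary positions that exist even when the target is absent, hence "equivalents-ish".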
Godspeed, fellow LeetCoder. I'm not currently grinding but I still have my handful of practice problems in active Anki rotation.
I have my rightmost code in part III of the miniseries, [1]. It looks quite similar, but I save the -1 for the very end return.
def rightmost_bsearch(L, T):
    l, r = 0, len(L)
    while l < r:
        # Loop invariants. Always true! Comment out in production!
        for somewhat_smol in L[0:l]:
            assert somewhat_smol <= T  # Note: Weak inequality now.
        for lorg in L[r:len(L)]:
            assert lorg > T
        mid = (l + r) // 2
        if L[mid] > T:
            r = mid
        else:
            l = mid + 1
    return r - 1  # return the first element AFTER L[r:len(L)].
(It should technically be BEFORE, I guess. If it helps all rightmost bsearches are also leftmost bsearches on the reversed array, so AFTER is secretly not wrong)
If this sounds like you, I highly recommend reading "The Problem of the Puer Aeternus".
You can definitely skip a lot of the tedious bits where the author essentially copy-pastes other books for analysis, but it describes a very common pattern: people hold themselves back because taking the unambitious, rather pedestrian next step forward requires facing preconceived notions about oneself, e.g. "I should've done this long ago", etc.
I understand the sentiment, but disagree with the solution. PKMs can be overwhelming if someone nerdy enough to use one ends up using it ineffectively.
The way I do it that I find works well is to have the following:
1. have a journal page for each day. Content only goes in the journal pages
2. have a series of topics that you tag. This system is up to you, but I usually find a hierarchy that is <=3 levels deep works best, e.g. I have "Job Search/2025/Company"
3. give each of the relevant tag pages some sort of "query" that pulls in all relevant tasks from all the journal pages and sorts them by priority / state / deadline, so you can see everything in one place (e.g. "What's the next step I have to do for my Nvidia application?" is easy to answer with this system). Depending on your PKM, the hierarchy also lets you answer that question at a higher level, e.g. "What are the next steps I have to do for ALL of my applications?".
In each journal page, you can also write down a "task backlog" so minor tasks that you remember don't take up headspace while you intend to work on other major tasks (e.g. write down "get back to Joel about the Nvidia referral").
Regarding a point other folks have made: treat the journal and these tags as more of a "stream" of things you're doing in your life, instead of a collection of ever-expanding obligations or a mausoleum of unexplored ambition.
I built this in Logseq, which seems to be the only PKM with a query language advanced enough to do this while staying local-only (no mandatory cloud data) in text files. If anyone knows how to build such a system in a different application, I'd be happy to learn! Logseq has been stale for a year or two as the authors work on a much-needed near-total rewrite, which I'm not sure is ever going to arrive at this point.
It's basically like having N of the most prolific LoC-producing colleagues, none of whom have a great mental model of how the language works, and having to carefully parse all of their PRs.
Honestly, I've seen too many fairly glaring mistakes in all models I've tried that signal that they can't even get the easy stuff right consistently. In the language I use most (C++), if they can't do that, how can I trust them to get all the very subtle things right? (e.g. very often they produce code that holds some form of dangling references, and when I say "hey don't do that", they go back to something very inefficient like copying things all over the place).
I am very grateful they can churn out a comprehensive test suite in gtest, though, and write other scripts to test / do a release and such. The relief from that tedium is welcome for sure!
At least for C++, I try to use copilot only for generating tests and writing ancillary scripts. tbh it's only through hard-won lessons and epic misunderstandings and screw-ups that I've built a mental model that I can use to check and verify what it's attempting to do.
As much as I am definitely more productive when it comes to some dumb "JSON plumbing" feature of just adding a field to some protobuf, shuffling around some data, etc., I still can't quite trust it not to make a very subtle mistake, or to generate code in the same style as the current codebase (even when the system prompt tells it to). I've had it make obvious mistakes that it doubles down on (either pushing back or not realizing in the first place) until I practically scream at it in the chat and it says "oopsie haha my bad", e.g.
```c++
class Foo
{
    int x_{};
public:
    bool operator==(Foo const& other) const noexcept
    {
        return x_ == x_; // <- what about other.x_?
    }
};
```
I just don't know at this point how to get it (Gemini or Claude or any of the GPT models) to stop dropping these same subtle mistakes that are very easy to miss in the prolific amount of code it tends to write.
That said, saying "cover this new feature with a comprehensive test suite" saves me from having to go through the verbose gtest setup, which I'm thoroughly grateful for.
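As a concrete example of why I lean on it for tests: a sketch like this (hypothetical, not actual generated output; I've added a constructor to the Foo above so it can be exercised, and it assumes linking against gtest_main) would immediately flag the operator== bug:

```c++
#include <gtest/gtest.h>

// Same shape as the Foo above, with an added constructor; the buggy
// operator== is kept on purpose so the test demonstrates the failure.
class Foo
{
    int x_{};
public:
    explicit Foo(int x) : x_{x} {}
    bool operator==(Foo const& other) const noexcept
    {
        return x_ == x_; // bug kept: other.x_ is never consulted
    }
};

TEST(FooEquality, ComparesBothOperands)
{
    EXPECT_TRUE(Foo{1} == Foo{1});
    EXPECT_FALSE(Foo{1} == Foo{2}); // fails against the buggy operator==, exposing it
}
```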
Just this morning, I had Claude come up with a C++ solution that would have undefined behavior that even a mid-level C++ dev could have easily caught (assuming iterator stability in a vector that was being modified) just by reading the code.
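The shape of the bug was roughly this (a reconstructed minimal sketch, not the code it actually produced):

```c++
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    auto it = v.begin(); // iterator into v's current buffer

    v.push_back(4);      // may reallocate and invalidate `it`

    return *it;          // undefined behavior if reallocation happened
}
```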
These AI solutions are great, but I have yet to see any solution that makes me fear for my career. It just seems pretty clear that no LLM actually has a "mental model" of how things work that can avoid the obvious pitfalls amongst the reams of buggy C++ code.
This is exactly right. LLMs do not build appropriate world models. And no, Python and JS have similar failure cases.
Still, sometimes it can solve a problem like magic. But since it does not have a world model it is very unreliable, and you need to be able to fall back to real intelligence (i.e., yourself).
I agree, but I think the thing we often miss in these discussions is how much potential LLMs have to be productivity multipliers.
Yeah, they still need to improve a bit, but I suspect there will be a point at which individual devs are getting 1.5x more work done in aggregate. So if everyone is doing that much more work, it has the potential to "take the job" of someone else.
Yeah, software is needed more and more, so perhaps it'll just make us that much more dependent on devs and software. But I do think it's important to remember that productivity gains always have the potential to replace devs, and LLMs imo have huge potential for productivity.
Oh I agree it can be a multiplier for sure. I think it's not "AI will take your job" but rather "someone who uses AI well will take your job if you don't learn it".
At least for C++, I've found it does a very mediocre job at suggesting project code (because it has a tendency to drop subtle bugs all over the place, you basically have to review it as carefully as if you'd written it yourself). But for asking things in copilot like "Is there any UB in this file?" (not that it will be perfect, but sometimes it'll point something out), and especially for writing tests, I absolutely love it.
Yeah, I'm a big fan of using it in Rust for that same reason. I watch it work through compile errors constantly; I can't imagine what it would be like in JS or Python.
Sonnet or Opus?
Well, I guess they both can still do that. But I just keep asking it to review all its code, to make sure it works. Eventually, it'll catch its errors.
Now this isn't a viable way of working if you're paying for this token-by-token, but with the Claude Code $200 plan ... this thing can work for the entire day, and you will get a benefit from it. But you will have to hold its hand quite a bit.
A difference emerges when an agent can run code and examine the results. Most platforms are very cautious about this extension. The recent MCP spec does define toolsets and can enable these feedback loops in a way that can be adopted by markets and software ecosystems.
(not trolling)
Would that undefined behavior have occurred in idiomatic Rust?
Will the ability to use AI to write such a solution correctly be enough motivation to push C++ shops to adopt Rust? (Or perhaps a new language that caters to the blind spots of AI somehow.)
There will absolutely be a tipping point where the potential benefits outweigh the costs of such a migration.
This is where one can notice that LLMs are, after all, just stochastic parrots. If we don't have a reliable way to systematically test their outputs, I don't see many jobs being replaced by AI either.
This is flatly false for two reasons. One is that all LLMs are not equal: the models and capacities are quite different, by design. Secondly, a large amount of standardized LLM testing evaluates sequences of logic or other "reasoning" capacity. Invoking the stochastic-parrot fallacy is basically proof of not having looked at the battery of standardized tests that are common in LLM development.
Even if not all LLMs are equal, almost all of them are based on the same underlying architecture: transformers. So the general idea is always the same: predict the next token. It becomes more obvious when you try to use LLMs to solve things that you can't find on the internet (even if they're simple).
And the testing does not always work. You can only count on it being really, really correct maybe 80% of the time, and that forces you to check everything. Of course, using LLMs makes you faster for some tasks, and the fact that they can do so much is super impressive, but that's it.
> undefined behavior that even a mid-level C++ dev could have easily caught (assuming iterator stability in a vector that was being modified)
This is not an AI thing; plenty of "mid-level" C++ developers could have made that same mistake. New code should not be written in C++.
(I do wonder how Claude AI does when coding Rust, where at least you can be pretty sure that your code will work once it compiles successfully. Or Safe C++, if that ever becomes a thing.)
It does alright with Rust, but you can't assume it works as intended if it compiles successfully. The issue with current AI when solving complex or large scale coding problems is usually not syntax, it's logical issues and poor abstraction. Rust is great, but the borrow checker doesn't protect you from that.
I'm able to use AI for Rust code a lot more now than 6 months ago, but it's still common to have it spit out something decent looking, but not quite there. Sometimes re-prompting fixes all the issues, but it's pretty frustrating when it doesn't.
I haven’t tried with the most recent Claude models, but as of the last iteration, Gemini was far better at Rust and is what I still use to write anything in it. As an experiment, I even fed it a whole ebook on Rust design patterns plus a small script (500 lines), and it was able to refactor the script to use the correct patterns, with some minor back and forth to fix build errors!
When I use the "think" mode it retains context for longer. I tested it with 5k lines of C compiler code and I could get 6 prompts in before it started forgetting or generalizing.
I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.
Not from a tech field at all, but would it do the context window any good to use "think" mode and then discard the thinking tokens once the LLM gives its final answer/reply?
Is it even possible to discard generated tokens selectively?
It also doesn't help that many of these companies tend to either limit the context of the chat to the 10 most recent messages (5 back-and-forths) or rewrite the history as a few-sentence summary. Both approaches lose a ton of information, but you can avoid that behaviour by going through the APIs. Especially Azure OpenAI et...: useless on the web, but quite capable through custom APIs.
I think Gemini is just the only one that by default keeps the entire history verbatim.
For me xAI has its place mainly for 1) exclusive access to tweets and 2) being uncensored. And it's decent enough (even if it's not the best) in terms of other capabilities.
With the recent article on how easily it was manipulated, I wouldn't be so confident it is uncensored, just that its bias leans into its owner's beliefs, which isn't great.
Yes, you could argue all tools are likely to fall into the same trap, but I have yet to see another LLM product being promoted by such a brash and trashy business owner.
I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.
For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.
I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.
There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I was forced to use only one, that would be Gemini 2.5 Pro (for now) because it can process a million tokens and generate much longer output than the others.
There's definitely _something_ there, but, as with all philosophies, the internet has taken it and run with it to a fairly absurd degree, to the point where, for many adherents, it's basically a religion.
It's not. Feeding kids, researching vaccines and a bunch of other things that billionaires are funding should not depend on the graces and whims of billionaires, it should be something provided for by the government.
The HN crowd is ... mixed; it's perhaps the last true melting pot we have on the Internet. A curse and a blessing, if you ask me.
You got truly anything here. Europeans that in general tend to lean more towards "democratic socialism" and its various offshoots, American libertarians (which have a large intersection with Musk fanboys), a bunch of extremely rich startup founders, American progressives, conservatives of all kinds, Zionists and Hamas apologists, probably Russian and Chinese psy-ops, accelerationists, preppers... name any ideology and you'll find supporters on HN.
What has changed a bit is that tribalism seems to have taken over from civilized, or at least argument- and fact-oriented, discourse. Personally, I'd prefer if downvotes and especially flags required one to give a reason, so that repeat offenders who just flag and downvote everything they disagree with could get suspended for ruining discussion.
Interesting how you put "Hamas apologists", and not "pro-Palestinians", next to Zionists. How would you have felt if it had been written "pro-Palestinians and genocide apologists"?
Do you know what a "bubble" is? In fact, do you actually know any pro-Palestinian people, or do you just consume media that tells you about them? These are not the same thing. Very neat that you included "from the river to the sea" right alongside rape. Very telling.
PS: you can find street interviews of random Israelis who will straight up tell you, with very little prompting, that they wish all Palestinians were killed. But I guess they just don't count, huh?
You never know when it will start spouting it either. That kind of uncertainty in the responses landing in your interface is just not sustainable. Your money is coming from the quality of the content your system is putting out. If it's being used for dentistry, and it randomly spits out white supremacist content, dentists will look for a system that won't do that. Because they asked about, say, intaglio surfaces for a wearable dental appliance. Not a treatise on white genocide.
At this point, to use Grok, you'd be intentionally setting your startup to detonate itself at some random point in the future. That's just not how you make money.
So..
If the 'source' of data is 9gag, 4chan, you will get 'this' material.
If you feed it Tumblr, you will get Harry Potter and rope-porn-thingies.
If you feed it Hitler's speeches, you will get 'that' material.
If you feed it algebra, you will get 'that' material.
Then..
Do we want 'open' or 'curated' LLMs?
And how far from reality are the curated LLMs?
And how far can curated LLMs take us (black Nazis? female US founding fathers?).
Pick your poison I say.. and be careful what you wish for. There is no "perfect" LLM because there is no "perfect" dataset, and Sam-Altman-types-of-humans are definitely deeply flawed. But life is flawed, so our tools are/will be flawed.
The problem was not the source of the training data. xAI confirmed that the system prompt had been modified to make Grok talk about South African white genocide.
While they didn’t say who modified it, it’s hard to believe it wasn’t Elon.
> While they didn’t say who modified it, it’s hard to believe it wasn’t Elon.
Is it really that hard to understand how these things happen?
The boss says "remove bias", but the peons don't really know how to do that, and the naive approach to unbiasing a thing is to introduce bias in the other direction. And then if you're Google and the boss thinks it has a right-wing bias, you crank it the other way and get black Nazis; if you're xAI and the boss thinks it has a left-wing bias, you get white genocide.
In both cases the actual problem is when people think bias operates like an arithmetic sum, because it doesn't.
That's precisely how the arithmetic theory of bias operates. The fact that bias doesn't actually work that way is why applying it causes such ridiculous outcomes.
The term "kill the boer" was almost certainly added to the system prompt because Grok would begin talking about specifically that song, unprompted, to millions of people no matter what they were talking about.
This is not a case of trying to remove bias. I don't for a second believe anyone from this site's demographic is being naive about that either; have whatever political opinion you want, but don't pretend this is respectable.
Do you have an example of this? I'm curious where C++ exceeds Rust in this regard.