Hacker News | Maxatar's comments

I have not found this to be the case. My company uses some proprietary DSLs, and when we provide the spec of the language with examples, it manages to pick them up and use them in a very idiomatic manner. The total context needed is 41k tokens. That's not trivial, but it's also not that much, especially with ChatGPT Codex and Gemini now providing context lengths of 1 million tokens. Claude Code is very likely to soon offer 1 million tokens as well, and by this time next year I wouldn't be surprised if we reach context windows 2-4x that amount.

The vast majority of tokens are not used for documentation or reference material but rather for reasoning/thinking. Unless you somehow design a programming language so drastically different from anything that currently exists, you can safely bet that LLMs will pick it up with relative ease.


> Claude Code is very likely to soon offer 1 million tokens as well

You can do it today if you are willing to pay (API or on top of your subscription) [0]

> The 1M context window is currently in beta. Features, pricing, and availability may change.

> Extended context is available for:

> API and pay-as-you-go users: full access to 1M context

> Pro, Max, Teams, and Enterprise subscribers: available with extra usage enabled

> Selecting a 1M model does not immediately change billing. Your session uses standard rates until it exceeds 200K tokens of context. Beyond 200K tokens, requests are charged at long-context pricing with dedicated rate limits. For subscribers, tokens beyond 200K are billed as extra usage rather than through the subscription.

[0] https://code.claude.com/docs/en/model-config#extended-contex...


I wouldn't say strictly speaking that I've written no code, but the amount of code I've written since "committing" to Claude Code in February is absolutely minuscule.

I prefer having Claude make even small changes at this point, since every change it makes ends up teaching it something about my coding conventions, standards, interpretations, etc... It does pick up on these little changes and commits them to memory, so that in the long run you end up not having to make any little changes whatsoever.

And to drive this point further, even prior to using LLMs, if I review someone's work and see even a single typo or something minor that I could probably just fix in a second, I still insist that the author is the one to fix it. It's something my mentor at Google did with me which at the time I kind of felt was a bit annoying, but I've come to understand their reason for it and appreciate it.


Unfortunately Claude has a context window limit so it’s not going to keep “learning” forever.

Sort of... Claude Code writes to a memory.md file that it uses to store important information across conversations. If I review mine, it has plenty of details about things like coding conventions, structure, and the overall architecture of the application it's working on.

The second thing Claude Code does is that when it reaches the end of its context window, it runs /compact on the session, which takes a summary of the current session, dumps it into a file, and then starts a new session with that summary. It also retains logs of all the previous sessions that it can search through.

Looking over my Claude Code session, out of the 256k tokens available, about 50k are used for "memory" and session summaries, leaving roughly 200k to work with. The reality is that the vast majority of tokens Claude Code uses are for its own internal reasoning rather than being "front-end" facing, so to speak.
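The compaction flow described above can be sketched roughly like this (a toy illustration of the assumed behavior, not Claude Code's actual implementation; `tokens` here just counts words as a stand-in for real tokenization):

```javascript
// Approximate token count as a word count, for illustration only.
function tokens(text) { return text.split(/\s+/).filter(Boolean).length; }

// 40 turns of conversation, far over a toy 100-"token" budget.
const history = Array.from({ length: 40 },
  (_, i) => `turn ${i}: assistant reasoning and tool output`);

let working = history;
if (tokens(working.join(" ")) > 100) {
  // Stand-in for an LLM-written summary of the session so far.
  const summary = `Summary of ${working.length} earlier turns (key decisions kept)`;
  working = [summary]; // fresh session seeded with just the summary
  // (the full transcript would also be written to a searchable log on disk)
}
console.log(working.length, tokens(working.join(" ")));
```

The point is that the session restarts with a small fixed-size summary rather than the full transcript, which is why only ~50k of the window ends up occupied by memory and summaries.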

Additionally given that ChatGPT Codex just increased its context length from 256k to 1 million tokens, I expect Anthropic will release an update within a month or so to catch up with their own 1 million token model.


There are a few problems with that.

1. The closer the context gets to full, the worse the model performs.

2. The more context it has the less it weights individual items.

That is, Claude might learn you hate long functions and add a line to memory about preferring short functions. When that is the only thing in the context, it is likely to follow it very closely. But when it's one piece of a much longer context, it is much more likely to ignore it.

3. Tokens cost money even if you are currently being subsidized.

4. You have no idea how new models and new system prompts will perform with your current memory.md file.

5. Unlike learning something yourself, anything you teach Claude is likely to start being controlled by your employer. They might not let you take it with you when you go.


> 3. Tokens cost money even if you are currently being subsidized.

Keep in mind that those 50k memory tokens would likely be cached after the first run and thus be significantly cheaper.


Caching has many caveats. The cache expiration window is short; if you change a document in the context it clears the cache, and if you change anything in the prompt prefix it clears the cache. And there's no reason to think Anthropic will keep charging dramatically less for cached tokens in the future once they start trying to make a profit.

My understanding is that Claude Code/Codex both put great effort into utilizing caching.

Yeah, of course they do, because it saves them more money than they are passing on to you. That doesn't mean they are magically able to overcome the tradeoffs inherent to caching. All of the issues I mentioned will still invalidate your cache.

If you know the change you want to make, why wouldn't you just make it yourself?

It seems like people who concede control to an AI are mostly people who didn't feel in control in the first place, and for whom keeping every detail intentional is no longer a priority.


V8 is a JIT compiler that runs code through the Ignition [1] interpreter and only compiles sections down to machine instructions via TurboFan [2] once they've been marked as hot.

V8 can also deoptimize, falling back from machine instructions to bytecode, if it identifies that certain optimization assumptions no longer hold.

[1] https://v8.dev/docs/ignition

[2] https://v8.dev/docs/turbofan
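The tier-up/deopt cycle looks roughly like this in practice (a sketch of code that typically triggers it; actual thresholds and diagnostics are V8 internals and vary by version):

```javascript
// A function that stays monomorphic while the loop runs on numbers.
function add(a, b) { return a + b; }

// Hot numeric loop: Ignition gathers type feedback, and TurboFan
// eventually emits machine code specialized to small integers.
let sum = 0;
for (let i = 0; i < 100000; i++) sum += add(i, 1);

// String arguments violate that type feedback, so V8 throws away the
// optimized code and falls back to bytecode (deoptimization).
const s = add("a", "b");
console.log(sum, s);
```

Running it under `node --trace-opt --trace-deopt` (flags exposed from V8; the output format is unstable across versions) shows the optimize and deoptimize events for `add`.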


Not at all. Most of the good parts of Boost are now part of the standard library, and the parts that remain high quality have stand-alone implementations, like ASIO, pybind11 (which was heavily influenced by Boost.Python), etc...

A lot of the new stuff that gets added to Boost these days is basically junk that people contribute as resume padding but that very few people actually use. Oftentimes people just dump their library into Boost and then never bother to maintain it thereafter.


Yes it absolutely is valuable to have access to expert opinions and people do pay money to acquire opinions from experts.

But expert advice, even if material, is not the same as insider information.


Well, I think context matters here a lot:

If you go to a random lawyer in Wyoming and ask them to write an "expert opinion", then what you get will probably be something standard, written by a junior associate, or maybe even produced by ChatGPT.

If the White House orders an "expert opinion" on a potential Supreme Court ruling, then chances are the expert asked to prepare it is someone who plays golf with some of the SCOTUS justices.

So those two "expert opinions" might not bear the same weight.


The quality of an opinion has no bearing as to whether that opinion is insider information.

In science randomness is usually used to abstract over a large number of possible paths that result in some outcome without having to reason individually about any specific path or all such paths.

It does not have to mean something inherently non-deterministic or something that can't be modelled, although it certainly is the case that if something is inherently non-deterministic then it would necessarily have to be modelled randomly. Modelling things as a random process is very useful even in cases where the underlying phenomenon has a fully understood and deterministic model; a simple example of this would be chess. It's an entirely deterministic game with perfect information that is fully understood, but nevertheless all the best chess engines model positions probabilistically and use randomness as part of their search.
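The chess point generalizes: a random playout is one concrete path through a deterministic game tree, and averaging many playouts estimates the value of a position without enumerating every path. A toy sketch with a Nim-like game (illustrative only, not how a real engine is built):

```javascript
// Deterministic game: players alternately take 1 or 2 stones; whoever
// takes the last stone wins. We evaluate it via random paths anyway.
function randomPlayout(stones, playerToMove) {
  while (stones > 0) {
    const take = 1 + Math.floor(Math.random() * Math.min(2, stones));
    stones -= take;
    if (stones === 0) return playerToMove; // this player took the last stone
    playerToMove = 1 - playerToMove;
  }
}

// Monte Carlo estimate: how often does the player to move win under random play?
function estimateWinRate(stones, trials) {
  let wins = 0;
  for (let i = 0; i < trials; i++) {
    if (randomPlayout(stones, 0) === 0) wins++;
  }
  return wins / trials;
}

console.log(estimateWinRate(10, 10000));
```

Every individual path here is fully determined once the moves are chosen; the randomness is only a device for sampling the space of paths instead of reasoning about each one.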


> Modelling things as a random process is very useful even in cases where the underlying phenomenon has a fully understood and deterministic model

The output of a pseudorandom generator is a good example.


In the U.S., income is defined as revenue minus expenses:

https://en.wikipedia.org/wiki/Income_(United_States_legal_de...


Interesting, it seems like this might be a UK vs US thing. All the non-dodgy UK results for "income" I found agree with what I thought e.g. "Income less Costs = Profit" [1]

The one exception is HMRC (UK equivalent of IRS) which, for the purposes of corporation tax only, defines income like profit [2] (with some technical differences, but the same spirit). But for other purposes (e.g. personal income tax) even they use it to just literally mean cash received without subtracting off outgoings.

Using it in this net sense seems very odd to me, but maybe that's because I'm British. "Income" and "outgoings" look to me like symmetrical terms, and no one would consider outgoings to be after subtracting off money coming in (would they?!)

[1] https://www.cheapaccounting.co.uk/blog/index.php/income-prof...

[2] https://www.gov.uk/hmrc-internal-manuals/company-taxation-ma...


But it's not completely different, and a "≃" would not mean the same thing; in fact it would weaken the statement.

e^x ≃ 1 + x + O(x^2) would only assert that lim (x->0) (e^x)/(1+x) = 1.

However "e^x = 1 + x + O(x^2)" means that for some function r(x) belonging to the set O(x^2), e^x is exactly equal to 1 + x + r(x). Another way to rewrite that equation that eliminates the "abuse of notation" would be:

    e^x − (1 + x) ∈ O(x^2)
The particular r(x) in O(x^2) which makes it strictly equal is being left out, that's true, and usually it's left out for brevity or practical reasons, or even because it's not known what r(x) is... but nevertheless it is not an asymptotic equation or an approximation; it is exactly equal to the value on the right hand side for some particular r(x) whose exact details are being omitted for one reason or another.
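A quick numeric check of that reading: if r(x) = e^x - (1 + x) really is in O(x^2), then r(x)/x^2 should stay bounded as x shrinks (in fact it tends to 1/2, the next Taylor coefficient):

```javascript
// r(x)/x^2 should approach 1/2 as x -> 0, witnessing r(x) in O(x^2).
for (const x of [1e-1, 1e-2, 1e-3]) {
  const r = Math.exp(x) - (1 + x);
  console.log(x, r / (x * x));
}
```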


They did not merge into a single entity, they continue to be separate entities. SpaceX owns xAI, and xAI in turn owns X.


Interesting, I actually find LLMs very useful at debugging. They are good at doing mindless grunt work, and a great deal of debugging in my case is going through APIs and figuring out which of the many layers of abstraction ended up passing some wrong argument into a method call because of some misinterpretation of the documentation.

Claude Code can do this in the background tirelessly while I can personally focus more on tasks that aren't so "grindy".


They are good at purely mechanical debugging: throw them an error and they can figure out which line threw it, and therefore take a reasonable stab at how to fix it. Anything where the bug is actually in the code, sure, you'll get an answer. But they are terrible at weird runtime behaviors caused by unexpected data.

