
Are you based in Alaska? I’d love to hear more if so; I have contact info on my website. (I grew up there).

The whole point I think, though, is that it doesn’t matter. If an LLM hallucinates a proof that passes the proof checker, it’s not a hallucination. Writing and inspecting the spec is unsolved, but for the actual proof checking hallucinations don’t matter at all.


Isn't the actual proof checking what your traditional formal verification tool does? I would have thought that 99% of the work of formal verification would be writing the spec and verifying that it correctly models the problem you are trying to solve.


The specification task is indeed a lot of work. Driving the tool to complete the proof is often also a lot of work. There are many fully automatic proof tools. Even the simplest like SAT solvers run into very hard computational complexity limits. Many “interactive” provers are more expressive and allow a human to help guide the tool to the proof. That takes intuition, engineering, etc.


I think formal verification shines in areas where implementation is much more complex than the spec, like when you’re writing incomprehensible bit-level optimizations in a cryptography implementation or compiler optimization phases. I’m not sure that most of us, day-to-day, write code (or have AI write code) that would benefit from formal verification, since to me it seems like high-level programming languages are already close to a specification language. I’m not sure how much easier to read a specification format that didn’t concern itself with implementation could be, especially when we currently use all kinds of frameworks and libraries that already abstract away implementation details.

Sure, formal verification might give stronger guarantees about various levels of the stack, but I don’t think most of us care about having such strong guarantees now and I don’t think AI really introduces a need for new guarantees at that level.
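
To make the "implementation much more complex than the spec" point concrete, here is a toy sketch in Python (my own illustration, not from the thread): a bit-twiddling average next to the one-line spec it should satisfy, checked by brute force over a small domain.

    def avg_spec(a, b):
        # The spec: one line, obviously what we mean (non-negative ints).
        return (a + b) // 2

    def avg_impl(a, b):
        # The "clever" version of the kind used to avoid overflow in
        # fixed-width languages; correct, but far less obvious than the spec.
        return (a & b) + ((a ^ b) >> 1)

    # Exhaustive check over a small domain; a real verification effort would
    # cover the full 32/64-bit range symbolically instead of by enumeration.
    assert all(avg_impl(a, b) == avg_spec(a, b)
               for a in range(256) for b in range(256))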


> to me it seems like high-level programming languages are already close to a specification language

They are not. The power of rich and succinct specification languages (like TLA+) comes from the ability to succinctly express things that cannot be efficiently computed, or at all. That is because a description of what a program does is necessarily at a higher level of abstraction than the program (i.e. there are many possible programs or even magical oracles that can do what a program does).

To give a contrived example, let's say you want to state that a particular computation terminates. To do it in a clear and concise manner, you want to express the property of termination (and prove that the computation satisfies it), but that property is not, itself, computable. There are some ways around it, but as a rule, a specification language is more convenient when it can describe things that cannot be executed.
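
As a rough sketch of what that looks like (my own notation, loosely TLA-flavoured, with an illustrative "Done" label), the whole property fits on one line even though no program can decide it:

    % "Eventually the program counter reaches Done" - one line to state,
    % not computable to check by running the program.
    \mathit{Terminates} \;\triangleq\; \Diamond\,(pc = \text{``Done''})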


TLA+ is not a silver bullet, and like all temporal logic, has constraints.

You really have to be able to reduce your models to: "at some point in the future, this will happen," or "it will always be true from now on."

Have probabilistic outcomes? Or even floats [0]? It becomes challenging, and strings are a mess.

> Note there is not a float type. Floats have complex semantics that are extremely hard to represent. Usually you can abstract them out, but if you absolutely need floats then TLA+ is the wrong tool for the job.

TLA+ works for the problems it is suitable for; try to extend past that and it simply fails.

[0] https://learntla.com/core/operators.html


> You really have to be able to reduce your models to: "at some point in the future, this will happen," or "it will always be true from now on."

You really don't. It's not LTL. Abstraction/refinement relations are at the core of TLA.

> Or even floats [0]? It becomes challenging, and strings are a mess.

No problem with floats or strings as far as specification goes. The particular verification tools you choose to run on your TLA+ spec may or may not have limitations in these areas, though.

> TLA+ works for the problems it is suitable for; try to extend past that and it simply fails.

TLA+ can specify anything that could be specified in mathematics. That there is no predefined set of floats is no more a problem than the one physicists face because mathematics has no "built-in" concept for metal or temperature. TLA+ doesn't even have any built in notions of procedures, memory, instructions, threads, IO, variables in the programming sense, or, indeed programs. It is a mathematical framework for describing the behaviour of discrete or hybrid continuous-discrete dynamical systems, just as ODEs describe continuous dynamical systems.

But you're talking about the verification tools you can run on a TLA+ spec, and like all verification tools, they have their limitations. I never claimed otherwise.

You are, however, absolutely right that it's difficult to specify probabilistic properties in TLA+.


> No problem with floats or strings as far as specification goes. The particular verification tools you choose to run on your TLA+ spec may or may not have limitations in these areas, though.

I think it's disingenuous to say that TLA+ verifiers "may or may not have limitations" wrt floats when none of the available tools support floats. People should know going in that they won't be able to verify specs with floats!


I'm not sure how a "spec with floats" differs from a spec with networks, RAM, 64-bit integers, multi-level cache, or any computing concept, none of which exists as a primitive in mathematics. A floating point number is a pair of integers, or sometimes we think about it as a real number plus some error, and TLAPS can check theorems about specifications that describe floating-point operations.

Of course, things can become more involved if you want to account for overflow, but overflow can get complicated even with integers.
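
For what it's worth, the "real number plus some error" view is easy to state precisely. Here is a minimal sampling check in Python (not a proof, and only for a single addition) of the standard rounding model, assuming binary64 round-to-nearest and inputs small enough to avoid overflow:

    import random
    from fractions import Fraction

    # Standard model: fl(a + b) = (a + b) * (1 + d) with |d| <= u = 2^-53.
    u = Fraction(1, 2**53)
    rng = random.Random(0)
    for _ in range(10_000):
        a = rng.uniform(-1e6, 1e6)
        b = rng.uniform(-1e6, 1e6)
        exact = Fraction(a) + Fraction(b)   # the true real-number sum
        computed = Fraction(a + b)          # what float64 addition produced
        assert abs(computed - exact) <= u * abs(exact)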


Those things, unlike floats, have approximable-enough facsimiles that you can verify instead. No tools support even fixed point decimals.

This has burned me before, e.g. when I needed to take the mean of a sequence.


You say no tools but you can "verify floats" with TLAPS. I don't think that RAM or 64-bit integers have facsimiles in TLA+. They can be described mathematically in TLA+ to whatever level of detail you're interested in (e.g. you have to be pretty detailed when describing RAM when specifying a GC, and even more when specifying a CPU's memory-access subsystem), but so can floating point numbers. The least detailed description - say, RAM is just data - is not all that different from representing floats as reals (but that also requires TLAPS for verification).

The complications in describing machine-representable numbers also apply to integers, but these complications can be important, and the level of detail matters just as it matters when representing RAM or any other computing concept. Unlike, say, strings, there is no single "natural" mathematical representation of floating point numbers, just as there isn't one for software integers (integers work differently in C, Java, JS, and Zig; in some situations you may wish to ignore these differences, in others - not). You may want to think about floating point numbers as a real + error, or you may want to think about them as a mantissa-exponent pair, perhaps with overflow or perhaps without. The "right" representation of a floating point number highly depends on the properties you wish to examine, just like any other computing construct. These complications are essential, and they exist, pretty much in the same form, in other languages for formal mathematics.


P.S. for the case of computing a mean, I would use Real rather than try to model floating point, unless the idiosyncrasies of a particular FP implementation were important. That means you can't use TLC. In some situations it could suffice to represent the mean as any number (even an integer) that is ≥ min and ≤ max, but TLC isn't very effective even for algorithms involving non-tiny sets of integers when there's "interesting" arithmetic involved.

I don't know the state of contemporary model checkers that work with theories of reals and/or FP, and I'm sure you're much more familiar with that than me, but I believe that when it comes to numeric computation, deductive proofs or "sampling tests" (such as property-based testing) are still more common than model-checking. It could be interesting to add a random sampling mode to TLC that could simulate many operations on reals using BigDecimal internally.


> TLA+ can specify anything that could be specified in mathematics.

You are talking about the logic of TLA+, that is, its mathematical definition. No tool for TLA+ can handle all of mathematics at the moment. The language was designed for specifying systems, not all of mathematics.


There are excellent probabilistic extensions to temporal logic out there that are very useful to uncover subtle performance bugs in protocol specifications, see e.g. what PRISM [1] and Storm [2] implement. That is not within the scope of TLA+.

Formal methods are really broad, ranging from lightweight type systems to theorem proving. Some techniques are fantastic for one type of problem but fail at others. This is quite natural; the same thing happens with different programming paradigms.

For example, what is adequate for a hard real-time system (timed automata) is useless for a typical CRUD application.

[1] https://www.prismmodelchecker.org

[2] https://www.stormchecker.org


I really do wish that PRISM could one day add some quality-of-life features like "strings" and "functions"

(Then again, AIUI it's basically a thin wrapper over stochastic matrices, so maybe that's asking too much...)


> TLA+ is not a silver bullet, and like all temporal logic, has constraints.

> You really have to be able to reduce your models to: "at some point in the future, this will happen," or "it will always be true from now on."

I think people get confused by the word "temporal" in the name of TLA+. Yes, it has temporal operators. If you throw them away, TLA+ (minus the temporal operators) would still be extremely useful for specifying the behavior of concurrent and distributed systems. I have been using TLA+ for writing specifications of distributed algorithms (e.g., distributed consensus) and checking them for about 6 years now. The question of liveness comes last, and even then, the standard temporal logics are barely suitable for expressing liveness under partial synchrony. The value of temporal properties in TLA+ is overrated.


The "temporal" in TLA+ isn't about □ and ⬦. It's about ' and the abstraction-refinment relation with stuttering at its core (contrasted with o and its non-stuttering meaning). Of course, you can't really specify anything in TLA+ without □ (unless you rely on TLC, which inserts the □ for you).

You cannot specify much in TLA+ without ' and □, and the "temporal" part of TLA+ - i.e. the TLA logic - is essential; but saying it's like "all temporal logics" ignores the abstraction-refinement relation, which is the heart of TLA+ (that's what ⇒, basic implication, means in TLA+) and which other temporal logics miss.

Of course, you could hypothetically use the + part of TLA+, the formalised set theory, to specify everything, but that would be very inconvenient.


What you said certainly works, but I'm not sure computability is actually the biggest issue here?

Have a look at how SAT solvers or Mixed Integer Linear Programming solvers are used.

There you specify a clear goal (with your code), and then you let the solvers run. You can, but you don't need to, let the solvers run all the way to optimality. And the solvers are also allowed to use all kinds of heuristics to find their answers, but that doesn't impact the statement of your objective.

Compare that to how many people write code without solvers: the objective of what your code is trying to achieve is seldom clearly spelled out, and is instead mixed up with the how-to-compute bits, including all the compromises and heuristics you make to get a reasonable runtime or to accommodate some changes in the spec your boss asked for at the last minute.

Using a solver ain't formal verification, but it shows the same separation between spec and implementation.

Another benefit of formal verification, that you already imply: your formal verification doesn't have to determine the behaviour of your software, and you can have multiple specs simultaneously. But you can only have a single implementation active at a time (even if you use a high level implementation language.)

So you can add 'handling a user request must terminate in finite time' as a (partial) spec. It's an important property, but it tells you almost nothing about the required business logic. In addition you can add "users shouldn't be able to withdraw more than they deposited" (and other more complicated rules), and you only have to review these rules once, and don't have to touch them again, even when you implement a clever new money transfer routine.
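
As a sketch of what a small, implementation-independent rule like that can look like (Python, with a toy trace generator standing in for whatever operation log the real system produces), the property is stated once and can be replayed against any implementation:

    import random

    def never_overdrawn(trace):
        """Partial spec: users shouldn't be able to withdraw more than they
        deposited, i.e. the running balance never goes negative."""
        balance = 0
        for kind, amount in trace:
            balance += amount if kind == "deposit" else -amount
            if balance < 0:
                return False
        return True

    # Toy stand-in for an operation log produced by the real implementation.
    rng = random.Random(1)
    trace, balance = [], 0
    for _ in range(1000):
        if balance > 0 and rng.random() < 0.5:
            amt = rng.randint(1, balance)
            trace.append(("withdraw", amt)); balance -= amt
        else:
            amt = rng.randint(1, 100)
            trace.append(("deposit", amt)); balance += amt
    assert never_overdrawn(trace)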


Peter Norvig once proposed to consider a really large grammar, with a trillion rules, which could simulate some practically small applications of more complex systems. Many programs in practice don't need to be written in Turing-complete languages, and can be proven to terminate.


Writing in a language that guarantees termination is not very interesting in itself, as every existing program could automatically be translated into a non-Turing-complete language where the program is proven to terminate, yet behaves exactly the same: the language is the same as the original, only loops/recursion end the program after, say, 2^64 iterations. This, in itself, does not make programs any easier to analyse. In fact, a language that only has boolean variables, no arrays, no recursion, and loops of depth 2 at most is already intractable to verify. It is true that programs in Turing-complete languages cannot generally be verified efficiently, but most non-Turing-complete languages also have this property.

Usually, when we're interested in termination proofs, what we're really interested in is a proof that the algorithm makes constant progress that converges on a solution.
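
A toy version of that translation in Python, just to show how mechanical it is (and why it buys you nothing analytically); the names here are my own:

    FUEL = 2**64  # a bound no real execution will ever reach

    class OutOfFuel(Exception):
        pass

    def bounded(iterable, fuel=FUEL):
        """Wrap every loop of the original program with this. Behaviour is
        unchanged on any run you will ever observe, yet every loop now
        provably performs at most `fuel` iterations."""
        for i, item in enumerate(iterable):
            if i >= fuel:
                raise OutOfFuel
            yield item

    # Original:    for x in stream():          process(x)
    # Translated:  for x in bounded(stream()): process(x)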


I think that kind of interesting progress can generally be shown for many programs, which take input, produce output, and then terminate. For servers, which wait for requests, the situation seems to be different.


This sounds very interesting. Do you have a reference?


Saw something like that once; couldn't find it again recently, sorry. Ask Peter?


Can TLA+ prove anything about something you specify but don't execute?


TLA+ is just a language for writing specifications (syntax + semantics). If you want to prove anything about it, at various degrees of confidence and effort, there are three tools:

- TLAPS is the interactive proof system that can automate some proof steps by delegating to SMT solvers: https://proofs.tlapl.us/doc/web/content/Home.html

- Apalache is the symbolic model checker that delegates verification to Z3. It can prove properties without executing anything, or rather, executing specs symbolically. For instance, it can do proofs via inductive invariants but only for bounded data structures and unbounded integers. https://apalache-mc.org/

- Finally, TLC is an enumerative model checker and simulator. It simply produces states and enumerates them. So it terminates only if the specification produces a finite number of states. It may sound like executing your specification, but it is a bit smarter, e.g., when checking invariants it will never visit the same state twice. This gives TLC the ability to reason about infinite executions. Confusingly, TLC does not have its own page, as it was the first working tool for TLA+. Many people believe that TLA+ is TLC: https://github.com/tlaplus/tlaplus


> It may sound like executing your specification, but it is a bit smarter,

It's more than just "a bit smarter" I would say, and explicit state enumeration is nothing at all like executing a spec/program. For example, TLC will check in virtually zero time a spec that describes a nondeterministic choice of a single variable x being either 0 or 1 at every step (as there are only two states). The important aspect here isn't that each execution is of infinite length, but that there are an uncountable infinity of behaviours (executions) here. This is a completely different concept from execution, and it is more similar to abstract interpretation (where the meaning of a step isn't the next state but the set of all possible next states) than to concrete interpretation.
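
A minimal sketch of that difference in Python (my own toy, not TLC): breadth-first enumeration of reachable states, which covers the uncountably many infinite behaviours of the "x is 0 or 1 at every step" spec after visiting just two states.

    from collections import deque

    def check(init_states, next_states, invariant):
        """Enumerate reachable states, never visiting the same state twice."""
        seen = set(init_states)
        queue = deque(seen)
        while queue:
            s = queue.popleft()
            if not invariant(s):
                return f"invariant violated in state {s!r}"
            for t in next_states(s):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        return f"ok, {len(seen)} distinct states"

    # Toy spec: x is nondeterministically 0 or 1 at every step.
    print(check(init_states={0, 1},
                next_states=lambda x: {0, 1},
                invariant=lambda x: x in (0, 1)))   # -> ok, 2 distinct states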


You can write proofs in TLA+ about things you don't execute and have them checked by the TLA+ proof assistant. But the most common aspect of that, which pretty much every TLA+ spec contains, is nondeterminism, which is basically the ability to describe a system with details you don't know or care about. For example, you can describe "a program that sorts an array" without specifying how, and then prove, say, that the median value ends up in the middle. The ability to specify what a program or a subroutine does without specifying how is what separates the expressive power of specification from programming. This extends not only to the program itself but to its environment. For example, it's very common in TLA+ to specify a network that can drop or reorder messages nondeterministically, and then prove that the system doesn't lose data despite that.
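
A rough Python rendering of the sorting example (not TLA+, just to show the what-without-how idea): the spec is a relation admitting any conforming output, and the property is checked against every output the spec allows.

    from itertools import permutations

    def allowed_outputs(inp):
        """Everything the spec "sorts the input" permits - no algorithm given."""
        return {p for p in permutations(inp) if list(p) == sorted(p)}

    def median_in_middle(inp, out):
        return out[len(out) // 2] == sorted(inp)[len(inp) // 2]

    # Check the property for every behaviour the spec allows, on small inputs.
    inputs = [(3, 1, 2), (5, 5, 1), (2, 2, 2), (4, 1, 3, 2, 5)]
    assert all(median_in_middle(i, o)
               for i in inputs for o in allowed_outputs(i))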


> To give a contrived example, let's say you want to state that a particular computation terminates. To do it in a clear and concise manner, you want to express the property of termination (and prove that the computation satisfies it), but that property is not, itself, computable. There are some ways around it, but as a rule, a specification language is more convenient when it can describe things that cannot be executed.

Do you really think it is going to be easier for the average developer to write a termination specification for their program

vs

giving them a framework or a language that does not have a for loop?

Edit: If by formal verification you mean type checking. That I very much agree.


Maybe it's difficult for the average developer to write a formal specification, but the point of the article is that an AI can do it for them.


Yes. I feel like people who are trying to push software verification have never worked on typical real-world software projects where the spec is like 100 pages long and still doesn't fully cover all the requirements and you still have to read between the lines and then requirements keep changing mid-way through the project... Implementing software to meet the spec takes a very long time and then you have to invest a lot of effort and deep thought to ensure that what you've produced fits within the spec so that the stakeholder will be satisfied. You need to be a mind-reader.

It's hard even for a human who understands the full business, social and political context to disambiguate the meaning and intent of the spec; to try to express it mathematically would be an absolute nightmare... and extremely unwise. You would literally need some kind of super intelligence... And the amount of stream-of-thought tokens which would have to be generated to arrive at a correct, consistent, unambiguous formal spec is probably going to cost more than just hiring top software engineers to build the thing with 100% test coverage of all main cases and edge cases.

Worst part is: after you do all the expensive work of formal verification, you end up proving the 'correctness' of a solution that the client doesn't want.

The refactoring required will invalidate the entire proof from the beginning. We haven't even figured out the optimal way to formally architect software that is resilient to requirement changes; in fact, the industry is REALLY BAD at this. Almost nobody is even thinking about it. I am, but I sometimes feel like I may be the only person in the world who cares about designing optimal architectures to minimize line count and refactoring diff size. We'd have to solve this problem first before we even think about formal verification of 'most software'.

Without a hypothetical super-intelligence which understands everything about the world, the risk of misinterpreting any given 'typical' requirement is almost 100%... And once we have such super-intelligence, we won't need formal verification, because the super-intelligence will be able to code perfectly on the first attempt; no need to verify.

And then there's the fact that most software can tolerate bugs... If operationally important big tech software which literally has millions of concurrent users can tolerate bugs, then most software can tolerate bugs.


Software verification has gotten some use for smart contracts. The code is fairly simple, it's certain to be attacked by sophisticated hackers who know the source, and the consequence of failure is theft of funds, possibly in large amounts. 100% test coverage is no guarantee that an attack can't be found.

People spend gobs of money on human security auditors who don't necessarily catch everything either, so verification easily fits in the budget. And once deployed, the code can't be changed.

Verification has also been used in embedded safety-critical code.


If the requirements you have to satisfy arise out of a fixed, deterministic contract (as opposed to a human being), I can see how that's possible in this case.

I think the root problem may be that most software has to adapt to a constantly changing reality. There aren't many businesses which can stay afloat without ever changing anything.


The whole perspective of this argument is hard for me to grasp. I don't think anyone is suggesting that formal specs are an alternative to code, they are just an alternative to informal specs. And actually with AI the new spin is that they aren't even a mutually exclusive alternative.

A bidirectional bridge that spans multiple representations from informal spec to semiformal spec to code seems ideal. You change the most relevant layer that you're interested in and then see updates propagating semi-automatically to other layers. I'd say the jury is out on whether this uses extra tokens or saves them, but a few things we do know. Chain of code works better than chain of thought, and chain-of-spec seems like a simple generalization. Markdown-based planning and task-tracking agent workflows work better than just YOLOing one-shot changes everywhere, and so intermediate representations are useful.

It seems to me that you can't actually get rid of specs, right? So to shoot down the idea of productive cooperation between formal methods and LLM-style AI, one really must successfully argue that informal specs are inherently better than formal ones. Or even stronger: having only informal specs is better than having informal+formal.


> A bidirectional bridge that spans multiple representations from informal spec

Amusingly, what I'm hearing is literally "I have a bridge to sell you."


There's always a bridge, dude. The only question is whether you want to buy one that's described as "a pretty good one, not too old, sold as is" or if you'd maybe prefer "spans X, holds Y, money back guarantee".


I get it. Sometimes complexity is justified. I just don't feel this particular bridge is justified for 'mainstream software' which is what the article is about.


I agree that trying to produce this sort of spec for the entire project is probably a fool's errand, but I still see the value for critical components of the system. Formally verifying the correctness of balance calculation from a ledger, or that database writes are always persisted to the write ahead log, for example.


I used to work adjacent to a team who worked from closely-defined specs for web sites, and it used to infuriate the living hell out of me. The specs had all sorts of horrible UI choices and bugs and stuff that just plain wouldn't work when coded. I tried my best to get them to implement the intent of the spec, not the actual spec, but they had been trained in one method only and would not deviate at any cost.


Yeah, IMO, the spec almost always needs refinement. I've worked for some companies where they tried to write specs with precision down to every word, but what happened is: if the spec was too detailed, it usually had to be adjusted later once it started to conflict with reality (efficiency, costs, security/access restrictions, resource limits, AI limitations)... If it wasn't detailed enough, then we had to read between the lines and fill in a lot of gaps... And usually had to iterate with the stakeholder to get it right.

At most other companies, it's like the stakeholder doesn't even know what they want until they start seeing things on a screen... Trying to write a formal spec when literally nobody in the universe even knows what is required; that's physically impossible.

In my view, 'Correct code' means code that does what the client needs it to do. This is downstream from it doing what the client thinks they want; which is itself downstream from it doing what the client asked for. Reminds me of this meme: https://www.reddit.com/r/funny/comments/105v2h/what_the_cust...

Software engineers don't get nearly enough credit for how difficult their job is.


How do you or the client know that the software is doing what they want?


What formal verification system did they use? Did they even execute it?


There are many really important properties to enforce even on the most basic CRUD system. You can easily say things like "an anonymous user must never edit any data, except for the create account form", or "every user authorized to see a page must be listed on the admin page that lists which users can see that page".

People don't verify those because it's hard, not for lack of value.
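
For example, the first rule can be written down as a tiny check over whatever access-control table the app maintains. This is only a sketch with made-up route names and a hypothetical permission table, but it is the shape of the property:

    # Hypothetical role -> permission table keyed by route (illustrative only).
    ROUTES = {
        "/account/create": {"anonymous": {"write"}},
        "/feedback":       {"anonymous": set(), "user": {"write"}},
        "/admin/users":    {"admin": {"read", "write"}},
    }
    ANON_WRITE_ALLOWLIST = {"/account/create"}

    def anonymous_never_edits(routes, allowlist):
        """Anonymous users may write only through explicitly allowed routes."""
        return all(
            "write" not in perms.get("anonymous", set()) or route in allowlist
            for route, perms in routes.items()
        )

    assert anonymous_never_edits(ROUTES, ANON_WRITE_ALLOWLIST)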


Yes, in fact there is research on type systems to ensure information flow control, avoiding unauthorized data access by construction.

Concrete Semantics [1] has a little example in §9.2.

[1] http://concrete-semantics.org/concrete-semantics.pdf


Yeah fair enough. I can definitely see the value of property-based verification like this and agree that useful properties could be easy to express and that LLMs could feasibly verify them. I think full verification that an implementation implements an entire spec and nothing else seems much less practical even with AI, but of course that is just one flavor of verification.


Even

> "an anonymous user must never edit any data, except for the create account form"

Can quickly end up being

> "an anonymous user must never edit any data, except for the create account form, and the feedback form"

And a week later go to

> "an anonymous user must never edit any data, except for the create account form, the feedback form, and the error submission form if they end up with a specific type of error"

And then during christmas

> > "an anonymous user must never edit any data, except for the create account form, the feedback form, and the error submission form if they end up with a specific type of error, and the order submission form if they visit it from this magic link. Those visiting from the magic link, should not be able to use the feedback form (marge had a bad experience last christmas going through feedbacks from the promotional campaign)"


It is still a small rule, with plenty of value. It's nowhere near the size of the access control for the entire site. And it's also not written down by construction.

The rule changing over time doesn't change any of that.


Yes, except their cookie preferences to comply with european law. Oh, and they should be able to change their theme from light/dark but only that. Oh and maybe this other thing. Except in situations where it would conflict with current sales promotions. Unless they're referred by a reseller partner. Unless it's during a demo, of course. etc, etc, etc.

This is the sort of reality that a lot of developers in the business world deal with.


Compare the spec with the application here: https://concerningquality.com/model-based-testing/

I think we've become used to the complexity in typical web applications, but there's a difference between familiar and simple (simple vs. easy, as it were). The behavior of most business software can be very simply expressed using simple data structures (sets, lists, maps) and simple logic.

No matter how much we simplify it, via frameworks and libraries or what have you, things like serialization, persistence, asynchrony, concurrency, and performance end up complicating the implementation. Comparing this against a simpler spec is quite nice in practice - and a huge benefit is that you can now consult a simple in-memory spec instead of worrying about distributed system deployments.
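
A minimal model-based-testing sketch of that idea in Python (names are made up; `real_store` stands in for whatever complicated, persistent, concurrent implementation you actually have):

    import random

    class DictModel:
        """The spec: a key-value store is just an in-memory map."""
        def __init__(self):
            self.d = {}
        def put(self, k, v):
            self.d[k] = v
        def get(self, k):
            return self.d.get(k)

    def check_against_model(real_store, steps=1000, seed=0):
        """Drive the real implementation and the simple model with the same
        random operations and compare every observable result."""
        rng = random.Random(seed)
        model = DictModel()
        for _ in range(steps):
            k = rng.choice("abc")
            if rng.random() < 0.5:
                v = rng.randint(0, 9)
                model.put(k, v)
                real_store.put(k, v)
            else:
                assert real_store.get(k) == model.get(k)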


> especially when we currently use all kinds of frameworks and libraries that already abstract away implementation details.

This is my issue with algorithm-driven interviewing. Even the creator of Homebrew got denied by Google because he couldn't do some binary tree problem or whatever it even was. He made a tool used by millions of developers, but apparently that's not good enough.


Google denies qualified people all the time. They would much rather reject a great hire than take a risk on accepting a mediocre one. I feel for him but it's just the nature of the beast. Not everyone will get in.


This language sounds like chauvinism leading to closed-mindedness and efficiency. Of course there are tradeoffs to chauvinism, as Googlers possess the mind to notice. But a Googler does not need to worry about saying ambiguous truths without understanding their emotions to the masses, for they have Google behind them. With the might of the G stick, they can hammer out words with confidence.


I've heard this before. Why do you think algorithm questions are effective for finding "good" hires? Are they?


The intent isn't to find good hires per se, but to whittle down the list of applicants to a manageable number in a way that doesn't invite discrimination lawsuits.

Same as why companies in the past used to reject anyone without a degree. But then everyone got a degree and it was no longer an effective filter, hence things like algorithm tests showing up to fill the void.

Once you've narrowed the list, then you can worry about figuring out who is "good" through giving the remaining individuals additional attention.


> Same as why companies in the past used to reject anyone without a degree.

They still do, and it's a shame; some of the smartest, most capable developers I know have no degree.


They certainly don't filter out toxic people who make others leave companies because they poison the well.


I have a suspicion that "good candidate" is being gerrymandered. What might have been "good" in 1990 might have become irrelevant in 2000+ or perhaps detrimental. I say that as someone who is actually good at algorithm questions himself. I think GP, as well as other Google defenders, are parroting pseudo-science.


I agree. But also, if it works to get you jobs there, why wouldn't you defend it? I mean, I might be inclined to do so as well; it guarantees me a place even if I lack soft skills for the role.


AWS has said that having formal verification of code lets them be more aggressive in optimization while being confident that it still adheres to the spec. They claim they were able to double the speed of IAM API auth code this way.


> there's a section of road near where I live that's dangerous, and we all know it's dangerous

Clearly not enough people know it’s dangerous or how dangerous it is, or one of them would do something about it


It's a rural area with limited resources. We drive slow for that turn.


Semgrep has supported uv for months now (I added it).


This free tradition in software is, I think, one of the things that I love so much about it, but I don't see how it can continue with LLMs due to the extremely high training costs and the powerful hardware required for inference. It just seems like writing software will necessarily require paying rent to the LLM hosts to keep up. I guess it's possible that we'll figure out a way to do local inference in a way that is accessible to everyone in the way that most other modern software tools are, but the high training costs make that seem unlikely to me.

I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other content aimed at beginners that makes it so easy to pick up programming the manual way.


There's a Stephen Boyd quote that's something like "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mindblowing would it have been to be able to train this kind of LLM at all even just 4 years ago? And today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it is working, then download a pre-trained version from someone who could spare the compute.


But in this case the reason is simple: the core algorithm is O(n^2), and that is not going to be improved over a few weeks.


> today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

No, it's extremely hard to imagine since I used one of Karpathy's own models to have a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine tune four years ago and so does this.

And so does ChatGPT

Improvement is linear at best. I still think it's actually a log curve and GPT3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI


Apparently 800 million weekly users are finding ChatGPT useful in its present state.


1. According to whom? OpenAI?

2. Its current state is "basically free and containing no ads". I don't think this will remain true given that, as far as I know, the product is very much not making money.


Yes, that number is according to OpenAI. They released that 800m number at DevDay last week.

The most recent leaked annualized revenue rate was $12bn/year. They're spending a lot more than that but convincing customers to hand over $12bn is still a very strong indicator of demand. https://www.theinformation.com/articles/openai-hits-12-billi...


Part of that comes from Microsoft API deals. Part of that will most certainly come because the vast network of companies buy subscriptions to help "Open" "AI" [1].

Given the rest of the circular deals, I'd also scrutinize whether it applies to the revenue. The entanglement with the Microsoft investments and the fact that "Open" "AI" is a private company make that difficult to research.

[1] In a U.S. startup, I went through three CEOs and three HR apps, which mysteriously had to change for no reason but to accommodate the new CEO's friends and their startups.


When people take the time to virtue signal "Open" "AI" or the annoyingly common M$ on here, I often wonder why they are wasting their precious time on earth doing that


It is in my style guide. I see that you optimized your time by omitting the full stop at the end of the sentence.


Totally fine, you're not alone. The kids call it a "typing quirk" as a justification if you're interested in learning more.


Even with linear progression of model capability, the curve for model usefulness could be exponential, especially if we consider model cost which will come down.

For every little bit a model gets smarter and more accurate, there are exponentially more real-world tasks it could be used for.


This. It looks like one of the keys to maintaining open source is to ensure OSS developers have access to capable models. In the best of worlds, LLM vendors would recognize that open source software is the commons that feeds their models and ensure it flourishes.

In the real world...


Maybe this isn't possible for LLMs yet, but open source versions of AlphaZero have been trained on peer-to-peer networks.

https://zero.sjeng.org/

https://katagotraining.org/


Do you mean 500 Wh rather than 500 kWh? 500 kWh would be around 10 EVs' worth, and it looks like it costs around 700k [0], but 500 Wh seems to be a common size for portable power stations [1].

[0] https://www.backupbatterypower.com/products/516-kwh-industri...

[1] https://www.ankersolix.com/products/535?variant=497024349310...


I think they mean backwards-compatible syntax-wise, rather than actually allowing this feature to be used on existing code. If I’m understanding correctly they would prefer for the Python grammar to stay the same (hence the comment about updating parsers and IDEs).

But I don’t think I really agree, the extensible annotation syntaxes they mention always feel clunky and awkward to me. For a first-party language feature (especially used as often as this will be), I think dedicated syntax seems right.


I found out yesterday that it is now the law that federal employees are guaranteed back pay after a shutdown, thanks to the Government Employee Fair Treatment Act of 2019 passed after the 2019 shutdown.

https://www.congress.gov/bill/116th-congress/senate-bill/24


I was honestly quite surprised to see such a bipartisan vote in the house. [0] (Senate was a voice-vote, so no details available there.)

Then again, a lot has changed over that... ~6.5 years.

[0] https://clerk.house.gov/Votes/201928


I actually think that Cloudflare has made publishing on the internet _more_ accessible for many individuals. I’ve helped a few people get personal websites running on Cloudflare pages and run my own there—it’s free and extremely easy. They could obviously pull the plug at any point, but with static sites it’s easy to avoid lock-in. If it weren’t for Cloudflare and other services that give free, easy hosting, I suspect there would be even fewer of the non-commercial small-internet sites that you value.


There have been places that host personal and hobby websites for free for at least the last 30 years. Some older ones have left, and newer ones keep coming along. Cloudflare didn't make this any more accessible.


but most of them are dogshit tho

