Hacker News | jeffjeffbear's comments

Could be good as a modern version of https://zapatopi.net/treeoctopus/

It also reminded me of the https://en.wikipedia.org/wiki/Spaghetti-tree_hoax

One of those things that seem POSSIBLE, but then again...


I haven't looked into it in years, but would the inverse of a block bidiagonal matrix have some semiseparable structure? Maybe that would be good to look into?


just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

If so, that would be nice, because you could then just use the Woodbury formula to invert H. But I don't think such a decomposition exists. I tried to exhaustively search through all the decompositions of H that involve one dummy variable (of which the above is a special case) and I couldn't find one. I ended up having to introduce two dummy variables instead.
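For reference, this is roughly what the Woodbury route would look like if such a decomposition did exist. A NumPy sketch with made-up shapes and a plain diagonal D, not the actual H discussed here:

    import numpy as np

    # Hypothetical shapes: D is n x n diagonal (block diagonal in general),
    # C is n x k with k << n ("tall & skinny").
    rng = np.random.default_rng(0)
    n, k = 200, 5
    d = rng.uniform(1.0, 2.0, size=n)        # diagonal of D
    C = rng.standard_normal((n, k))
    H = np.diag(d) + C @ C.T

    # Woodbury: (D + C C')^{-1} = D^{-1} - D^{-1} C (I + C' D^{-1} C)^{-1} C' D^{-1}
    Dinv_C = C / d[:, None]                  # D^{-1} C, costs O(n k)
    small = np.eye(k) + C.T @ Dinv_C         # solve a k x k system instead of n x n
    H_inv = np.diag(1.0 / d) - Dinv_C @ np.linalg.solve(small, Dinv_C.T)

    assert np.allclose(H_inv @ H, np.eye(n))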


> just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?

Not quite, it means any submatrix taken from the upper (lower) part of the matrix has low rank. For example, a matrix is {3,4}-semiseparable if any submatrix taken from the lower triangular part has rank at most 3 and any submatrix taken from the upper triangular part has rank at most 4.

The inverse of an upper bidiagonal matrix is {0,1}-semiseparable.

There are a lot of fast algorithms if you know a matrix is semiseparable.

edit: link https://people.cs.kuleuven.be/~raf.vandebril/homepage/public...
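A quick numerical check of the bidiagonal claim (NumPy sketch, reading the lower part as the strictly lower triangle):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 12

    # Random upper bidiagonal matrix: nonzero diagonal + superdiagonal.
    B = np.diag(rng.uniform(1.0, 2.0, n)) + np.diag(rng.uniform(0.5, 1.5, n - 1), 1)
    Binv = np.linalg.inv(B)

    # The strictly lower triangle of the inverse is zero: rank 0.
    assert np.allclose(np.tril(Binv, -1), 0.0)

    # Any submatrix lying entirely in the upper triangular part has rank <= 1:
    # take rows 0..i and columns j..n-1 with j >= i, so every entry is on or above the diagonal.
    for i in range(1, n):
        for j in range(i, n):
            sub = Binv[:i + 1, j:]
            assert np.linalg.matrix_rank(sub) <= 1

    print("inverse of an upper bidiagonal matrix looks {0,1}-semiseparable here")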


thanks for the explanation! sorry i had misread the AI summary on "semiseparable".

i need to firm up my intuition on this first before i can say anything clever, but i agree it's worth thinking about!


> Haven't used Linux in forever, but middle-click to paste was like the one thing that consistently worked everywhere.

That's because it was an X11 thing, and everyone used X11.


X11 doesn't really define those things. Policy, not mechanism.


X heavily relied on the primary and secondary selections for performing operations in lieu of an explicit clipboard. It is built into the protocol. The only policy part is which button that paste action was bound to by default.


X doesn't even enforce primary or secondary selections; they have no special meaning to the protocol. What is built into the protocol is a generic mechanism for doing clipboard-like things. Even how many actual clipboards you have is policy and not built into the protocol.


While X11 didn't define it, the defaults were such that in many cases it would be harder to write a program that didn't do that than one that did.


Not at all. Unless you specifically coded a handler for the middle button and wrote the code for fetching selections and all, you would not get this behavior. It would be easier for the middle button to do nothing.

You may be thinking of toolkits like Gtk+ or Qt which implement this behavior, but it is really just a convention shared by many desktop toolkits rather than anything defined by X11.


That is also the one good thing about Windows' command line: you can right-click there to copy and paste, which is nice. The rest sucks.


I cannot stand the Windows user experience in their command line. The Linux approach actually has two separate registers that allow different content to be copied and pasted.

Say I used CTRL+C to copy something but also need something else copied: I can highlight that and paste it with the middle mouse button, while still pasting the first with CTRL+V.

On Windows you have to destroy the content of the CTRL+C clipboard and replace it with what the middle mouse would have handled, then go back to the first source to copy and paste it again.
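To illustrate the two registers, a small sketch that pokes at them through xclip (assumes xclip is installed and an X session is running):

    # The CLIPBOARD and PRIMARY selections are independent buffers.
    import subprocess

    def set_sel(sel: str, text: str) -> None:
        subprocess.run(["xclip", "-selection", sel, "-i"], input=text.encode(), check=True)

    def get_sel(sel: str) -> str:
        out = subprocess.run(["xclip", "-selection", sel, "-o"], capture_output=True, check=True)
        return out.stdout.decode()

    set_sel("clipboard", "copied with Ctrl+C")    # what Ctrl+V / Ctrl+Shift+V pastes
    set_sel("primary", "just highlighted text")   # what middle-click pastes

    print(get_sel("clipboard"))   # -> copied with Ctrl+C
    print(get_sel("primary"))     # -> just highlighted text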


You want a clipboard manager/history. You are using middle-button paste as a workaround for how hard it is to find a good clipboard manager (I'm not sure one exists...).


I have and use all three on Linux. I only use Windows at work, where IT is strict.


I don't really love this solution, since it runs into all the usual linked-list issues and is only 'allocation free' in the sense that the pointers are allocated with the structure when doing the intrusive thing they are talking about. The std::vector-of-pointers approach isn't going to use crazily more memory.

Myself, I like to just allocate a too-big block and shove everything into that, then deal with indices into that array if I care about performance. You can even flatten the tree in a way that gets better locality if you care about that.
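Roughly what I mean, as a toy sketch (Python instead of C++, with a made-up Node layout; in C++ this would be a std::vector<Node>):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        value: str
        children: list[int] = field(default_factory=list)  # indices into `nodes`

    nodes: list[Node] = []  # the single backing array; no per-node allocation scheme

    def add(value: str, parent: int | None = None) -> int:
        nodes.append(Node(value))
        idx = len(nodes) - 1
        if parent is not None:
            nodes[parent].children.append(idx)
        return idx

    root = add("root")
    a = add("a", root)
    b = add("b", root)
    add("a1", a)

    def flatten_dfs(start: int) -> list[Node]:
        # Re-lay nodes in depth-first order so subtrees are contiguous (better locality).
        order: list[int] = []
        stack = [start]
        while stack:
            idx = stack.pop()
            order.append(idx)
            stack.extend(reversed(nodes[idx].children))
        remap = {old: new for new, old in enumerate(order)}
        return [Node(nodes[old].value, [remap[c] for c in nodes[old].children]) for old in order]

    nodes = flatten_dfs(root)
    print([n.value for n in nodes])   # ['root', 'a', 'a1', 'b']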


Whether you allocate a big array and store indices into it or keep pointers to dynamically allocated memory is orthogonal to how you actually represent the hierarchy.


I would really like to see more testing with a deeper hierarchy and alpha and beta nonzero.


> I think it really depends on what kind of anime you’re talking about

Does it? If I draw a naked stick figure with boobs and say it is 14, is that morally wrong? At what point should a person care? Their point is that a drawing doesn't hurt people right?


Just because it’s hard to spot the point where it becomes immoral doesn’t mean it’s not immoral. I can’t tell you at what point a person should care, and I wouldn’t want to be the judge of that. My point is that saying they’re looking at “anime” is really downplaying what’s happening. I don’t personally believe the drawings we’re referring to hurt anyone, but that had nothing to do with my argument anyway. Many people will be disgusted by it, and others will not, meanwhile most people seem to be okay with mainstream anime.


> If I draw a naked stick figure with boobs and say it is 14, is that morally wrong? At what point should a person care?

No and I'm sure every judge in Britain would throw that case out.

> Their point is that a drawing doesn't hurt people right?

It can in certain circumstances encourage a market or normalise abusive behaviour.


> It can in certain circumstances encourage a market or normalise abusive behaviour.

Just like the printed word. Books should be banned and burned. We should start with Orwell since his writing has been used as a manual for so much abusive behaviour.


Hate speech is also illegal in the UK, yes.


>"It can in certain circumstances encourage..."

Anything can be bad in "certain circumstances". They should go get busy with some real crime.


> Anything can be bad in "certain circumstances"

Can it? In the same way? It feels like your argument comes down to handwaving. Circumstantial law is hardly a novel thing.


> Can it? In the same way? It feels like your argument comes down to handwaving. Circumstantial law is hardly a novel thing.

I think that was their point: your argument seems handwavey, because anything can be bad "in certain circumstances".

Hold the door for someone? Seems nice. But you could be insulting them by doing so. Or letting a virus in by having the door open too long. Or wasting energy and contributing to climate change by letting the conditioned air out. Indeed, under certain circumstances, it's bad.


Sure, many things can be "bad" if you are happy to go with increasingly absurd reasoning, but I think that's quite an unfair misrepresentation of both what I said above and of the arguments that were raised in parliament before this law was introduced. Insulting someone by holding a door open might be "bad", but could you really argue for legislating against it? Bringing in the word "bad" moves the goalposts quite a bit in order to frame the original position as equally limp and absurd.


They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Basically using GPT-5 and being careful


I wonder if they know about this: training on LLM output can transmit information or characteristics not explicitly included in it. https://alignment.anthropic.com/2025/subliminal-learning/

I'm curious: they have the example of raw base-model output. When LLMs were first identified as zero-shot chatbots, there was usually a prompt like "A conversation between a person and a helpful assistant" that preceded the chat to get the model to simulate a chat.

Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?

I also wonder about whether the whole concept of "chat" makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so modern models are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.


we were considering doing that, but ultimately it struck us as too sensitive wrt the exact in-context examples, their ordering, etc.


Thank you, that helps to inject a lot of skepticism. I was wondering how it so easily worked out what Q: and A: stood for when that formatting took off in the 1940s.


that is simply how we display the questions, it's not what the model sees - we show the chat template in the SFT section of the prerelease notes https://github.com/DGoettlich/history-llms/blob/main/ranke-4...


Ok, so it was that. The responses given did sound off: while it has some period-appropriate mannerisms, and has entire sections basically rephrased from some popular historical texts, it reads oddly compared to an actual 1900s text. The overall vibe just isn't right; it seems too modern, somehow.

I also wonder whether you'd get this kind of performance with only actual pre-1900s text. LLMs work because they're fed terabytes of text; if you just give one gigabytes, you get a 2019-era word model. The fundamental technology is mostly the same, after all.


what makes you think we trained on only a few gigabytes? https://github.com/DGoettlich/history-llms/blob/main/ranke-4...


This explains why it uses modern prose and not something from the 19th century and earlier


I just don't like having to send HTML and have the backend deal with what are really frontend problems. Sending JSON is great since you can serialize most reasonable data types into it, and the backend then has no responsibility for how it is rendered, which helps when mobile apps use the same backend as the website. Sending HTML just seems nuts, since if you change your design you have to change the backend too.



if you require a backend/frontend split, you're maybe not in the htmx use case

if you can imagine having just one "end", maybe you can use htmx
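For concreteness, the two styles side by side. A minimal Flask sketch with hypothetical endpoints and data, just to show where the rendering responsibility lands:

    from flask import Flask, jsonify, render_template_string

    app = Flask(__name__)
    ITEMS = [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]  # stand-in data

    @app.get("/api/items")
    def items_json():
        # JSON: the backend only ships data; web and mobile clients decide how to render it.
        return jsonify(items=ITEMS)

    @app.get("/fragments/items")
    def items_html():
        # htmx-style: the backend returns a ready-made fragment to swap into the page,
        # so a design change means touching this handler too.
        return render_template_string(
            "<ul>{% for item in items %}<li>{{ item.name }}</li>{% endfor %}</ul>",
            items=ITEMS,
        )

    if __name__ == "__main__":
        app.run()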


Isn't finetuning the point of the T5-style models, since they perform better for smaller parameter counts?


It'll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too, because the smol Gemma 3s with the same parameter count were neither.


> https://huggingface.co/google/t5gemma-2-1b-1b

From here it looks like it still is long context and multimodal though?

> Inputs and outputs
>
> Input:
>
> - Text string, such as a question, a prompt, or a document to be summarized
> - Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
> - Total input context of 128K tokens
>
> Output:
>
> - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
> - Total output context up to 32K tokens


If you are finetuning the model you need to replicate the training conditions so you don't remove those capabilities. If you just finetune a multimodal model on text, it will lose some of its vision capabilities as the text part of the model drifts from the vision, audio, etc. parts. A similar thing happens when finetuning reasoning models.

Even if you did finetune the model with text and images, you could run into issues by using different descriptions for images than what it was trained with. You could probably work around that by getting the model to describe the images, but you'll still need to audit the results to correct any issues or add what you are training for.

You can also run into overfitting if your data does not include enough variation along a given training set that the original model had access to.

Using different training parameters could also affect the model's capabilities. Just knowing things like the input context isn't enough.


This is the thing that kills me about SFT. It was sensible when most of the compute in a model was in pretraining and the RL was mostly for question answering. Now that RL is driving model capabilities it doesn't make much sense.

On the other hand, RL on deployed systems looks promising as a way to essentially JIT-optimize models. Experiments with model routers and agentic RAG have shown good results.


This is very true. However, I wonder how much of this can be mitigated by using training data from other open-source models, like Olmo3 for textual data and Emu3.5 for vision?

