I haven't looked into it in years, but would the inverse of a block bidiagonal matrix have some semiseparable structure? Maybe that would be good to look into?
Just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?
If so, it would be nice if this were the case, because you could then just use the Woodbury formula to invert H. But I don't think such a decomposition exists. I tried to exhaustively search through all the decompositions of H that involved one dummy variable (of which the above is a special case) and I couldn't find one. I ended up having to introduce two dummy variables instead.
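For reference, here is a minimal numpy sketch of why that would be nice: if H = D + CC' did hold, Woodbury reduces inverting the big matrix to inverting D (cheap when D is block diagonal) plus one small k x k solve, where k is the number of columns of C.

    # Woodbury: inv(D + C C') = inv(D) - inv(D) C inv(I + C' inv(D) C) C' inv(D)
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 5                        # C is tall & skinny: n x k
    D = np.diag(rng.uniform(1, 2, n))    # stand-in for a block-diagonal D
    C = rng.standard_normal((n, k))

    D_inv = np.diag(1.0 / np.diag(D))    # trivial here; blockwise in general
    small = np.eye(k) + C.T @ D_inv @ C  # the only dense solve is k x k
    H_inv = D_inv - D_inv @ C @ np.linalg.solve(small, C.T @ D_inv)

    assert np.allclose(H_inv, np.linalg.inv(D + C @ C.T))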
> Just to be clear, semiseparable in this context means H = D + CC', where D is block diagonal and C is tall & skinny?
Not quite, it means any submatrix taken from the upper (lower) part of the matrix has low rank. For example, a matrix is {3,4}-semiseparable if any submatrix taken from the lower triangular part has at most rank 3 and any submatrix taken from the upper triangular part has at most rank 4.
The inverse of an upper bidiagonal matrix is {0,1}-semiseparable.
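A quick numpy check of that claim, as a sketch (it checks only the maximal submatrices, since any submatrix of a rank-1 block has rank at most 1 itself):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    B = np.diag(rng.uniform(1, 2, n)) + np.diag(rng.standard_normal(n - 1), 1)
    Binv = np.linalg.inv(B)

    # Lower part: the inverse of an upper triangular matrix is upper
    # triangular, so every submatrix from the strictly lower part has rank 0.
    assert np.allclose(np.tril(Binv, -1), 0)

    # Upper part: every maximal submatrix lying in the upper triangular
    # part (rows 0..i-1, columns i-1..n-1) has rank at most 1.
    for i in range(1, n + 1):
        sub = Binv[:i, i - 1:]
        assert np.linalg.matrix_rank(sub, tol=1e-10) <= 1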
There are a lot of fast algorithms if you know a matrix is semiseparable.
X heavily relied on the primary and secondary selections for performing operations in lieu of an explicit clipboard. It is built into the protocol; the only policy is which binding that paste defaulted to.
X doesn't even enforce primary or secondary selections; they have no special meaning to the protocol. What is built on the protocol is a mechanism to do clipboard-like things. Even how many actual clipboard thingies you have is policy and not built into the protocol.
Not at all. Unless you specifically coded a handler for the middle button and wrote code for fetching selections and all, you would not get this behavior. It would be easier for the middle button to do nothing.
You may be thinking of toolkits like Gtk+ or Qt which implement this behavior, but it is really just a convention shared by many desktop toolkits rather than anything defined by X11.
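You can see from outside any toolkit that the selections are just independently named buffers; here is a small Python sketch, assuming the xclip utility is installed:

    import subprocess

    def read_selection(name: str) -> str:
        # `xclip -selection <name> -o` prints the current contents of that buffer
        return subprocess.run(
            ["xclip", "-selection", name, "-o"],
            capture_output=True, text=True,
        ).stdout

    print("PRIMARY:  ", read_selection("primary"))    # filled by highlighting text
    print("CLIPBOARD:", read_selection("clipboard"))  # filled by Ctrl+C in most toolkits

Which buffer a given click or keybinding reads from is entirely up to the application or toolkit.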
I cannot stand the Windows command-line user experience. The Linux method actually has two software registers that allow different content to be copied and pasted.
Say I used CTRL+C to copy something, but I need something else copied first: highlight it, paste it with the middle mouse button, then paste the original with CTRL+V.
On Windows you must destroy the contents of the CTRL+C buffer and replace them with what the middle mouse would have handled, then go back to the first source to copy and paste again.
You want a clipboard manager/history. You are using middle-button paste as a workaround for how hard it is to find a good clipboard manager (I'm not sure one exists...).
I don't really love this solution, since it runs into all the usual linked-list issues and is only 'allocation free' in the sense that the pointers are allocated with the structure if you do the intrusive thing they are talking about. The std::vector-of-pointers approach isn't going to use crazily more memory.
Myself, I like to just allocate a too-big block and shove everything into it, then deal with indices into that array if I care about performance. You can even flatten the tree in a way that gets better locality if you care about that.
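A minimal sketch of that approach (in Python for brevity; in C++ the arena would be one std::vector): nodes live in a single growable block and link to each other by integer index rather than by pointer.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        value: int
        children: list[int] = field(default_factory=list)  # indices into the arena

    arena: list[Node] = []  # the one big block; nodes are contiguous in memory

    def add_node(value: int, parent: int | None = None) -> int:
        arena.append(Node(value))
        idx = len(arena) - 1
        if parent is not None:
            arena[parent].children.append(idx)
        return idx

    root = add_node(0)
    a = add_node(1, parent=root)
    add_node(2, parent=root)
    add_node(3, parent=a)

    # Depth-first walk using indices only; no per-node heap pointers to chase.
    stack = [root]
    while stack:
        node = arena[stack.pop()]
        print(node.value)
        stack.extend(reversed(node.children))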
Whether you allocate a big array and store indices into it, or keep pointers to dynamically allocated memory, is orthogonal to how you actually represent the hierarchy.
> I think it really depends on what kind of anime you’re talking about
Does it? If I draw a naked stick figure with boobs and say it is 14, is that morally wrong? At what point should a person care? Their point is that a drawing doesn't hurt people, right?
Just because it’s hard to spot the point where it becomes immoral doesn’t mean it’s not immoral. I can’t tell you at what point a person should care, and I wouldn’t want to be the judge of that. My point is that saying they’re looking at “anime” is really downplaying what’s happening.
I don’t personally believe the drawings we’re referring to hurt anyone, but that had nothing to do with my argument anyway. Many people will be disgusted by it, and others will not, meanwhile most people seem to be okay with mainstream anime.
> It can in certain circumstances encourage a market or normalise abusive behaviour.
Just like the printed word. Books should be banned and burned. We should start with Orwell since his writing has been used as a manual for so much abusive behaviour.
> Can it? In the same way? It feels like your argument comes down to handwaving. Circumstantial law is hardly a novel thing.
I think that was their point: your argument seems handwavey, because anything can be bad "in certain circumstances".
Hold the door for someone? Seems nice. But you could be insulting them by doing so. Or letting a virus in by having the door open too long. Or wasting energy and contributing to climate change by letting the conditioned air out. Indeed, under certain circumstances, it's bad.
Sure, many things can be "bad" if you are happy to go with increasingly absurd reasoning, but I think that's quite an unfair misrepresentation both of what I said above and of the arguments that were raised in parliament before this law was introduced. Insulting someone by holding a door open might be "bad", but could you really argue for legislating against it? Bringing in the word "bad" moves the goalposts quite a bit in order to frame the original position as equally limp and absurd.
I’m curious: they have the example of raw base-model output. When LLMs were first identified as zero-shot chatbots, there was usually a prompt like “A conversation between a person and a helpful assistant” preceding the text to get the model to simulate a chat.
Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?
I also wonder whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so they are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.
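Something like this is what I have in mind, as a sketch; the model name is a placeholder, not the actual model from the article:

    from transformers import pipeline

    # A raw base model just continues text, so the "chat" is whatever
    # framing the prefix establishes.
    generate = pipeline("text-generation", model="your-org/pre-1900s-base-model")

    prompt = (
        "Correspondence between a gentleman and a knowledgeable historian.\n\n"
        "Dear Sir, I should be much obliged if you would describe the present "
        "state of the railways.\n\nThe historian replies: "
    )
    print(generate(prompt, max_new_tokens=150)[0]["generated_text"])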
Thank you, that helps to inject a lot of skepticism. I was wondering how it so easily worked out what Q: and A: stood for when that formatting only took off in the 1940s.
OK, so it was that. The responses given did sound off: while it has some period-appropriate mannerisms, and entire sections basically rephrased from some popular historical texts, it seems off compared to reading an actual 1900s text. The overall vibe just isn't right; it seems too modern, somehow.
I also wonder whether you'd get this kind of performance with actual, purely pre-1900s text. LLMs work because they're fed terabytes of text; if you just give one gigabytes, you get a 2019-era word model. The fundamental technology is mostly the same, after all.
I just don't like having to send HTML and make the backend deal with what are really frontend problems. Sending JSON is great since you can serialize most reasonable data types into it, and the backend then has no responsibility for how it is rendered, which helps when mobile apps use the same backend as the website. Sending HTML just seems nuts, since if you change your design you have to change the backend too.
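For example, a minimal sketch of the split I mean (Flask, with illustrative endpoint and field names): the backend only describes the data, and the website, iOS app, and Android app each decide how to render it.

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/articles/<int:article_id>")
    def get_article(article_id):
        # No markup here; rendering is entirely the client's job.
        return jsonify({"id": article_id, "title": "Example", "body": "..."})

If the design changes, only the clients change; the endpoint stays put.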
It’ll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too, because the smol Gemma 3s with the same parameter count were neither.
If you are finetuning the model you need to replicate the training conditions so you don't remove those capabilities. If you just finetune a multimodal model on text, it will lose some of its vision capabilities as the text part of the model drifts from the vision, audio, etc. parts. A similar thing happens when finetuning reasoning models.
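One common mitigation, sketched below with placeholder names (the right Auto class and the "vision"/"audio" parameter-name match depend on the architecture): freeze everything except the language-model weights during a text-only finetune so the other modalities can't drift.

    from transformers import AutoModelForCausalLM

    # Model id is a placeholder, not a real checkpoint.
    model = AutoModelForCausalLM.from_pretrained("your-org/multimodal-model")

    for name, param in model.named_parameters():
        if "vision" in name or "audio" in name:
            param.requires_grad = False  # keep the non-text towers fixed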
Even if you did finetune the model with text and images, you could run into issues if you use different descriptions for the images than what it was trained with. You could probably work around that by getting the model to describe the images itself, but you'll still need to audit the results to correct any issues or add whatever you are training for.
You can also run into overfitting if your data does not include enough variation along the dimensions the original training set covered.
Using different training parameters could also affect the model's capabilities. Just knowing things like the input context length isn't enough.
This is the thing that kills me about SFT. It was sensible when most of the compute in a model was in pretraining and the RL was mostly for question answering. Now that RL is driving model capabilities it doesn't make much sense.
On the other hand, RL on deployed systems looks promising as a way to essentially JIT-optimize models. Experiments with model routers and agentic RAG have shown good results.
This is very true. However, I wonder how much of this could be mitigated by using training data from other open-source models, like Olmo3 for textual data and Emu3.5 for vision?