The rotary embeddings bit is neat. I wonder whether a complex-number representation would simplify or complicate things (readability, performance, expressive power).
Some implementations do use a complex rotary encoding, but it makes the code harder to port to platforms or frameworks that don't support complex numbers natively.
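For concreteness, here's a minimal sketch of the two formulations (PyTorch; the function names are mine, just illustrative, not from any particular codebase). They compute the same rotation; the real-valued version is what you end up writing when the backend has no complex dtype:

```python
import torch

def rope_complex(x, theta=10000.0):
    # Complex formulation: view adjacent dims as complex pairs and
    # multiply by position-dependent unit phasors e^{i * angle}.
    seq_len, dim = x.shape
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)
    rot = torch.polar(torch.ones_like(angles), angles)          # e^{i*angle}
    xc = torch.view_as_complex(x.reshape(seq_len, dim // 2, 2)) # pairs -> complex
    return torch.view_as_real(xc * rot).reshape(seq_len, dim)

def rope_real(x, theta=10000.0):
    # Equivalent real-valued rotation: an explicit 2x2 rotation per pair,
    # the form you'd port to a framework without native complex support.
    seq_len, dim = x.shape
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(8, 64)
assert torch.allclose(rope_complex(x), rope_real(x), atol=1e-5)
```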
The tensor cores that do the bulk of the FLOPs on most of the GPUs people use only operate on various sizes of floats, I think. We're in a funny position where progress in models and progress in hardware are kind of linked.
As far as expressive power goes, it shouldn't make a difference for the models in common use (the complex and real formulations compute exactly the same function), but I could totally imagine models where it improves readability.