Even if every wire format is UTF-8, you wouldn't be able to decode these new high code points into systems that are semantically UTF-16. That means Java and JS at least, which are hardly "obsolete" targets to worry about.
And even Swift is designed so that strings can be UTF-8 or UTF-16, for cheap Objective-C interop.
Discarding compatibility with 2 of the top ~5 most widely used languages kind of shows how disconnected the author is from the technical reality of whether any "fixed" UTF-8 would be feasible outside of toy use cases.
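To make the UTF-16 ceiling concrete, here is a rough Python sketch of the surrogate pair arithmetic. The names `MAX_UTF16` and `to_surrogate_pair` are made up for illustration; the point is only that a pair carries 20 bits on top of 0x10000, so nothing above U+10FFFF fits.

```python
from typing import Tuple

HIGH_START = 0xD800  # first high (lead) surrogate
LOW_START = 0xDC00   # first low (trail) surrogate

# A surrogate pair carries 10 + 10 payload bits, offset by 0x10000.
MAX_UTF16 = 0x10000 + (0x3FF << 10) + 0x3FF  # 0x10FFFF


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= MAX_UTF16:
        raise ValueError(f"U+{cp:X} cannot be encoded as a surrogate pair")
    offset = cp - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


if __name__ == "__main__":
    print([hex(u) for u in to_surrogate_pair(0x1F600)])
    try:
        to_surrogate_pair(0x110000)
    except ValueError as err:
        print(err)
```

Anything past `MAX_UTF16` has no pair to land on, which is why a wider UTF-8 would not round-trip through Java or JS strings.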
This is really helpful - thanks. I write a CRDT library for text editing. I should probably restrict the characters that I transport to the "Unicode Assignables" subset. I can't think of any sensible reason to let people insert characters like U+0000 into a collaborative text document.
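For what it's worth, a filter like that could be sketched roughly as below in Python. This is my reading of the "Unicode Assignables" subset (no surrogates, no legacy controls other than tab, LF and CR, no noncharacters), so check the ranges against the spec before relying on it. `is_assignable` and `sanitize` are hypothetical helper names, not part of any CRDT library.

```python
def is_assignable(cp: int) -> bool:
    """Assumed check for the "Unicode Assignables" subset; verify against the spec."""
    if cp in (0x09, 0x0A, 0x0D):         # tab, LF, CR are the allowed controls
        return True
    if cp < 0x20 or 0x7F <= cp <= 0x9F:  # remaining C0 controls, DEL, C1 controls
        return False
    if 0xD800 <= cp <= 0xDFFF:           # surrogates
        return False
    if 0xFDD0 <= cp <= 0xFDEF:           # noncharacter block
        return False
    if (cp & 0xFFFE) == 0xFFFE:          # U+xxFFFE / U+xxFFFF in every plane
        return False
    return cp <= 0x10FFFF


def sanitize(text: str) -> str:
    """Drop every character outside the assumed subset."""
    return "".join(ch for ch in text if is_assignable(ord(ch)))


if __name__ == "__main__":
    print(repr(sanitize("hello\x00 world\n")))
```

Whether to silently drop such characters or reject the whole insert operation is a design choice. For a CRDT, rejecting at the edge is probably safer, since every replica has to agree on what was actually inserted.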
You're right, but I didn't realize that till later. Except that the original "Parable of the Sower" was from Jesus, not Olivia. But I also thought of Olivia's first.
Same with my tests using video game trivia questions: they might not be extremely popular facts, and most humans would struggle to answer them ad hoc, but the facts are in Wikipedia and I'm pretty certain Wikipedia is in the 15T tokens of training material.
Got me jobs, helped me hire other people, got me a ticket to some of the big technology debates and then helped me win one or two. Gave me a place to write cat obituaries and heavy-metal reviews. Launched Feb 27, 2003 (20 years last month) and I haven't regretted it for a microsecond.