
timbray · 2025-10-10T17:04:23 1760115863

+1 on Newsblur. I use it every day and it has flaws but nothing that really gets in my way.

timbray · 2025-08-24T00:01:06 1755993666

The tests for the Go code at https://github.com/timbray/RFC9839 are, in effect, test vectors.

RustyRussell · 2025-08-24T06:02:21 1756015341

I want to implement this. My code is in C.

How does this help me check my implementation? I guess I could ask ChatGPT to convert your tests to my code, but that seems the long way around.

djoldman · 2025-08-24T14:46:27 1756046787

https://github.com/timbray/RFC9839/blob/main/unichars.go

I don't know Go at all, but I can pretty quickly understand:

    var unicodeAssignables = []runePair{
     {0x20, 0x7E},       // ASCII
     {0xA, 0xA},         // newline
     {0xA0, 0xD7FF},     // most of the BMP
     {0xE000, 0xFDCF},   // BMP after surrogates
     {0xFDF0, 0xFFFD},   // BMP after noncharacters block
     {0x9, 0x9},         // Tab
     {0xD, 0xD},         // CR
     {0x10000, 0x1FFFD}, // astral planes from here down
     {0x20000, 0x2FFFD},
     {0x30000, 0x3FFFD},
     {0x40000, 0x4FFFD},
     {0x50000, 0x5FFFD},
     {0x60000, 0x6FFFD},
     {0x70000, 0x7FFFD},
     {0x80000, 0x8FFFD},
     {0x90000, 0x9FFFD},
     {0xA0000, 0xAFFFD},
     {0xB0000, 0xBFFFD},
     {0xC0000, 0xCFFFD},
     {0xD0000, 0xDFFFD},
     {0xE0000, 0xEFFFD},
     {0xF0000, 0xFFFFD},
     {0x100000, 0x10FFFD},
    }
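For anyone porting this, here is a minimal sketch of how a table like the one above could be queried. The `runePair` field names (`lo`, `hi`) and the `isAssignable` helper are assumptions made for illustration, not necessarily what the linked repository uses. The table is not sorted, so the sketch uses a plain linear scan.

```go
package main

import "fmt"

// runePair is a hypothetical stand-in for the type behind the quoted table:
// an inclusive range of code points.
type runePair struct {
	lo, hi rune
}

// The table as quoted above.
var unicodeAssignables = []runePair{
	{0x20, 0x7E},       // ASCII
	{0xA, 0xA},         // newline
	{0xA0, 0xD7FF},     // most of the BMP
	{0xE000, 0xFDCF},   // BMP after surrogates
	{0xFDF0, 0xFFFD},   // BMP after noncharacters block
	{0x9, 0x9},         // Tab
	{0xD, 0xD},         // CR
	{0x10000, 0x1FFFD}, // astral planes from here down
	{0x20000, 0x2FFFD},
	{0x30000, 0x3FFFD},
	{0x40000, 0x4FFFD},
	{0x50000, 0x5FFFD},
	{0x60000, 0x6FFFD},
	{0x70000, 0x7FFFD},
	{0x80000, 0x8FFFD},
	{0x90000, 0x9FFFD},
	{0xA0000, 0xAFFFD},
	{0xB0000, 0xBFFFD},
	{0xC0000, 0xCFFFD},
	{0xD0000, 0xDFFFD},
	{0xE0000, 0xEFFFD},
	{0xF0000, 0xFFFFD},
	{0x100000, 0x10FFFD},
}

// isAssignable (hypothetical helper) reports whether r falls inside any
// inclusive range of the table.
func isAssignable(r rune) bool {
	for _, p := range unicodeAssignables {
		if r >= p.lo && r <= p.hi {
			return true
		}
	}
	return false
}

func main() {
	// A few boundary cases: a letter, NUL, DEL, a surrogate,
	// a noncharacter, and an emoji.
	for _, r := range []rune{'A', 0x0, 0x7F, 0xD800, 0xFFFE, 0x1F600} {
		fmt.Printf("U+%04X assignable: %v\n", r, isAssignable(r))
	}
}
```

The same loop translates almost line for line into C with an array of `{lo, hi}` structs.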

timbray · 2025-07-06T22:03:36 1751839416

Yeah, for example it's how Java stores strings to this day. But I think it's more or less never transmitted over the network.

esrauch · 2025-07-06T22:41:20 1751841680

Even if all wire-format encoding is UTF-8, you wouldn't be able to decode these new high code points into systems that are semantically UTF-16. That includes Java and JS at least, hardly "obsolete" targets to worry about.

And even Swift is designed so that strings can be UTF-8 or UTF-16, for cheap Objective-C interop reasons.

Discarding compatibility with 2 of the top ~5 most widely used languages kind of reflects how disconnected the author of this is from the technical realities of whether any "fixed" UTF-8 would be feasible outside of the most toy use cases.
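To make the UTF-16 ceiling concrete, here is a small sketch using Go's standard `unicode/utf16` package. A surrogate pair can address nothing above U+10FFFF, so a value past that has no UTF-16 form. The `surrogatePair` wrapper is a hypothetical helper written for this illustration; `utf16.EncodeRune` is the real standard-library call and returns U+FFFD twice when the input cannot be encoded as a pair.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf16"
)

// surrogatePair (hypothetical helper) returns the UTF-16 surrogate pair for r.
// ok is false when r is outside U+10000..U+10FFFF, which is the only range
// a surrogate pair can represent.
func surrogatePair(r rune) (hi, lo rune, ok bool) {
	hi, lo = utf16.EncodeRune(r)
	if hi == unicode.ReplacementChar && lo == unicode.ReplacementChar {
		return 0, 0, false
	}
	return hi, lo, true
}

func main() {
	// An emoji, the last Unicode code point, and one value past the end.
	for _, r := range []rune{0x1F600, 0x10FFFF, 0x110000} {
		if hi, lo, ok := surrogatePair(r); ok {
			fmt.Printf("U+%X -> %04X %04X\n", r, hi, lo)
		} else {
			fmt.Printf("U+%X has no surrogate-pair encoding\n", r)
		}
	}
}
```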

timbray · 2025-07-06T21:29:42 1751837382

Relevant: https://www.ietf.org/archive/id/draft-bray-unichars-15.html - IETF approved and will have an RFC number in a few weeks.

Tl;dr: Since we're kinda stuck with Uncorrected UTF-8, here are the "characters" you shouldn't use. Includes a bunch of stuff the OP mentioned.

chrismorgan · 2025-07-07T03:04:13 1751857453

The most important bit of that is the “Unicode Assignables” subset <https://www.ietf.org/archive/id/draft-bray-unichars-15.html#...>:

  unicode-assignable =
     %x9 / %xA / %xD /               ; useful controls
     %x20-7E /                       ; exclude C1 controls and DEL
     %xA0-D7FF /                     ; exclude surrogates
     %xE000-FDCF /                   ; exclude FDD0 nonchars
     %xFDF0-FFFD /                   ; exclude FFFE and FFFF nonchars
     %x10000-1FFFD / %x20000-2FFFD / ; (repeat per plane)
     %x30000-3FFFD / %x40000-4FFFD /
     %x50000-5FFFD / %x60000-6FFFD /
     %x70000-7FFFD / %x80000-8FFFD /
     %x90000-9FFFD / %xA0000-AFFFD /
     %xB0000-BFFFD / %xC0000-CFFFD /
     %xD0000-DFFFD / %xE0000-EFFFD /
     %xF0000-FFFFD / %x100000-10FFFD
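The "(repeat per plane)" part of the rule is regular enough to generate rather than list. Below is a rough sketch, with a hypothetical `codeRange` type and helper names invented for illustration, that builds the same ranges and counts how many code points the subset admits.

```go
package main

import "fmt"

// codeRange is a hypothetical inclusive range of code points.
type codeRange struct {
	lo, hi rune
}

// assignableRanges builds the ranges of the ABNF rule above. The sixteen
// astral-plane entries are generated: each plane runs from its first code
// point up to xxFFFD, leaving out the two noncharacters that end every plane.
func assignableRanges() []codeRange {
	ranges := []codeRange{
		{0x9, 0x9}, {0xA, 0xA}, {0xD, 0xD}, // useful controls
		{0x20, 0x7E},     // ASCII without C0 controls and DEL
		{0xA0, 0xD7FF},   // up to the surrogates
		{0xE000, 0xFDCF}, // up to the FDD0 noncharacters
		{0xFDF0, 0xFFFD}, // up to FFFE and FFFF
	}
	for plane := rune(1); plane <= 16; plane++ {
		base := plane << 16
		ranges = append(ranges, codeRange{base, base | 0xFFFD})
	}
	return ranges
}

// countAssignable sums the sizes of all ranges.
func countAssignable(ranges []codeRange) int {
	total := 0
	for _, r := range ranges {
		total += int(r.hi-r.lo) + 1
	}
	return total
}

func main() {
	ranges := assignableRanges()
	fmt.Println("ranges:", len(ranges))
	fmt.Println("assignable code points:", countAssignable(ranges))
}
```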

josephg · 2025-07-07T00:41:25 1751848885

This is really helpful - thanks. I write a CRDT library for text editing. I should probably restrict the characters that I transport to the "Unicode Assignables" subset. I can't think of any sensible reason to let people insert characters like U+0000 into a collaborative text document.
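A check like that could sit at the point where text enters the document. The following is only a sketch under assumptions: `checkInsert` and `isAssignable` are hypothetical names, not part of any CRDT library, and the predicate is a compact restatement of the ranges in the draft. One detail worth noting is that U+FFFD is itself inside the subset, so malformed UTF-8 has to be caught separately rather than being allowed to decode quietly to U+FFFD.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isAssignable (hypothetical helper) restates the "Unicode Assignables" ranges.
func isAssignable(r rune) bool {
	switch {
	case r == 0x9, r == 0xA, r == 0xD:
		return true
	case r >= 0x20 && r <= 0x7E:
		return true
	case r >= 0xA0 && r <= 0xD7FF:
		return true
	case r >= 0xE000 && r <= 0xFDCF:
		return true
	case r >= 0xFDF0 && r <= 0xFFFD:
		return true
	case r >= 0x10000 && r <= 0x10FFFF:
		// Every astral plane ends with two noncharacters (xxFFFE, xxFFFF).
		return r&0xFFFF <= 0xFFFD
	}
	return false
}

// checkInsert (hypothetical helper) rejects text that is not valid UTF-8 or
// that contains a code point outside the subset, reporting the byte offset.
func checkInsert(text string) error {
	for i := 0; i < len(text); {
		r, size := utf8.DecodeRuneInString(text[i:])
		if r == utf8.RuneError && size == 1 {
			return fmt.Errorf("invalid UTF-8 at byte %d", i)
		}
		if !isAssignable(r) {
			return fmt.Errorf("U+%04X at byte %d is not a Unicode Assignable", r, i)
		}
		i += size
	}
	return nil
}

func main() {
	for _, s := range []string{"hello\n", "a\x00b", "bad\xffbyte"} {
		fmt.Printf("%q: %v\n", s, checkInsert(s))
	}
}
```

Rejecting at the edge keeps every replica's stored text inside the subset. Silently stripping the offending characters would be another option, but it changes offsets, which matters for position-based edits.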

timbray · on June 2, 2024

That crossed my mind when I saw the piece show up on HN. But I think they're already running more or less at capacity.

timbray · on June 2, 2024

You're right, but I didn't realize that till later. Except that the original "Parable of the Sower" was from Jesus, not Olivia. But I also thought of Olivia's first.

timbray · on June 2, 2024

A high-quality leather sofa these days is closer to $15K than $1500, ouch.

timbray · on April 19, 2024

I dunno, my Wikipedia entry is about right.

tosh · on April 19, 2024

Same @ my tests w/ video game trivia questions: they might not be extremely popular facts and most humans would struggle to answer them ad-hoc but the facts are in Wikipedia and I'm pretty certain Wikipedia is in the 15T tokens of the training material.

timbray · on Jan 19, 2024

Wow, didn't know about that, thanks. But the query has to be "timbray", not "tim bray".

timbray · on March 17, 2023

Got me jobs, helped me hire other people, got me a ticket to some of the big technology debates and then helped me win one or two. Gave me a place to write cat obituaries and heavy-metal reviews. Launched Feb 27, 2003 (20 years last month) and I haven't regretted it for a microsecond.

[https://tbray.org/ongoing/]