This idea was used to great effect in Matt Might’s “Parsing with Derivatives” pa...

banish-m4 · on May 12, 2024

Excellent paper. I recall it goes past simple/regular languages and can be used to parse anything without first building a lexer.

galaxyLogic · on May 11, 2024

Looks very interesting. Does it mean that RegExp Derivatives can be used to parse languages which basic regular expressions cannot?

As an example basic HTML cannot (?) be parsed by RegExp because tag-pairs can contain tag-pairs:

   <div> <div> </div> </div>

eludes RegExp matching, it seems to me, because a typical standard RegExp would only match "<div> <div> </div>" and would not see the 2nd </div>.

Can RegExp Derivatives do it better?

sigil · on May 12, 2024

HTML can be described by a context-free grammar [0], but not by a regular grammar [1]. If a language can be described by a regular grammar, you can parse it with a regular expression -- that's where the "regular" in RegExp comes from!

Derivatives of RegExps don't automatically unlock parsing of context-free grammars, afaik. For that you need recursion. They do however unlock some very elegant parser designs.

[0] https://en.wikipedia.org/wiki/Context-free_grammar

[1] https://en.wikipedia.org/wiki/Regular_grammar

banish-m4 · on May 12, 2024

Read up on context sensitive grammars.