Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This idea was used to great effect in Matt Might’s “Parsing with Derivatives” paper [0]! And it featured prominently in the Compilers class he taught at the University of Utah.

[0] https://matt.might.net/papers/might2011derivatives.pdf



Excellent paper. I recall it goes past simple/regular languages and can be used to parse anything without first building a lexer.


Looks very interesting. Does it mean that RegExp Derivatives can be used to parse languages which basic regular expressions cannot?

As an example basic HTML cannot (?) be parsed by RegExp because tag-pairs can contain tag-pairs:

   <div> <div> </div> </div>   
eludes RegExp matching, it seems to me, because a typical standard RegExp would only match "<div> <div> </div>" and would not see the 2nd </div>.

Can RegExp Derivatives do it better?


HTML can be described by a context-free grammar [0], but not by a regular grammar [1]. If a language can be described by a regular grammar, you can parse it with a regular expression -- that's where the "regular" in RegExp comes from!

Derivatives of RegExps don't automatically unlock parsing of context-free grammars, afaik. For that you need recursion. They do however unlock some very elegant parser designs.

[0] https://en.wikipedia.org/wiki/Context-free_grammar

[1] https://en.wikipedia.org/wiki/Regular_grammar


Read up on context sensitive grammars.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: