peg/leg parser generator

Monday August 3, 2009

I’ve been on the hunt for a good parser generator for quite a while now. bison/yacc is starting to get a bit long in the tooth, but despite looking quite hard I haven’t been able to find anything good to replace it with.

All I wanted was something simple that could generate C or C++ code, and ideally would combine the lexer and parser into one. It turns out there aren’t that many of these around, and as for the ones that do exist, for some reason their authors don’t seem to want them to actually run on anyone’s computer. (Don’t even get me started on ANTLR…)

But then I found peg, and its cousin, leg. peg uses Parsing Expression Grammars, which let you do cool stuff like have optional or repeated sections. You’ll probably want to use leg for most yacc-style parsing, as it adds the ability for rules to have values associated with them. See calc.leg (included) for a good example.

So, being the helpful soul that I am, I’ve taken the liberty of putting up some nice, easy Win32 binaries of it here for you all to download. I also fixed a nasty stack overflow bug in there.

Here are some useful rules you’ll want if you’re parsing most C-like, yacc-style languages.

-	= ( [ \t] | EOL )*		# eat any whitespace
--  = ![a-zA-Z0-9_] -		# used after any keyword, so it can't match the start of a longer identifier
EOF = !.
EOL = [\n\r]

Use the first rule (the “minus” rule) directly after any punctuation (e.g. sum = expr '+' - expr). Use the second rule (the “double minus” rule) directly after any keyword (e.g. statement = 'while' -- expr).

Enjoy!

peg/leg Win32 binaries [97.34KB]
peg/leg Win32 source [47.47KB]

— Kayamon


  1. Did you see Coco …

    We used it for our internal scripting language instead of lex/yacc and loved it.

    — John Brandwood

    Sep 2, 12:59 PM

  2. Thank you for the Windows source. Have you tried a re2c and lemon combo? That was my next option until I found peg.

    — Ryan

    Feb 20, 09:42 AM
