Hello Racc
30 May 2007
When I said HelloCup I was looking at a yacc based parser in a language that didn't require me to handle my dirty pointers. Another alternative to play with is Ruby which now has a yaccish parser built in to the standard library - inevitably called racc.
Racc has an interesting interplay between ruby and grammar syntax. You define the grammar with a racc file which will generate a parser class.
Again I'll do my simple hello world case. The input text is
item camera item laser
I'll populate item objects inside a catalog, using the following model classes.
class Item
attr_reader :name
def initialize name
@name = name
end
end
class Catalog
extend Forwardable
def initialize
@items = []
end
def_delegators :@items, :size, :<<, :[]
end
Forwardable is a handy library that allows me to
delegate methods to an instance variable. In this case I delegate a
bunch of methods to the @items list.
I test what I read with this.
class Tester < Test::Unit::TestCase
def testReadTwo
parser = ItemParser.new
parser.parse "item camera\nitem laser\n"
assert_equal 2, parser.result.size
assert_equal 'camera', parser.result[0].name
assert_equal 'laser', parser.result[1].name
end
def testReadBad
parser = ItemParser.new
parser.parse "xitem camera"
fail
rescue #expected
end
end
To build the file and run the tests I use a simple rake file.
# rakefile... task :default => :test file 'item.tab.rb' => 'item.y.rb' do sh 'racc item.y.rb' end task :test => 'item.tab.rb' do require 'rake/runtest' Rake.run_tests 'test.rb' end
The racc command needs to be installed on your
system. I did it the easy way on Ubuntu with
apt-get. It takes the input file and creates one named
inputFileName.tab.rb.
The parser grammar class is a special format, but one that's pretty familiar to yaccish people. For this simple example it looks like this:
#file item.y.rb...
class ItemParser
token 'item' WORD
rule
catalog: item | item catalog;
item: 'item' WORD {@result << Item.new(val[1])};
end
The tokens clause declares the token's we get from the lexer. I
use the string 'item' and WORD as a
symbol. The rule clause starts the production rules which are in the
usual BNF form for yacc. As you might expect I can write actions
inside curlies. To refer to the elements of the rule I use the
val array, so val[1] is the equivalent to
$2 in yacc (ruby uses 0 based array indexes, but I've
forgiven it). Should I wish to return a value from the rule
(equivalent to yacc's $$) I assign
it to the variable result.
The most complicated part of using racc is to sort out the lexer.
Racc expects to call a method that yields tokens, where each token is a
two-element array with the first element being the type of token
(matching the token declaration) and the second element the value
(what shows up in val - usually the text). You mark the
end of the token stream with [false, false]. The sample
code with racc uses regular expression matching on a string. A better
choice for most cases is to use StringScanner, which is
in the standard ruby library.
I can use this scanner to convert a string into an array of tokens.
#file item.y.rb....
---- inner
def make_tokens str
require 'strscan'
result = []
scanner = StringScanner.new str
until scanner.empty?
case
when scanner.scan(/\s+/)
#ignore whitespace
when match = scanner.scan(/item/)
result << ['item', nil]
when match = scanner.scan(/\w+/)
result << [:WORD, match]
else
raise "can't recognize <#{scanner.peek(5)}>"
end
end
result << [false, false]
return result
end
To integrate the scanner into the parser, racc allows you to
place code into the generated parser class. You do this by adding code
to the grammar file. The declaration ---- inner marks the
code to go inside the generated class (you can also put code at the
head and foot of the generated file). I'm calling a parse
method in my test, so I need to implement that.
#file item.y.rb.... ---- inner attr_accessor :result def parse(str) @result = Catalog.new @tokens = make_tokens str do_parse end
The do_parse method initiates the generated
parser. This will call next_token to get at the next
token, so we need to implement that method and include it in the
inner section.
#file item.y.rb.... ---- inner def next_token @tokens.shift end
This is enough to make racc work with the file. However as I play with it I find the scanner more messy than I would like. I really just want it to tell the lexer what patterns to match and what to return with them. Something like this.
#file item.y.rb.... ---- inner def make_lexer aString result = Lexer.new result.ignore /\s+/ result.keyword 'item' result.token /\w+/, :WORD result.start aString return result end
To make this work I write my own lexer wrapper over the base functionality provided by StringScanner. Here's the code to set up the lexer and and handle the above configuration.
class Lexer...
require 'strscan'
def initialize
@rules = []
end
def ignore pattern
@rules << [pattern, :SKIP]
end
def token pattern, token
@rules << [pattern, token]
end
def keyword aString
@rules << [Regexp.new(aString), aString]
end
def start aString
@base = StringScanner.new aString
end
To perform the scan I need to use StringScanner to compare the rules against the input stream.
class Lexer...
def next_token
return [false, false] if @base.empty?
t = get_token
return (:SKIP == t[0]) ? next_token : t
end
def get_token
@rules.each do |key, value|
m = @base.scan(key)
return [value, m] if m
end
raise "unexpected characters <#{@base.peek(5)}>"
end
I can then alter the code in the parser to call this lexer instead.
#file item.y.rb.... ---- inner def parse(arg) @result = Catalog.new @lexer = make_lexer arg do_parse end def next_token @lexer.next_token end
As well as giving me a better way to define the rules, this also allows the grammar to control the lexer because it's only grabbing one token at a time - this would give me a mechanism to implement lexical states later on.
On the whole racc is pretty easy to set up and use - providing you know yacc. The documentation is on the minimal side of sketchy. There's a simple manual on the website and some sample code. There's also a very helpful presentation on racc. I also got a few tips from our Mingle team who've used it for a nifty customization language inside Mingle.

