## An CFG for Roman numerals

In one of my assignments for **Compiler** course at my university, I have to deal with this problem: b**uild an attributed grammar tree in order to recognize Roman numbers, and convert it into Arabic format**. Well, it doesn’t seem a simple mission.

For the people who forgot everything which was taught in 3rd grade school, Roman numerals use 7 letter I, V, X, L, C, D, M to represent 1, 5, 10, 50, 100, 500, and 1000. Of course there are some other rules to make sure a string of these letters is a legal representation. To express larger values, Roman numerals use some special symbols as described here.

To begin with, I have to find an CFG which recognize a Roman numeral string. The first thing I think about is list all of possible roman numbers, and i got this:

S -> I | II | III | IV | V | VI | VII ….

Well, it’s still a choice, but not really a smart choice. Although the numbers which roman numerals can represent is finite, but listing all of possible chances like above is really a hard work.

So I remember a book that i’ve read for long time ago: *Dive into Python*. It has a chapter discussing on RE (Regular Expression) only, and in this chapter, authors gave an example about how to recognize Roman numbers using RE. As you know, RE can be seen as a “subset” of CFG, so it seem i can utilize it to solve my problem. So cool… lol…

The RE is as follow:

pattern = '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

And I have to convert it to CFG. Not a problem:

G = {{S, T, H, K, U}, {I, V, X, L, C, D, M}, P, {S}}

with the production law set P following:

S -> T H K U

T -> M | MM | MMM | MMMM | epsilon // recognize the thousand part.

H -> CM | CD | (D|epsilon)(epsilon|C|CC|CCC) // hundred part

K -> XC | XL | (L|epsilon)(epsilon|X|XX|XXX) // ten-part

U -> IX | IV | (V|epsilon)(epsilon|I|II|III) // unit part

This grammar can recognize values which are less than MMMMDCCCLXXXVIII only, but it’s pretty good enough. I’m not sure this is the best way to describe that RE in CFG, but at least, i did solve my problem with it. My remain works are just add some attributes and semantic rule to convert it into arabic form. Pretty easy if you attended a course about Compiler ^^.

Thanks much to the guys writing *Dive into Python*, one of the most interesting hand-on-lab book I have ever read. It helped me save times for other stuffs ^^.

## Recent Comments