Saturday, 15 March 2008

10 steps to begin parsing with ANTLR

“I’d love to try ANTLR, but I need some help getting started.” Does this sound familiar?

What is ANTLR?

ANTLR stands for “ANother Tool for Language Recognition”. In short, it generates a lexer and a parser from a grammar, so that you can take source code written in a particular language and build a parse tree for it.
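
Before any tree is built, the source text is first split into tokens. The sketch below is a hand-written illustration of that first step in plain Java. It is not ANTLR code and not how ANTLR's generated lexer works internally; the TinyLexer class and its tokenize method are made up for this example, and it ignores comments, numbers and escape sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a toy tokenizer, not part of ANTLR.
class TinyLexer {

    // Splits the input into words (identifiers or keywords), string literals
    // and single-character symbols. Whitespace is skipped.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else if (ch == '"') {
                int start = i;
                i++; // step past the opening quote
                while (i < src.length() && src.charAt(i) != '"') {
                    i++;
                }
                i = Math.min(i + 1, src.length()); // include the closing quote if present
                tokens.add(src.substring(start, i));
            } else {
                tokens.add(String.valueOf(ch)); // any other character is its own token
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("System.out.println(\"Hello world!\");")) {
            System.out.println(token);
        }
    }
}
```

A real lexer also attaches a type to every token (Identifier, StringLiteral and so on), which is the part in square brackets in the output at the end of this post.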

That sounds hard. Why should I bother?

Good question. Steve Yegge answers it in his blog here. Very simply, it helps when:

  1. You have a tricky coding style issue that Eclipse can’t handle
  2. You want to write a javadoc-style code documentation generator that parses source
  3. You have a situation that requires a unique and complex refactoring
  4. You display source code and you want to add syntax colouring
  5. You need to send bizarre syntax to a router as part of your project
  6. You need to redesign a complex code base
  7. You need to convert one coding idiom to another across a source code base

The essence of his argument is that it is ‘rich programmer food’, i.e. something intellectually engaging that makes you grow as a programmer.

Are there any good books on it?

I can recommend this one, written by the author of ANTLR.





So how do I get started?

Load up Eclipse and do the following:





1. Download ANTLR 3.0

2. Start a new Eclipse Java project

3. Create the following source to be parsed:


package parseable;

public class Class1 {
    /**
     * @param args
     */
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}


4. Add the ANTLR jar to your project's build path

antlrworks-1.1.7.jar



5. Get the language grammar file for Java

Java.g



6. Change the following line of Java.g from this

options {k=2; backtrack=true; memoize=true;}

To this

options {k=2; backtrack=true; memoize=true; output=AST; ASTLabelType=CommonTree;}



7. Generate your java lexer and parser

java -classpath antlrworks-1.1.7.jar org.antlr.Tool Java.g



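8. Drop them into your Eclipse project, in a package named generated (the parser code in step 9 imports generated.JavaLexer and generated.JavaParser, so add a matching package declaration to each file if it is missing)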

JavaLexer.java

JavaParser.java





9. Add the following parser code


package astprinter;

import generated.JavaLexer;
import generated.JavaParser;
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Lexer;
import org.antlr.runtime.Token;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeAdaptor;
import org.antlr.runtime.tree.TreeAdaptor;

public class ASTPrinter {

    public static void main(String[] args) throws Exception {
        new ASTPrinter().init();
    }

    public void init() throws Exception {
        // Read the source file to be parsed (adjust the path to your own workspace)
        CharStream c = new ANTLRFileStream(
                "C:/Documents and Settings/User/workspace/"
                + "antlr beginners guide/src/parseable/Class1.java");

        // Create the lexer attached to the file's character stream
        Lexer lexer = new JavaLexer(c);

        // Create the buffer of tokens between the lexer and parser
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // Create the parser attached to the token buffer
        JavaParser parser = new JavaParser(tokens);

        // Tell the parser which tree adaptor to build nodes with
        parser.setTreeAdaptor(adaptor);

        // Parse starting from the top-level rule, compilationUnit
        JavaParser.compilationUnit_return ret = parser.compilationUnit();

        // Get the associated tree
        CommonTree tree = (CommonTree) ret.getTree();

        // Print the tree
        printTree(tree, 1);
    }

    // Adaptor that wraps each token in a CommonTree node
    static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
        public Object create(Token payload) {
            return new CommonTree(payload);
        }
    };

    // Print the root node, then all of its descendants
    public void printTree(CommonTree t, int indent) {
        System.out.println(t.toString());
        printTreeHelper(t, indent);
    }

    // Print each child as "text [token name]", one extra space of indent per level
    private void printTreeHelper(CommonTree t, int indent) {
        if (t != null) {
            StringBuilder sb = new StringBuilder(indent);
            for (int i = 0; i < indent; i++)
                sb.append(" ");
            for (int i = 0; i < t.getChildCount(); i++) {
                //if (t.getChild(i).getType()==4)
                System.out.println(sb.toString() + t.getChild(i).toString()
                        + " [" + JavaParser.tokenNames[t.getChild(i).getType()]
                        + "]");
                printTreeHelper((CommonTree) t.getChild(i), indent + 1);
            }
        }
    }
}
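
The recursive walk in printTreeHelper can be tried out without the ANTLR jars. Here is a minimal sketch of the same depth-first idea, assuming a made-up Node class as a stand-in for CommonTree; Node, TreeWalkSketch and render are hypothetical names for this example, not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

class TreeWalkSketch {

    // Hypothetical stand-in for org.antlr.runtime.tree.CommonTree:
    // just a label plus a list of children.
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<Node>();

        Node(String text) {
            this.text = text;
        }

        // Adds a child and returns this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Renders the tree depth-first, one node per line,
    // indented by one space per level below the root.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append(' ');
        }
        out.append(node.text).append('\n');
        for (Node child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("nil")
                .add(new Node("package"))
                .add(new Node("class").add(new Node("Class1")));
        System.out.print(render(root));
    }
}
```

Passing the depth down as a parameter, as ASTPrinter does with indent, keeps the walk stateless: each call only needs to know how deep it is, not where it came from.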



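10. Run it and voilà: a basic parse tree of your source. As the output shows, every token hangs directly off a single nil root, so the tree is still flat at this stage.

nil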
package ['package']
parseable [Identifier]
; [';']
public ['public']
class ['class']
Class1 [Identifier]
{ ['{']
public ['public']
static ['static']
void ['void']
main [Identifier]
( ['(']
String [Identifier]
[ ['[']
] [']']
args [Identifier]
) [')']
{ ['{']
System [Identifier]
. ['.']
out [Identifier]
. ['.']
println [Identifier]
( ['(']
"Hello world!" [StringLiteral]
) [')']
; [';']
} ['}']
} ['}']
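
If you only care about one kind of token, the listing above is easy to post-process. The following is a small sketch that pulls out the identifiers, assuming the "text [token name]" line format printed by ASTPrinter; the IdentifierFilterSketch class and its identifiers method are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IdentifierFilterSketch {

    // Assumed line format: "<token text> [<token name>]", as printed by ASTPrinter.
    private static final String SUFFIX = " [Identifier]";

    // Returns the token text of every line whose token name is Identifier.
    static List<String> identifiers(List<String> lines) {
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            String trimmed = line.trim(); // drop the indentation added by the printer
            if (trimmed.endsWith(SUFFIX)) {
                result.add(trimmed.substring(0, trimmed.length() - SUFFIX.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "package ['package']",
                "parseable [Identifier]",
                "; [';']",
                "public ['public']",
                "class ['class']",
                "Class1 [Identifier]");
        System.out.println(identifiers(sample)); // prints [parseable, Class1]
    }
}
```

Inside ASTPrinter itself, the commented-out getType() check in printTreeHelper is the place to do the same kind of filtering directly on the tree nodes.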


Any thoughts? Let me know.