What is ANTLR?
ANTLR stands for “ANother Tool for Language Recognition”. In short, it lets you take source code written in a particular language and build a parse tree for it.
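To make "build a parse tree" concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code: the class TinyParseTree and its three rules are made up for illustration, and it only handles sums and products of integers. ANTLR generates a far more capable lexer and parser for you from a grammar file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written stand-in for what a generated lexer and
// parser do; none of these names come from ANTLR itself.
class TinyParseTree {

    // A tree node: some text plus zero or more children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        // LISP-style rendering: a leaf prints as "1", a parent as "(+ 1 2)".
        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos = 0;

    TinyParseTree(String src) {
        this.src = src.replace(" ", ""); // ignore spaces
    }

    // expr : term ('+' term)* ;
    Node expr() {
        Node left = term();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            left = new Node("+", left, term());
        }
        return left;
    }

    // term : NUMBER ('*' NUMBER)* ;
    Node term() {
        Node left = number();
        while (pos < src.length() && src.charAt(pos) == '*') {
            pos++;
            left = new Node("*", left, number());
        }
        return left;
    }

    // NUMBER : ('0'..'9')+ ;
    Node number() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + start);
        }
        return new Node(src.substring(start, pos));
    }

    // Parse the whole input and return the tree as text.
    static String parse(String input) {
        TinyParseTree p = new TinyParseTree(input);
        Node tree = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return tree.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // prints (+ 1 (* 2 3))
    }
}
```

The multiplication ends up nested inside the addition, which is the sort of structure a flat string of characters does not give you.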
That sounds hard. Why should I bother?
Good question. Steve Yegge answers it in his blog here. Very simply, it helps when:
- You have a tricky coding-style issue that Eclipse can’t handle
- You want to write a Javadoc-style documentation generator that parses source code
- You have a situation that requires a unique and complex refactoring
- You display source code and want to add syntax colouring
- You need to send bizarre syntax to a router as part of your project
- You need to redesign a complex code base
- You need to convert one coding idiom to another across a source code base
The essence of his argument is that it is ‘rich programmer food’, i.e. something intellectually engaging that helps you grow as a programmer.
Are there any good books on it?
I can recommend this one, written by the author of ANTLR.
So how do I get started?
Load up Eclipse and do the following:
1. Download ANTLR 3.0
2. Start a new Eclipse Java project
3. Create the following source to be parsed:
package parseable;

public class Class1 {
    /**
     * @param args
     */
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
4. Add the ANTLR jar to the project's build path
antlrworks-1.1.7.jar
5. Get the language grammar file for Java
Java.g
6. Change the following line of Java.g from this
options {k=2; backtrack=true; memoize=true;}
To this
options {k=2; backtrack=true; memoize=true; output=AST;ASTLabelType=CommonTree; }
7. Generate your Java lexer and parser
java -classpath antlrworks-1.1.7.jar org.antlr.Tool Java.g
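8. Drop them into your Eclipse project, in a package named generated (the parser code in step 9 imports them from there; add a "package generated;" line at the top of each file if it has no package declaration)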
JavaLexer.java
JavaParser.java
9. Add the following parser code
package astprinter;

import generated.JavaLexer;
import generated.JavaParser;
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Lexer;
import org.antlr.runtime.Token;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeAdaptor;
import org.antlr.runtime.tree.TreeAdaptor;

public class ASTPrinter {

    public static void main(String[] args) throws Exception {
        new ASTPrinter().init();
    }

    public void init() throws Exception {
        // Read the source file to be parsed
        CharStream c = new ANTLRFileStream(
                "C:/Documents and Settings/User/workspace/" +
                "antlr beginners guide/src/parseable/Class1.java");
        // Create the lexer attached to the file's character stream
        Lexer lexer = new JavaLexer(c);
        // Create the buffer of tokens between the lexer and parser
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Create the parser attached to the token buffer
        JavaParser parser = new JavaParser(tokens);
        // Tell the parser which tree adaptor to use when building nodes
        parser.setTreeAdaptor(adaptor);
        // Run the parser from the compilationUnit rule
        JavaParser.compilationUnit_return ret = parser.compilationUnit();
        // Get the associated tree
        CommonTree tree = (CommonTree) ret.getTree();
        // Print the tree
        printTree(tree, 1);
    }

    // Tree adaptor that wraps each token in a plain CommonTree node
    static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
        public Object create(Token payload) {
            return new CommonTree(payload);
        }
    };

    // Print the root node, then all of its descendants
    public void printTree(CommonTree t, int indent) {
        System.out.println(t.toString());
        printTreeHelper(t, indent);
    }

    // Print each child as "text [token type name]", one space of
    // indentation per level, then recurse into that child
    private void printTreeHelper(CommonTree t, int indent) {
        if (t != null) {
            StringBuilder sb = new StringBuilder(indent);
            for (int i = 0; i < indent; i++)
                sb.append(" ");
            for (int i = 0; i < t.getChildCount(); i++) {
                //if (t.getChild(i).getType()==4)
                System.out.println(sb.toString() + t.getChild(i).toString()
                        + " [" + JavaParser.tokenNames[t.getChild(i).getType()]
                        + "]");
                printTreeHelper((CommonTree) t.getChild(i), indent + 1);
            }
        }
    }
}
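The file path in init() is hard-coded to one Windows workspace, so it will need changing on any other machine. A minimal sketch of one alternative, assuming the program is launched with the project directory as its working directory: take the path from the first command-line argument and fall back to the sample file otherwise. SourceLocator and resolveSource are hypothetical names, not part of ANTLR; the resulting path, converted with toString(), could be passed to the ANTLRFileStream constructor in place of the string literal.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR or of the original program.
class SourceLocator {

    // Use the first command-line argument if one is given; otherwise fall
    // back to the sample file, relative to the current working directory.
    static Path resolveSource(String[] args) {
        Path p = (args.length > 0)
                ? Paths.get(args[0])
                : Paths.get("src", "parseable", "Class1.java");
        return p.toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Prints the absolute path that would be handed to the lexer's input stream
        System.out.println(resolveSource(args));
    }
}
```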
10. Run it, and voilà: a basic parse tree of your source.
nil
package ['package']
parseable [Identifier]
; [';']
public ['public']
class ['class']
Class1 [Identifier]
{ ['{']
public ['public']
static ['static']
void ['void']
main [Identifier]
( ['(']
String [Identifier]
[ ['[']
] [']']
args [Identifier]
) [')']
{ ['{']
System [Identifier]
. ['.']
out [Identifier]
. ['.']
println [Identifier]
( ['(']
"Hello world!" [StringLiteral]
) [')']
; [';']
} ['}']
} ['}']
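The listing above is one long run of tokens, each tagged with its token type, and the commented-out check in printTreeHelper hints at the obvious next step: keep only the nodes of one type. Here is a sketch of that idea on plain data, so it runs without the ANTLR jar. Tok and textsOfType are hypothetical stand-ins; in the real program the text would come from each child node and the type name from JavaParser.tokenNames.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types, not part of ANTLR.
class IdentifierFilter {

    // One printed line of the listing: token text plus token type name.
    static final class Tok {
        final String text;
        final String typeName;

        Tok(String text, String typeName) {
            this.text = text;
            this.typeName = typeName;
        }
    }

    // Keep only the text of tokens whose type name matches.
    static List<String> textsOfType(List<Tok> tokens, String typeName) {
        List<String> out = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.typeName.equals(typeName)) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The first three tokens of the listing
        List<Tok> tokens = Arrays.asList(
                new Tok("package", "'package'"),
                new Tok("parseable", "Identifier"),
                new Tok(";", "';'"));
        System.out.println(textsOfType(tokens, "Identifier")); // prints [parseable]
    }
}
```

Run over the whole listing, the same filter would pick out names such as Class1, main and println while skipping keywords and punctuation.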
Any thoughts? Let me know.