Context Free Grammar Visualization Using SVG


Abstract


Context-free grammars (CFGs) are usually described in Backus-Naur Form (BNF) and are a common part of programming language specifications. Visual representations of CFG productions are sometimes used in documentation to give a high-level overview of a language that is easy to understand yet comprehensive, in that it describes the full power of the language. There is a standard flow graph notation for visualizing grammar productions, which closely resembles a finite state machine graph with one start state and one final state. Examples of this type of visualization include the Oracle SQL language documentation [19] and the ANTLRWorks IDE for the ANTLR compiler-compiler [5]. The World Wide Web Consortium (W3C) uses a standard notation for the BNF grammars included in all of its language specification documents.

This paper describes an application for CFG visualization using SVG. Grammar productions are parsed using ANTLR, and an SVG graph is generated using Java and the Velocity template language. The grammar visualization tool is used to present an augmented view of W3C specifications, which displays a specification side-by-side with a visual representation of each grammar production of the specification's language. The application runs within a web browser and uses JavaScript to bring the SVG to life by adding interactivity and animations. The user can move between grammar productions by clicking on the production references that appear in the current flow graph, or by selecting an item from a full list of grammar productions. The W3C specification document is kept in sync with the currently displayed graph, so that the user can read the textual description and related documentation from the specification document while looking at the production graph.


Table of Contents

Introduction
Background Information
Visualizing Production Rules
Grammar Production Visualization Implementation
Grammar Production Language
SVG Code Generation
Visualization Templates
Velocity Template Language for SVG Generation
Web Page Parsing with Java and Regular Expressions
Final Step - Web Application inside an Apache Tomcat Web Server as a JEE Servlet
Development Environment
Bibliography

Grammar productions are an essential part of the description of any programming language and are usually included in programming language specifications. Productions can also be represented visually in the form of a diagram, and such diagrams are often included in programming language documentation. The online documentation for the Oracle database product, for example, includes such diagrams for each element of the Oracle SQL programming language.


Grammar production graphs are a very useful documentation aid, as the above example demonstrates. They present an intuitive, easy-to-understand view of what is otherwise dry, terse, technical material.

The XML Language Specification, issued by the World Wide Web Consortium (W3C), is a complete description of the XML language. A core part of the specification is the BNF grammar definition of the XML language. The entire specification document is structured around the productions of the BNF grammar: the productions are separated into groups, which appear in sections alongside detailed descriptions of the meaning of each element represented by each production rule. While the rules are listed in their BNF form, the graphical form of the productions is not included. This paper describes a project whose aim is to provide an enhanced view of the XML (and other) language specification documents from the W3C. This enhanced view includes a navigable graphical representation of the BNF productions and appears alongside the original specification document. The graphs of all the rules from an individual section of the document are displayed alongside that section of the original document. The user can navigate through the specification by clicking on references to other production rules, while the main specification document and the current production graph are kept in sync.


The screenshot above shows a full view of the application window, which includes the annotated version of the W3C XML Language Specification displayed in a browser window. The graphical representation of the rules from the current section of the specification document appears in the top part of the window, with the original specification document below it. The top section of the window includes a drop-down box, which can be used to navigate between the sections of the document and display a particular section of the specification alongside the graphs of its productions.

The production graphs are visualized using SVG, an open web standard for vector graphics. SVG is an excellent choice for a web application such as this, due to its good integration with HTML and JavaScript. SVG code can be embedded directly inside HTML, and its structure is fully exposed for programmatic manipulation using JavaScript, the standard language for programming inside the browser.
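The embedding of SVG inside HTML can be illustrated with a minimal Java sketch. It is not the code of the application described in this paper: the class name InlineSvgPage, the id prefix "ref-", the class "production-ref" and the data-production attribute are all assumptions made for illustration. The sketch only shows that each production reference can be emitted as an addressable SVG element inside an ordinary HTML page, so that a browser script could later locate it (for example with the standard DOM call document.getElementById) and attach a click handler.

```java
import java.util.Locale;

// Illustrative sketch only: all names and attribute conventions are assumed,
// not taken from the application described in the paper.
class InlineSvgPage {

    // Emits one production reference as an SVG group. The id and the
    // data-production attribute are hypothetical hooks for browser scripts.
    // The production name is assumed to be a plain identifier, so no XML
    // escaping is performed here.
    static String reference(String production, int x, int y) {
        return String.format(Locale.ROOT,
            "<g id=\"ref-%s\" class=\"production-ref\" data-production=\"%s\">"
            + "<rect x=\"%d\" y=\"%d\" width=\"90\" height=\"30\""
            + " fill=\"white\" stroke=\"black\"/>"
            + "<text x=\"%d\" y=\"%d\" text-anchor=\"middle\">%s</text></g>",
            production, production, x, y, x + 45, y + 20, production);
    }

    // Wraps SVG content in an HTML page; the SVG is inline, not an image file.
    static String page(String title, String svgBody) {
        return "<!DOCTYPE html>\n<html>\n<head><title>" + title
            + "</title></head>\n<body>\n"
            + "<svg xmlns=\"http://www.w3.org/2000/svg\""
            + " width=\"400\" height=\"60\">\n"
            + svgBody + "\n</svg>\n</body>\n</html>\n";
    }

    public static void main(String[] args) {
        // Two references taken from the XML rule: document ::= prolog element Misc*
        String body = reference("prolog", 10, 10) + reference("element", 120, 10);
        System.out.println(page("document", body));
    }
}
```

Because the groups are part of the page's own document tree, no plug-in or separate image file is needed for a script to reach them.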

Grammar productions consist of only a handful of basic operations, each of which requires a distinct form of visual representation. All grammar productions are built using combinations of these basic operators. As a first step, a set of templates, one for each basic operator, was produced. These templates consist of static SVG code and are shown and described here.
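The idea of one SVG fragment per operator can be sketched in Java as follows. This is a hypothetical illustration under stated assumptions, not the templates produced for this project: the class OperatorTemplates, its method names, the box dimensions and the exact shapes are invented here, and only two operators are covered, sequence (A B) and optional (A?).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: names, sizes and shapes are assumptions.
class OperatorTemplates {
    static final int BOX_W = 80;  // assumed width of a symbol box
    static final int BOX_H = 30;  // assumed height of a symbol box
    static final int GAP = 20;    // assumed length of a connecting line

    // A straight connector between two points.
    static String line(int x1, int y1, int x2, int y2) {
        return String.format(Locale.ROOT,
            "<line x1=\"%d\" y1=\"%d\" x2=\"%d\" y2=\"%d\" stroke=\"black\"/>",
            x1, y1, x2, y2);
    }

    // A single symbol: a rectangle with a centered label.
    // Names are assumed to be plain identifiers (no XML escaping).
    static String symbol(String name, int x, int y) {
        return String.format(Locale.ROOT,
            "<rect x=\"%d\" y=\"%d\" width=\"%d\" height=\"%d\""
            + " fill=\"none\" stroke=\"black\"/>"
            + "<text x=\"%d\" y=\"%d\" text-anchor=\"middle\">%s</text>",
            x, y, BOX_W, BOX_H, x + BOX_W / 2, y + BOX_H / 2 + 5, name);
    }

    // Sequence (A B): boxes laid out left to right, joined by lines.
    static String sequence(List<String> names, int x, int y) {
        StringBuilder svg = new StringBuilder();
        int midY = y + BOX_H / 2;
        int cx = x;
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                svg.append(line(cx, midY, cx + GAP, midY));
                cx += GAP;
            }
            svg.append(symbol(names.get(i), cx, y));
            cx += BOX_W;
        }
        return svg.toString();
    }

    // Optional (A?): the box on the main track plus a bypass path above it,
    // so the flow can either pass through the box or skip it.
    static String optional(String name, int x, int y) {
        int midY = y + BOX_H / 2;
        int left = x - GAP;
        int right = x + BOX_W + GAP;
        int bypassY = y - GAP;
        String bypass = String.format(Locale.ROOT,
            "<path d=\"M %d %d V %d H %d V %d\" fill=\"none\" stroke=\"black\"/>",
            left, midY, bypassY, right, midY);
        return line(left, midY, x, midY) + symbol(name, x, y)
            + line(x + BOX_W, midY, right, midY) + bypass;
    }

    public static void main(String[] args) {
        // Part of the XML rule: document ::= prolog element Misc*
        String body = sequence(Arrays.asList("prolog", "element"), 20, 40);
        System.out.println("<svg xmlns=\"http://www.w3.org/2000/svg\""
            + " width=\"300\" height=\"100\">" + body + "</svg>");
    }
}
```

Keeping each operator as a separate fragment generator means that a larger production can, in principle, be drawn by composing fragments, which mirrors how productions themselves are composed from operators.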