Due: Monday, September 25, 2017, 11:59 pm

Part I: Lexical Analysis

The first thing we have to do with any programming language is build a lexical analyzer that converts a text string into a series of tokens. There are lots of tools for handling this. For this assignment, you'll make use of the Flex lexical analysis tool and build a few different parsers.


Flex is an open source lexical analysis tool. The way it works is you first create an input file that defines symbols and production rules that describe how to parse a text file. Then you run Flex and it produces a C file. You can then compile the C file and run it on some input.

As an example, consider the following Hello World program for Flex. The line beginning with blah defines a rule that says to replace instances of blah with hello world.

			 * Hello World: replace "blah" with "hello world"
			 blah    printf("hello world");
			 int main ( int argc, char *argv[] ) {

If you type the above into a file called replace.yy and then execute the following commands, you get the resulting behavior.

$ flex replace.yy
$ gcc -o repl lex.yy.c -ll
$ echo "blah and another blah" | ./repl
hello world and another hello world

The documentation for flex provides many more examples and a description of the syntax.

Note that flex is based on regular expressions, which are similar to the extended BNF notation we've gone over in class. There are lots of documentation for regular expressions, and they can be very useful in many situations.

  1. Using flex, make a program called encode that takes any character in a-z or A-Z and shifts it 13 spaces forward in the alphabet, with wraparound from z back to a. Run it on a text file of your choice. You can test that it is working correctly by running the output of encode through encode again. The result should be the original document.

  2. Using Flex, make a program that reads in a text file and tells you not only the number of rows and characters, but also how many of each vowel [a, e, i, o, u] are in the file.

  3. Make a program that strips an html file of all tags and comments. It should also strip the file of all whitespace except for a blank line wherever there used to be a <p> tag. Test it on this web page. Here is a downloadable version.

    Note that this is challenging to do in a way that make the output reasonably formatted. As an extension, you can make your program do more sophisticated things like replace &gt; with >.

  4. Implement a parser for Clite. For now, the output of your program should just be a sequence of strings, one per line. Please use the following strings.

    Integer-<value> A sequence of one or more digits.
    Float-<value> An integer, a dot, and another integer.
    Keyword-<word> if | else | while | for | int | float
    Identifier-<name> Legal names/identifiers.
    Assignment =
    Comparison-<symbol> == | < | > | <= | >=
    Operator-<symbol> + | - | * | /
    Open-bracket {
    Close-bracket }
    Open-paren (
    Close-paren )

    Ideally, your program should also ignore comments. You can test your program on this example. You should get something like this result.

Part II: Selected Langauges

In this part of the project, you are expected to select two languages from the four you studied last week. I recommend you to select SQL cautiously, since you may find it is difficult to use SQL to implement some features in the following projects.


  1. Organize your wiki page. The wiki page for this course should have a good structure. There should be a top-level link for the course and two subdiretories for the two selected languages. For each language, there should be a link for every project including Project 1. You can either reorganize the wiki page for Project 1 into the appropriate subdirectories, or create a new page for each language and copy its description from Project 1.
  2. For each of the two lauguages, create a wiki page for Project 2.
  3. The first section of each wiki page lists out the language resources. Spend some time searching for useful resources, documentation, and formal descriptions of the language. Organize them in a useful way. Think of it as your reference manual for the rest of the semester, so make it useful for you.
  4. Write a "hello world" program for each language. The snapshot of the program and its outputs are the second section of the wiki page.


  1. Get comments working properly in your parser for Clite. Be sure to try out a variety of examples.
  2. Make the encoder more complex.
  3. Make the parser for analyzing a document to some more interesting things.
  4. A perfectly acceptable extension for any assignment going forward is to do a third language. The third language should have its own subdirectory. Under this subdirectory, there should be a set of links for all projects.


The submission of this project has three components:

  1. Codes: Your source codes for tasks of part I and the "hello world" programs of the two selected languages should be submitted to the fileserver. If you pick a third language as your extension, its "hello world" program should be submitted to the fileserver too. Each task of part I should have a .yy file, .c file, and executable file. The filename should specify the task name (e.g., task1.yy, task1.yy.c, task1). The name of the "hello world" program should be HelloWorld.xx. Note that the quality of your comments counts toward your grade.
  2. README: You need to submit a README file to the fileserver for your parsers of this project. The README file can be a .txt file. It should be well-organized in a way that readers can easily know the usage and the outputs of every parser. In addition, the known bugs of each parser should also be indicated in the README file.
  3. Wiki write-up: The write-up of this project should have the following elements.
    • Title of the project and your name
    • Resources for the selected languages
    • The snapshot of the "hello world" programs
    • The outputs of the "hello world" programs

    To check whether you've made your write-up well organized, you may ask yourself the following questions:

    • Is there a main page that indicates the main purpose of the page collection (e.g. this is a set of pages describing how a particular language implements standard programming language features)?
    • Does the main page have a title (e.g. Ying Li CS 333, Ying’s Explorations in R, Ying’s Notes in JS)?
    • On the main page, are there clearly organized links to supporting pages?
    • Does each supporting page have a title that indicates its role in the set of pages?

    Here is a recommended layout.

    	Ying Li CS333
    	|_Ying’s Explorations in R
    	| |
    	| |_P1
    	| |_P2
    	| |_...
    	|_Ying’s Notes in JS
    	| |
    	| |_P1
    	| |_P2
    	| |_...

Please note that it is your responsibility to explicitly indicate the extensions you have taken. If you take the extensions in parsers, please indicate explicitly in the codes and README file. If you take the extension in selected languages, please indicate explicitly in your wiki write-up.

© 2017 Ying Li. Page last modified: .