Project 8: Assembler
Due November 28, 2017
The purpose of this project is to write a 2-pass assembler that converts a mnemonic assembly language into the machine code for the CPU you built in project 7. This is the third part of three coordinated projects.
The overall task is to create a two-pass assembler for your CPU. An assembler converts a set of instructions written in a simple mnemonic language into machine code appropriate for loading onto your machine. In this case, you will generate an MIF file for the ROM that can be read by Quartus when compiling your CPU. An advanced assembler may also generate a separate MIF for the RAM in case there are variables defined in the assembly code that need to be preloaded into memory.
You are free to use whatever language you like to write the assembler. My example code will be provided in Python, and its string handling capabilities make it fairly simple to use for this task. Dictionaries also come in very handy (hint).
Reference the CPU Design when building your assembler.
The assembly language you need to support is given in the assembly language design document.
Download the assembler template. It
contains a function that reads and tokenizes a file. A token is a
string separated by spaces from its neighbors. It also contains
Python functions for converting a decimal number into an 8-bit unsigned
or 2's complement binary string.
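As a rough idea of what such conversion helpers can look like, here is a minimal sketch. The function names `to_unsigned_8bit` and `to_twos_complement_8bit` are made up for illustration; the functions in the template may have different names and behavior.

```python
def to_unsigned_8bit(value):
    """Return an integer in 0..255 as an 8-character binary string."""
    if not 0 <= value <= 255:
        raise ValueError("value out of range for 8-bit unsigned: %d" % value)
    return format(value, "08b")


def to_twos_complement_8bit(value):
    """Return an integer in -128..127 as an 8-character 2's complement string."""
    if not -128 <= value <= 127:
        raise ValueError("value out of range for 8-bit signed: %d" % value)
    # Masking with 0xFF maps negative values onto their 2's complement bit pattern.
    return format(value & 0xFF, "08b")
```

Rejecting out-of-range values, rather than silently truncating them, is a design choice in this sketch; it lets the assembler report an immediate value that does not fit in 8 bits.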
Tokenizing a file means the function reads through the file and creates a token for each individual element, separating numbers and symbols into individual strings. The tokenizer also converts all characters to lower case, which makes the assembler case insensitive. The output of the tokenize function is a list of lists. Each sub-list corresponds to a single line in the file and contains the individual strings for that line.
For example, the file:
start:
movei 8 RA
movei 8 RB
becomes the list:
[ ['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb'] ]
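The tokenizing behavior described above can be sketched as follows. This is not the template's code: the name `tokenize_text` is hypothetical, it works on a string rather than a file so it is easy to test, and treating `#` as the start of a comment is an assumption.

```python
def tokenize_text(text):
    """Split assembly source text into a list of per-line token lists."""
    tokens = []
    for line in text.splitlines():
        # Assumption: everything after '#' on a line is a comment.
        line = line.split("#", 1)[0]
        # Lower-casing makes the assembler case insensitive.
        words = line.lower().split()
        if words:  # skip blank and comment-only lines
            tokens.append(words)
    return tokens


print(tokenize_text("start:\nmovei 8 RA\nmovei 8 RB\n"))
```

For the three-line example file, this prints the same list of lists shown above.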
Symbols in assembly language are used by branching operations. Rather
than require the programmer to know the address of each instruction,
the programmer can place symbols within the code and use those symbols
as targets for branch instructions.
The first pass of an assembler needs to figure out what line number corresponds to each symbol. In the second pass, the assembler generates the machine code for each instruction and fills in the address values for branch instructions from the symbol table.
If the assembly language allowed symbols to be attached to locations in the data RAM, the first pass would have to calculate those addresses as well. It would, for example, put the first such symbol at location 0, the second at location 1, and so on.
The output of the first pass through the code should be a dictionary with the symbols as the keys and the line numbers as the values. The assembler should count only actual machine instruction lines, so the value stored for a symbol is the address of the instruction that follows it. You may want your assembler to detect duplicate symbols and report an error.
The first pass can also remove lines with labels from the tokens, since they are no longer necessary.
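A minimal sketch of such a first pass is shown below. It assumes a label sits on its own line and ends with a colon, as in the `start:` example; the function name `pass1` and the choice to raise `ValueError` for duplicates are illustrative, not requirements.

```python
def pass1(tokens):
    """Build the symbol table and strip label lines from the token list."""
    labels = {}
    instructions = []
    for line in tokens:
        if line[0].endswith(":"):
            name = line[0][:-1]  # drop the trailing colon
            if name in labels:
                raise ValueError("duplicate symbol: " + name)
            # The label refers to the next instruction, whose address equals
            # the number of instructions collected so far.
            labels[name] = len(instructions)
        else:
            instructions.append(line)
    return labels, instructions


labels, instructions = pass1([['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb']])
print(labels)        # {'start': 0}
print(instructions)  # [['movei', '8', 'ra'], ['movei', '8', 'rb']]
```

Because label lines are never appended to `instructions`, the index of each remaining line is also its address in the ROM.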
The second pass should take in the tokens and the label dictionary and
generate the set of machine instructions. The first token in each
line should be the instruction. The interpretation of the rest of the
tokens on each line is instruction dependent.
The output of the second pass should be a list of machine instructions, where each instruction is a string of 1s and 0s representing the 16-bit binary instruction.
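One way to organize the second pass is a dictionary that maps each mnemonic to a small encoding function. The sketch below only shows that structure: the opcodes, register codes, field layouts, and the `bra` mnemonic are all invented for illustration. The real values must come from the CPU design and the assembly language design document.

```python
# HYPOTHETICAL register codes; use the ones from your CPU design.
REGISTERS = {"ra": "000", "rb": "001"}


def encode_movei(args, labels):
    # Made-up layout: 4-bit opcode, 8-bit immediate, 1 pad bit, 3-bit register.
    value, reg = args
    return "1111" + format(int(value) & 0xFF, "08b") + "0" + REGISTERS[reg]


def encode_bra(args, labels):
    # Made-up layout: 4-bit opcode, 4 pad bits, 8-bit target address.
    (target,) = args
    if target not in labels:
        raise ValueError("undefined symbol: " + target)
    return "0010" + "0000" + format(labels[target], "08b")


# Maps each mnemonic to the function that encodes it.
ENCODERS = {"movei": encode_movei, "bra": encode_bra}


def pass2(instructions, labels):
    """Translate tokenized instructions into 16-bit binary strings."""
    machine_code = []
    for line in instructions:
        mnemonic, args = line[0], line[1:]
        if mnemonic not in ENCODERS:
            raise ValueError("unknown instruction: " + mnemonic)
        word = ENCODERS[mnemonic](args, labels)
        if len(word) != 16:
            raise ValueError("bad encoding length for: " + " ".join(line))
        machine_code.append(word)
    return machine_code
```

Checking that every encoded word is exactly 16 characters long catches most field-width mistakes while the encoders are being written.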
Create a main function that opens a file, tokenizes it, runs the two
passes, and then prints out the machine instructions in an MIF
format appropriate for use by Quartus.
Note that the Python expression
print("%02X : %s;" % (line, instr))
will print out the numeric value in the variable line as a 2-digit hex number and then replace the %s with the string in the variable instr.
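A sketch of building the complete MIF text around that format string follows. The function name `format_mif` is hypothetical, and the default depth of 256 words and width of 16 bits are assumptions that must match the ROM in your own design.

```python
def format_mif(machine_code, depth=256, width=16):
    """Return the text of an MIF file holding the given binary strings."""
    used = len(machine_code)
    if used > depth:
        raise ValueError("program has %d words but the ROM holds %d" % (used, depth))
    lines = [
        "DEPTH = %d;" % depth,
        "WIDTH = %d;" % width,
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = BIN;",
        "CONTENT",
        "BEGIN",
    ]
    for address, instr in enumerate(machine_code):
        lines.append("%02X : %s;" % (address, instr))
    # Fill the unused part of the ROM with zeros.
    zero = "0" * width
    if used == depth - 1:
        lines.append("%02X : %s;" % (used, zero))
    elif used < depth - 1:
        lines.append("[%02X..%02X] : %s;" % (used, depth - 1, zero))
    lines.append("END;")
    return "\n".join(lines)


print(format_mif(["1111000010000000"]))
```

Returning a string instead of printing directly makes it easy to either print the result or write it to a file.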
- Write the Fibonacci program from last week in assembly. Assemble it and demonstrate that it works in simulation.
- Write a recursive program that sums numbers from 1 to N in assembly and demonstrate that it works, up to the range of the numbers that can be represented.
- Demonstrate your programs, generated by your assembler, on the board and set up the output port so it writes to the four 7-segment displays. Use programs that test the capabilities of your CPU.
- Make your assembler more intelligent so that it can catch errors, tell the user what the error is, and possibly suggest corrections.
Extend the instruction set in some way that makes writing a program
easier. For example, set up a method that can handle functions with
Consider the following possible assembly code.
movei 15 RA
movei 20 RB
call mul10 RA RB
pop RA
oport RA
halt
mul10: AA AB R # function label with two arguments and a return value
push RA
push RB
loada RA AA
loada RB AB
add RA RB RA
storea RA R
pop RB
pop RA
return
Consider modifying the assembler so that it handles pushing a return value space, RA and RB onto the stack before executing the CALL to mul10. Then the mul10 label sets up the symbols AA AB and R as offsets to the current value of the stack pointer and stores the stack pointer value in RE before executing the next instruction (push RA). It is the start of a simple stack frame for a function.
- Create some test programs that evaluate all of the CPU's capabilities.
Create a wiki page with your writeup. For each task, write a short description of the task, in your own words.
- Include a description of the design of your assembler.
- Demonstrate that the assembler is generating correct code.
- Describe the programs you implemented. Demonstrate them working in simulation or on the board.
- Include a description, and pictures, of any extensions.
Give your wiki page the label cs232s17project8.
Put your VHDL, Python, and assembly files in a zip file in your private subdirectory on the Courses server. If you have any issues with the server, try using vpn.colby.edu in a browser.