view doc/general/ede.tex @ 29:83e80c2c489c

separated working emu code from broken emu code. wrote dbg interface
author james <jb302@eecs.qmul.ac.uk>
date Sun, 13 Apr 2014 22:42:57 +0100
parents a542cd390efd
children
%% LyX 2.0.6 created this file.  For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[english]{article}
\usepackage{lmodern}
\renewcommand{\sfdefault}{lmss}
\renewcommand{\ttdefault}{lmtt}
\renewcommand{\familydefault}{\sfdefault}
\usepackage[T1]{fontenc}
\usepackage[utf8]{luainputenc}
\usepackage{listings}
\usepackage{color}
\usepackage{graphicx}

\makeatletter

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
%% A simple dot to overcome graphicx limitations
\newcommand{\lyxdot}{.}


\makeatother

\usepackage{babel}
\begin{document}

\title{EDE: ELB816 Development Environment}


\author{James Bowden (110104485)}
\maketitle
\begin{abstract}
The ELB816 Development Environment consists of an assembler, emulator
and debugger for the ELB816 microprocessor system. This report details
the design and usage of each of its elements.
\end{abstract}
\newpage{}

\tableofcontents{}

\newpage{}


\part{Introduction and Specification}

\bigskip


\section{Motivations}

The ELB816 architecture is designed to be a ``simple to understand
8-bit microprocessor system to help learn about microprocessor electronics.''

\bigskip

The combination of an ELB816 emulator, debugger and assembler could
be used as a set of tools for learning or teaching microprocessor
programming without the intricacies of real-world commercial microprocessors
getting in the way of a fundamental understanding of the subject.

\bigskip

A PC-based emulator would allow students to quickly develop and debug
programs written in a simple assembly language on any modern desktop
or laptop, while an MCS-51 port running on an 8052 would allow them
to test programs in an actual circuit.


\section{Project Aims}
\begin{itemize}
\item Develop an assembler for the ELB816 assembly language.
\item Develop an emulated programmable microprocessor system based on the
ELB816 architecture.
\item Develop a debugger that allows interactive debugging of programs running
on the emulator.
\end{itemize}

\section{Methodology}


\subsection{Assembler}
\begin{description}
\item [{Language:}] Python
\item [{Priority:}] First
\end{description}
The assembler will be developed before anything else so that it can
subsequently be used to assemble test programs during development
of the emulator. 

\newpage{}


\subsection{Emulator}
\begin{description}
\item [{Language:}] C
\item [{Priority:}] Second
\end{description}
The emulator will use only standard libraries to ensure that it
is portable between compilers and platforms, specifically GCC for
x86 and Keil C51 for Intel MCS-51. The emulator will first be developed
on Linux to facilitate rapid development. It will be ported to MCS-51
once it is complete.


\subsection{Debugger}
\begin{description}
\item [{Language:}] C/Python
\item [{Priority:}] Second
\end{description}
The debug interface will be developed alongside the emulator. It
will consist of a simple text-based interface built into the emulator
that will read commands using C's \lstinline[basicstyle={\ttfamily}]!stdio.h!
library. This means that on Linux the commands will be issued using
\lstinline[basicstyle={\ttfamily}]!STDIN! and on the MCS-51 version
they will be issued over a serial interface. Python will be used to
provide a cleaner interface for common debug procedures such as writing
programs to memory and setting break-points. 

\bigskip

The remainder of this report is split into three parts, one for each
component of the project, and will attempt to demonstrate the design
and usage of each of these components. 

\newpage{}


\part{Assembler}

The assembler is written in pure Python 2 using only the standard
library. It assembles the assembly language described in the ELB816
specification with a few minor differences. These differences are:
\begin{itemize}
\item In-line arithmetic must be wrapped in parentheses, i.e. it must start
with `(' and end with `)'. This is a limitation of the program's design,
and changing it would require a large amount of code to be rewritten.
\item The only directives that have been implemented are \lstinline[basicstyle={\ttfamily}]!ORG!,
\lstinline[basicstyle={\ttfamily}]!EQU!, \lstinline[basicstyle={\ttfamily}]!DB!
and \lstinline[basicstyle={\ttfamily}]!DS!. The other directives
listed in the specification have not been implemented, but their omission
is only due to time constraints and they could easily be implemented
in a later version.
\item Macros have not been implemented, also due to time constraints.
\end{itemize}
The assembler consists of two files: 
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!language.py! which contains the
language definition in an index and some functions to help encode
instructions.
\item \lstinline[basicstyle={\ttfamily}]!assembler.py! which contains the
first and second pass functions and handles opening source files and
writing binary files.
\end{itemize}
The following sections detail the design and behavior of the assembler.
Note, however, that these are abstract, high-level descriptions
that do not fully explain minor routines, but give an overview of
the entire process. The full source code is attached in the Appendix
and should be referenced for a deeper understanding of the program's
operation. The final section is a short programmer's manual demonstrating
the assembler's features.

\newpage{}


\section{Data Structures}
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!reserved arguments!
\end{itemize}
This structure contains a list of string representations of the reserved
word arguments for the instruction set. These all equate to registers
or register pointers. The full list is as follows:

\begin{lstlisting}[basicstyle={\ttfamily},captionpos=b,frame=tb,framexbottommargin=1em,framextopmargin=1em,keywordstyle={\color{blue}},tabsize=4]
a, c, bs, ie, flags, 
r0, r1, r2, r3, 
dptr, dpl, dph,  
sp, sph, spl,
@a+pc, @a+dptr, @dptr
\end{lstlisting}

\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!relative instructions! 
\end{itemize}
This structure contains a list of string representations of the mnemonics
of instructions that use relative addressing. The full list is as
follows:

\begin{lstlisting}[basicstyle={\ttfamily},captionpos=b,frame=tb,framexbottommargin=1em,framextopmargin=1em,keywordstyle={\color{blue}},tabsize=4]
djnz, cjne, sjmp, jz,
jnz, jc, jnc, jpo, 
jpe, js, jns
\end{lstlisting}

\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!instruction index! 
\end{itemize}
This structure contains an index of all possible instructions in the
instruction set, along with the corresponding opcode and instruction
width. This is implemented using a combination of Python's dictionary,
tuple and list objects. Its structure is demonstrated below:

\begin{lstlisting}[basicstyle={\ttfamily},captionpos=b,frame=tb,framexbottommargin=1em,framextopmargin=1em,keywordstyle={\color{blue}},tabsize=4]
mnemonic: (arg type, arg type, ...): [opcode, width]
\end{lstlisting}


Each mnemonic has an entry in the parent index which returns another
index of possible argument formats for that mnemonic with their corresponding
opcode and width. Argument types can either be one of the reserved
arguments or one of the following values: \lstinline[basicstyle={\ttfamily}]!address!,
\lstinline[basicstyle={\ttfamily}]!pointer!, \lstinline[basicstyle={\ttfamily}]!data!
or \lstinline[basicstyle={\ttfamily}]!label!. Width is represented
in number of bytes, i.e. \lstinline[basicstyle={\ttfamily}]!width = 3!
means 1 byte of opcode and 2 bytes of arguments.
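
As a minimal sketch of this nesting, and not a copy of the attached
source, the index can be modelled as a dictionary of dictionaries
keyed first by mnemonic and then by a tuple of argument types. The
opcode values, the widths and the \lstinline[basicstyle={\ttfamily}]!lookup()!
helper below are placeholders invented for illustration; they are
not taken from the ELB816 specification.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
# Hypothetical excerpt: opcodes and widths are placeholders,
# not values from the ELB816 specification.
INSTRUCTION_INDEX = {
    'sjmp': {
        ('label',): [0x10, 2],
    },
    'cjne': {
        ('a', 'data', 'label'): [0x20, 3],
        ('r0', 'data', 'label'): [0x21, 3],
    },
}


def lookup(mnemonic, symbol):
    """Return (opcode, width in bytes) for one argument format."""
    opcode, width = INSTRUCTION_INDEX[mnemonic][symbol]
    return opcode, width


print(lookup('cjne', ('a', 'data', 'label')))  # (32, 3)
\end{lstlisting}

Tuples are used for the inner keys because they are hashable, which
is what allows an argument format to act as a dictionary key.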
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!label index! 
\end{itemize}
This structure is used to store an index of label definitions.
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!equate index! 
\end{itemize}
This structure is used to store an index of equated strings.

\newpage{}


\section{Functions}
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!first_pass(source file)! 
\end{itemize}
This function pre-processes a source file and stores it in a format
containing the necessary data for the \lstinline[basicstyle={\ttfamily}]!second_pass()!
function to assemble it. It processes labels and \lstinline[basicstyle={\ttfamily}]!EQU!
directives by storing strings and their corresponding values in indexes
and replacing any subsequent appearances of the string with the value.
It prepares \lstinline[basicstyle={\ttfamily}]!ORG! and \lstinline[basicstyle={\ttfamily}]!DB!
statements for the \lstinline[basicstyle={\ttfamily}]!second_pass()!.
It uses the \lstinline[basicstyle={\ttfamily}]!tokenize()! function
to determine the argument symbols and operand bit string. Finally
it uses the \lstinline[basicstyle={\ttfamily}]!instruction index!
to determine the instruction width.
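
The address bookkeeping described above can be sketched as follows.
This is a simplified illustration under stated assumptions rather
than the attached implementation: the \lstinline[basicstyle={\ttfamily}]!;!
comment character, the instruction widths in \lstinline[basicstyle={\ttfamily}]!WIDTHS!
and the helper name \lstinline[basicstyle={\ttfamily}]!collect_labels()!
are all hypothetical, and \lstinline[basicstyle={\ttfamily}]!EQU!
and \lstinline[basicstyle={\ttfamily}]!DB! handling is omitted.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
# Hypothetical instruction widths in bytes (assumed for this sketch).
WIDTHS = {'sjmp': 2, 'jz': 2}


def collect_labels(lines):
    """Record the address of every label definition in the source."""
    labels = {}
    address = 0
    for line in lines:
        # Assumption: comments start with ';'.
        statement = line.split(';')[0].strip().lower()
        if not statement:
            continue
        words = statement.split()
        if words[0] == 'org':
            address = int(words[1], 0)
        elif words[0].endswith(':'):
            labels[words[0][:-1]] = address
        else:
            address += WIDTHS[words[0]]
    return labels


source = [
    'org 0x10',
    'start:',
    'sjmp end   ; forward reference',
    'jz start',
    'end:',
]
print(collect_labels(source) == {'start': 16, 'end': 20})  # True
\end{lstlisting}

In this example \lstinline[basicstyle={\ttfamily}]!end! is used
before it is defined, so its address is only known once the whole
source has been read. This is why a second pass is needed.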
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!second_pass(asm, label index)! 
\end{itemize}
This function takes the pre-processed assembly code and \lstinline[basicstyle={\ttfamily}]!label index!
output by \lstinline[basicstyle={\ttfamily}]!first_pass()! as input.
First it checks for \lstinline[basicstyle={\ttfamily}]!ORG! and \lstinline[basicstyle={\ttfamily}]!DB!
statements and handles them if necessary. Then it replaces any labels
that were used before they were defined and therefore not replaced
by \lstinline[basicstyle={\ttfamily}]!first_pass()!. It uses
the \lstinline[basicstyle={\ttfamily}]!instruction index! to determine
the opcode and the width of the instruction, then it writes the opcode
and operand to the file. If the combined width of the opcode and operand
is greater than the instruction width the function raises an error. 
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!tokenize(mnemonic, arguments)!
\end{itemize}
This function processes an instruction in order to produce a hashable
symbol that represents the format of its arguments. This symbol is
used to look up opcodes in the \lstinline[basicstyle={\ttfamily}]!instruction index!.
It also detects string representations of numbers in the arguments
and stores a C-style struct representation of the operands to be returned
along with the symbol. It does this with the help of the \lstinline[basicstyle={\ttfamily}]!stoi()!
function and Python's \lstinline[basicstyle={\ttfamily}]!struct!
module.
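
The sketch below shows the general idea under several simplifying
assumptions; the flowchart later in this part and the attached source
remain the authoritative description. Here every numeric argument
is classified as \lstinline[basicstyle={\ttfamily}]!data! and packed
as a single unsigned byte, every unrecognised word is classified as
\lstinline[basicstyle={\ttfamily}]!label!, and the \lstinline[basicstyle={\ttfamily}]!address!
and \lstinline[basicstyle={\ttfamily}]!pointer! types are not handled.
The \lstinline[basicstyle={\ttfamily}]!mnemonic! parameter is kept
only to match the signature given above.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
import struct

RESERVED = {'a', 'c', 'bs', 'ie', 'flags', 'r0', 'r1', 'r2', 'r3',
            'dptr', 'dpl', 'dph', 'sp', 'sph', 'spl',
            '@a+pc', '@a+dptr', '@dptr'}


def tokenize(mnemonic, arguments):
    """Return (symbol, constant) for a list of argument strings."""
    symbol = []
    constant = b''
    for arg in arguments:
        if arg in RESERVED:
            symbol.append(arg)       # reserved words represent themselves
            continue
        try:
            value = int(arg, 0)      # base inferred from the prefix
        except ValueError:
            symbol.append('label')   # assumption: any other word is a label
            continue
        symbol.append('data')        # assumption: every number is 8-bit data
        constant += struct.pack('>B', value)
    return tuple(symbol), constant


print(tokenize('cjne', ['a', '0x2A', 'loop']) ==
      (('a', 'data', 'label'), b'\x2a'))  # True
\end{lstlisting}

The symbol is returned as a tuple so that it can be used directly
as a key into the \lstinline[basicstyle={\ttfamily}]!instruction index!.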
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!stoi(string)! 
\end{itemize}
This is a general-purpose function that is used
throughout the code, although mainly in the \lstinline[basicstyle={\ttfamily}]!tokenize()!
function. It takes a string as an input and tries to convert it to
an integer using Python's integer representation syntax. It can recognize
decimal, octal, hexadecimal and binary numbers, which are denoted with
different prefixes. If it receives a string it cannot represent as
an integer, it returns the string 'NaN' (Not a Number).
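
As a minimal sketch of this behaviour, and not the attached implementation,
the conversion can be approximated with Python's built-in \lstinline[basicstyle={\ttfamily}]!int()!
called with base 0, which infers the base from the prefix of the literal.
The exact set of prefixes accepted by the real function is an assumption
here.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
def stoi(string):
    """Convert a numeric string to an int, or return 'NaN'."""
    try:
        # Base 0 infers the base from the prefix:
        # 0x hexadecimal, 0o octal, 0b binary, none for decimal.
        return int(string, 0)
    except ValueError:
        return 'NaN'


print(stoi('0x1F'))   # 31
print(stoi('0b101'))  # 5
print(stoi('0o17'))   # 15
print(stoi('42'))     # 42
print(stoi('r0'))     # NaN
\end{lstlisting}

Returning a sentinel string rather than raising an exception lets
a caller test the result with a simple comparison.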

\bigskip

Below is an abstract representation of each function's logical process.
The \lstinline[basicstyle={\ttfamily}]!first_pass()! and \lstinline[basicstyle={\ttfamily}]!second_pass()!
are represented in pseudo-code, however \lstinline[basicstyle={\ttfamily}]!stoi()!
and \lstinline[basicstyle={\ttfamily}]!tokenize()! are more easily
understood when represented as flowcharts. 

\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!first_pass!}

\begin{lstlisting}[basicstyle={\small\ttfamily},captionpos=b,frame=tb,framexbottommargin=3em,framextopmargin=3em,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
first_pass(source file):

	address = 0

	for statement in source file:

		remove comments
	
		for word in statement:
		
			if word is in equate index:
				replace word with equated value
			else if word is in label index:
				replace word with address at label
	
		if first word == 'org':
			address = second word
		else if last character of first word == ':':
			remove ':'
			add first word = address to label index
			next statement
		else if second word == 'equ':
			add first word = third word to equate index
			next statement
	
		mnemonic = first word
		arguments = [second word ... last word]
	
		symbol, constant = tokenize(arguments)
		if mnemonic == 'db':
			address = address + width of constant
			next statement
		
		width = instruction index[mnemonic][symbol][width]
		address = address + width
	
		append [mnemonic, argument, symbol, constant] to asm

	return asm, label index
\end{lstlisting}
\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!second_pass! }

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,captionpos=b,frame=tb,framexbottommargin=3em,framextopmargin=3em,keywordstyle={\color{blue}},language=Python,tabsize=4]
second_pass(file, asm, label index):

	address = 0

	for line in asm:

		file offset = address
		
		mnemonic, arguments, symbol, constant = line
	
		if mnemonic == 'org':
			address = first argument
			next line
		else if mnemonic == 'db':
			write constant to file
			address = address + width of constant
			next line
	
		for argument in arguments:
			if argument is a label:
				replace argument with address at label
				symbol, data = tokenize(argument)
				append data to constant
	
		op, width = instruction index[mnemonic][symbol]
	
		write op to file

		if width of constant - width + 1 > 0:
			raise error
		else:
			write constant to file
			address = address + width
	
	return file
\end{lstlisting}
\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!tokenize!}

\bigskip

\includegraphics[scale=0.57]{/home/jmz/qm/ede/doc/images/assembler/tokenize}

\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!stoi! }

\bigskip
\begin{description}
\item [{\includegraphics[scale=0.7]{/home/jmz/qm/ede/doc/images/assembler/stoi}}]~
\end{description}
\newpage{}


\section{Assembly language manual}

\newpage{}


\part{Emulator}


\section{Core microprocessor emulation}

The core of the emulator is written in C using only standard libraries.
It executes the machine code output by the assembler according to
the ELB816 specification. It consists of the following files:
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!iset.c! and \lstinline[basicstyle={\ttfamily}]!iset.h!
\end{itemize}
These files contain the emulator instruction functions and function
look-up table.
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!mem.c! and \lstinline[basicstyle={\ttfamily}]!mem.h!
\end{itemize}
These files contain the emulator's memory structure and memory access
functions.
\begin{itemize}
\item \lstinline[basicstyle={\ttfamily}]!emu.c!
\end{itemize}
This file contains the program's \lstinline[basicstyle={\ttfamily}]!main()!
function. It initializes the emulator and executes the program's fetch/decode/execute
cycle. 

\bigskip

Below is a high level description of the content of each of these
files which should demonstrate how the emulator works. There is also
a large amount of material relevant to the emulator's design in the
appendix, which will be referenced when applicable.


\subsection{\lstinline[basicstyle={\ttfamily}]!iset.c! and \lstinline[basicstyle={\ttfamily}]!iset.h!}

Each mnemonic in the ELB816 instruction set has a function defined
in these files. Each function is responsible for execution of all
the instructions that use its corresponding mnemonic. The function
look-up table is an array of pointers to these functions, where a
pointer's position in the list corresponds to the opcode of the instruction
to be executed.
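
The table layout described above can be illustrated with the following
Python analogue of the C array of function pointers. It is a sketch
under stated assumptions, not the code in \lstinline[basicstyle={\ttfamily}]!iset.c!:
the handler names, the opcode values and the use of the low two opcode
bits to select a register are all hypothetical.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
def op_nop(cpu, opcode):
    pass


def op_inc(cpu, opcode):
    # One function serves every opcode of its mnemonic.
    # Assumption: the low two bits select r0..r3.
    reg = opcode & 0x03
    cpu['r'][reg] = (cpu['r'][reg] + 1) & 0xFF


def op_undefined(cpu, opcode):
    raise ValueError('undefined opcode 0x%02X' % opcode)


# One slot per possible 8-bit opcode.
TABLE = [op_undefined] * 256
TABLE[0x00] = op_nop                 # hypothetical opcode
for opcode in range(0x10, 0x14):     # hypothetical opcodes
    TABLE[opcode] = op_inc

cpu = {'r': [0, 0, 0, 0xFF]}
TABLE[0x11](cpu, 0x11)
TABLE[0x13](cpu, 0x13)
print(cpu['r'])  # [0, 1, 0, 0]
\end{lstlisting}

Indexing the table by opcode replaces a long chain of comparisons
with a single look-up.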

\bigskip

\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!mem.c! and \lstinline[basicstyle={\ttfamily}]!mem.h!}

The figures below illustrate the emulator's memory layout as defined
in the \lstinline[basicstyle={\ttfamily}]!mem.h! header file.

\bigskip

\lstinline[basicstyle={\ttfamily}]!mem.c! contains functions that
can be used to access this memory from the rest of the code.

\newpage{}


\subsection{\lstinline[basicstyle={\ttfamily}]!emu.c!}

This file contains the emulator's set-up and control procedures. It
includes all of the project's header files and controls the execution
of the functions contained in them.

\bigskip

It first executes a number of initialization procedures and then passes
control over to the main fetch/decode/execute cycle. This procedure
is shown below as a flowchart. To understand it, you must be familiar
with C's function pointer syntax.

\bigskip

\centerline{\includegraphics[scale=0.7]{/home/jmz/qm/ede/doc/images/emulator/fetch_decode_exe}}
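
\bigskip

For readers less familiar with function pointers, the cycle can also
be sketched in Python. This is an illustrative analogue only and does
not reproduce \lstinline[basicstyle={\ttfamily}]!emu.c!: the two
opcodes, the handler names, the 16-bit program counter wrap and the
\lstinline[basicstyle={\ttfamily}]!running! flag are assumptions made
for the example.

\begin{lstlisting}[basicstyle={\small\ttfamily},breaklines=true,frame=tb,keywordstyle={\color{blue}},language=Python,showstringspaces=false,tabsize=4]
def op_inc_a(cpu, memory):
    cpu['a'] = (cpu['a'] + 1) & 0xFF


def op_halt(cpu, memory):
    cpu['running'] = False


def op_undefined(cpu, memory):
    raise ValueError('undefined opcode before address %d' % cpu['pc'])


TABLE = [op_undefined] * 256
TABLE[0x01] = op_inc_a   # hypothetical opcode
TABLE[0xFF] = op_halt    # hypothetical opcode


def run(memory):
    cpu = {'pc': 0, 'a': 0, 'running': True}
    while cpu['running']:
        opcode = memory[cpu['pc']]             # fetch
        cpu['pc'] = (cpu['pc'] + 1) & 0xFFFF   # assumed 16-bit counter
        handler = TABLE[opcode]                # decode
        handler(cpu, memory)                   # execute
    return cpu


cpu = run(bytearray([0x01, 0x01, 0xFF]))
print(cpu['a'], cpu['pc'])  # 2 3
\end{lstlisting}

In the C version the decode and execute steps amount to indexing
the look-up table with the opcode and calling the function pointer
found there.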

\newpage{}


\section{Peripherals}

\newpage{}
\end{document}