Lesson 3. Assembler Programming.

Requirements for programming in assembly language.

Needed software
Utilization of the MASM
Linker use

Format of a program in assembler.

Internal format
External format
Practical example of a program

Assembly process.

Segments
Table of symbols

Types of instructions.

Data movement
Logic and arithmetic operations
Jumps, loops and procedures

NEEDED SOFTWARE
In order to create a program, several tools are needed:

First, an editor to create the source program. Second, an assembler, which is
nothing more than a program that "translates" the source program into an object program. And third, a linker, which generates
the executable program from the object program.

The editor can be any text editor at hand. As the assembler we will use
MASM, the macro assembler from Microsoft, since it is the most common, and as the
linker we will use the LINK program.

The extension that MASM recognizes for assembler source programs is .ASM. Once the source program has been translated,
MASM creates a file with the .OBJ extension. This file contains an "intermediate format" of the program, so called because
it is neither executable nor a program in source language. From one .OBJ file, or a combination of several
of them, the linker generates an executable program, whose extension is usually .EXE, though it can also be .COM, depending on how it was
assembled.

This tutorial describes how to work with version 5.0 or later of
MASM. The main difference between this version and earlier ones is the way in
which the code, data, and stack segments are declared. The structure of the
programs is the same.

UTILIZATION OF THE MASM
Once the source program has been created, it must be passed to MASM to create the intermediate code, which is stored in a file
with the .OBJ extension. The command to do this is:

MASM Name_File; [Enter]

Where Name_File is the name of the source program with the .ASM extension that will be translated. The semicolon after
the file name tells the macro assembler to generate the intermediate code directly. If it is omitted,
MASM will prompt for the name of the file to translate, the
name of the file to generate, and the listing files
it should produce.

It is possible to run MASM with various parameters to obtain a particular
result. The entire list can be found in the manual of the program. In this tutorial I will only
explain how to pass such parameters to MASM.

Every parameter is preceded by the "/" symbol. It is possible to use several
parameters at a time. Once all the parameters have been typed in, the name of
the file to be assembled is written. For example, if we want MASM to assemble a program called "test" and also
display the number of source lines and symbols processed, we use the /v parameter. To have it also show
the source line on which each error occurred, we add the /z parameter.
The entire command would be:

MASM /v /z test;

USE OF THE LINKER
MASM can only create programs in .OBJ format, which are not executable by themselves. A linker is needed
to generate the
executable code.

Using the linker is very similar to using MASM. It is simply
typed at the DOS prompt:

LINK Name_File;

Where Name_File is the name of the intermediate program, with the .OBJ extension. This directly generates a file with the same name as the intermediate
program and the .EXE extension.

INTERNAL FORMAT OF A PROGRAM
In order to communicate in any language, including programming languages, it is
necessary to follow a set of rules; otherwise we would not be able to
express what we wish.

In this section we will see some of the rules we must follow to write a program in assembly language. We will focus on the way
to write the instructions so that the assembler will be able to interpret them.

Basically the format of a code line in assembly language has four parts:

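*Label, variable or constant: This is not always defined; if it is defined, it is
necessary to use separators to differentiate it from the other parts, usually spaces
or some special symbol.

*Directive or instruction: The name of the instruction we want to be executed.

*Operands: Most instructions work with two operands, and some with only one. The first is usually the destination, which
receives the result, and the second is the source, which supplies the data to be processed.

*Comment: Text that follows a semicolon; it only documents the program and has no effect on the assembled code.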
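
As a hedged illustration of this layout, the short program below has lines that carry a label, an instruction or directive, its operands and a trailing comment. It is only a sketch: the names Counter, Start and Again are made up for the example, and the program simply counts down to zero and ends.

```asm
; Illustrative sketch; Counter, Start and Again are hypothetical names.
.MODEL SMALL
.STACK
.DATA
Counter DB 3                ; label, directive, operand, comment
.CODE
Start:  MOV AX, @DATA       ; label, instruction, two operands, comment
        MOV DS, AX
Again:  DEC Counter         ; an instruction with a single operand
        JNZ Again           ; a line without a label of its own
        MOV AX, 4C00H
        INT 21H
END Start
```

Note that the label and the comment are optional on each line; only the instruction or directive is always present.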


EXTERNAL FORMAT OF A PROGRAM
Apart from following certain rules so that the assembler can understand an
instruction, it is necessary to give it certain information about the resources to
be used, for example the memory segments which will be used, the initial data of the program, and where our code begins and
where it ends.

A simple program could be the following:

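.MODEL SMALL          ; memory model
.CODE                 ; what follows is the program code
Program:              ; label marking the beginning of the program
MOV AX,4C00H          ; AH = 4CH (end program), AL = 00H
INT 21H               ; call DOS
.STACK                ; reserve memory for the stack
END Program           ; end of the program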

The program does not really do anything. It only loads the value 4C00H into the AX register, so that interrupt 21H ends
the program. It does however give us an idea of the external format of an assembler program.

The .MODEL directive defines the memory model which will be used; the .CODE directive indicates that what follows is our
program; the Program label
indicates to the assembler the beginning of the program; the .STACK directive
asks the assembler to reserve a space of memory for the stack operations; the
"END Program" instruction marks the end of the program.


PRACTICAL EXAMPLE OF A PROGRAM
This is an example of a program which will write a string on the screen:

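.MODEL SMALL
.CODE
Program:
MOV AX, @DATA         ; AX = segment of the data
MOV DS, AX            ; DS is loaded through AX
MOV DX, Offset Text   ; DX = address of the string in the data segment
MOV AH,9              ; function 9: display string
INT 21H
MOV AX,4C00H          ; function 4CH: end program
INT 21H
.DATA
Text DB 'Message on screen.$'
.STACK
END Program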

The first steps are the same as in the previous program: the memory
model is defined, and it is indicated where the program code begins and where the
instructions begin.

Next, @DATA is loaded into the AX register and then passed to the DS register,
since a constant cannot be copied directly to a segment register. The content of
@DATA is the number of the segment which will be used for the data. Then the value given by "Offset Text" is stored in the
DX register, which gives us the address where the string of characters is found in the data segment.


Then it uses function 9 of interrupt 21H, selected by the value in AH, to display the string located at the address
contained in DX. Lastly it uses function 4CH of interrupt 21H to end the execution of the program. Even though we loaded
the value 4C00H into the AX register, interrupt 21H only takes the content of the AH register as the function number.

The .DATA directive indicates to the assembler that what is written next must be stored in the memory segment reserved for
data. The DB directive is used to Define Bytes, that is, to assign a value to a certain identifier, in this
case "Text". The value can be a constant or a string of characters; a string
must be enclosed in single quotation marks ' and, to be displayed with function 9, must end with the "$" symbol.
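
The following sketch shows a few common forms of the DB directive side by side. The identifiers Limit, Table, Buffer and Prompt are hypothetical names chosen for this example, and only Prompt is actually used by the code.

```asm
; Illustrative sketch; all identifiers are hypothetical names.
.MODEL SMALL
.STACK
.DATA
Limit   DB 10                       ; one byte holding the constant 10
Table   DB 1, 2, 3, 4               ; four consecutive bytes
Buffer  DB 16 DUP (0)               ; sixteen bytes, all set to zero
Prompt  DB 'Ready.', 13, 10, '$'    ; text, carriage return, line feed, terminator
.CODE
Start:  MOV AX, @DATA
        MOV DS, AX
        MOV DX, OFFSET Prompt       ; DS:DX = address of the '$'-terminated string
        MOV AH, 9                   ; function 9: display string
        INT 21H
        MOV AX, 4C00H
        INT 21H
END Start
```

Here the data segment is written before the code, so every identifier is already known when the instructions that use it are assembled.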

SEGMENTS
The architecture of the x86 processors forces us to use memory segments to manage the information. The size of these segments
is 64 KB.

The reason for using these segments is that the largest number the processor can handle is given
by a 16-bit word or register, so it would not be possible to access more than 65536 locations of
memory using only one of these registers. If, however, the PC's memory is divided into groups or segments of 65536
locations each, one register is used exclusively to locate each segment, and the address of a specific
location is formed from two registers, then much more memory can be reached. In real mode the processor multiplies the
segment by 16 and adds the offset, which gives a 20-bit address and access to 1,048,576 bytes (1 MB) of memory.


In order for the assembler to be able to manage the data, it is necessary that
each piece of information or instruction be found in the area that corresponds
to its respective segment. This information is accessed by taking into
account the location of the segment, given by the DS, ES, SS and CS
registers, and, inside the segment, the address of the specified piece of
information. This is why, when we create a program using Debug, on each line that we assemble something like
this appears:

1CB0:0102 MOV AX,BX

Where the first number, 1CB0, corresponds to the memory segment being used, the second one refers to the address inside this
segment, and the instruction stored at that address follows.
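
In real mode, the mode in which DOS programs run, the processor combines these two numbers by multiplying the segment by 16 and adding the address inside the segment. The sketch below works this out for the sample address 1CB0:0102, which is hard-coded purely for illustration. It assumes an 8086-compatible processor, where shifts of more than one position take their count from CL.

```asm
; Illustrative sketch: physical address = segment * 16 + offset.
.MODEL SMALL
.STACK
.CODE
Start:  MOV AX, 1CB0H       ; segment part of 1CB0:0102
        MOV BX, 0102H       ; address inside the segment
        MOV DX, AX
        MOV CL, 4
        SHL AX, CL          ; AX = low 16 bits of segment * 16 (CB00H)
        MOV CL, 12
        SHR DX, CL          ; DX = the bits shifted out of AX (0001H)
        ADD AX, BX          ; add the offset (AX = CC02H)
        ADC DX, 0           ; pass any carry to the high word
                            ; DX = 0001H, AX = CC02H: physical address 1CC02H
        MOV AX, 4C00H
        INT 21H
END Start
```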

The way to tell the assembler which of the segments we will work
with is through the .CODE, .DATA and .STACK directives.

The assembler adjusts the size of the segments based on the number of
bytes each assembled instruction needs, since it would be a waste of memory to use whole segments. For example, if a program
only needs 10 KB to store data, the data segment will only be 10 KB and not the 64 KB it could hold.

TABLE OF SYMBOLS
Each one of the parts of a code line in assembler is known as a token. For example,
in the code line:

MOV AX,Var

we have three tokens: the MOV instruction, the AX operand, and the Var
operand. What the assembler does to generate the OBJ code is to read each one
of the tokens and look it up in an internal "equivalence" table known as the
reserved words table, which is where the meanings of all the mnemonics we use as
instructions are found.

Following this process, the assembler reads MOV, looks it up in its table and
identifies it as a processor instruction. Likewise it reads AX and recognizes it
as a register of the processor, but when it looks for the Var token in the
reserved words table, it does not find it, so it then looks for it in the
symbol table, which holds the names of the variables, constants and labels used in the program, together with their addresses
in memory and the type of data they contain.

Sometimes the assembler comes upon a token which has not yet been defined in the program, so it makes a second pass through
the source program to resolve all references to that symbol once it has been placed in the symbol table. There are symbols which the
assembler will not find, since they do not belong to that segment and the assembler does not know in what part of memory it
will find that segment. This is where the linker comes into action: it creates the structure the loader needs
so that the segment and the token are defined when the program is loaded and before it is executed.
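
A small sketch of the first situation, a token that is used before it is defined, could look like this. Notice and Finish are hypothetical names; both are referenced before the assembler has reached their definitions, which is why a second pass over the source is needed.

```asm
; Illustrative sketch; Notice and Finish are hypothetical names.
.MODEL SMALL
.STACK
.CODE
Start:  MOV AX, @DATA
        MOV DS, AX
        MOV DX, OFFSET Notice   ; Notice is only defined later, in .DATA
        MOV AH, 9
        INT 21H
        JMP Finish              ; Finish has not been seen yet either
        NOP                     ; never executed, skipped by the jump
Finish: MOV AX, 4C00H
        INT 21H
.DATA
Notice  DB 'Forward references resolved.$'
END Start
```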

DATA MOVEMENT
In any program it is necessary to move data between memory and the CPU
registers. There are several ways to do this: data can be copied from memory to
a register, from register to register, from a register to the stack, from the
stack to a register, and transmitted to external devices and back.

This movement of data is subject to rules and restrictions. The following are
some of them:

*It is not possible to move data from one memory location to another directly; it
is necessary to first move the data from the source location to a register and then
from the register to the destination location.

*It is not possible to move a constant directly to a segment register; it must first
be moved to a general-purpose register in the CPU.
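
Both restrictions can be seen in the following minimal sketch, where Source and Dest are hypothetical variables defined only for the example:

```asm
; Illustrative sketch; Source and Dest are hypothetical names.
.MODEL SMALL
.STACK
.DATA
Source  DW 1234H
Dest    DW 0
.CODE
Start:  MOV AX, @DATA       ; the constant goes into a general register first
        MOV DS, AX          ; and only then into the segment register
        MOV AX, Source      ; memory to register
        MOV Dest, AX        ; register to memory (Dest now holds 1234H)
        MOV AX, 4C00H
        INT 21H
END Start
```

Writing MOV Dest, Source or MOV DS, @DATA instead would be rejected by the assembler.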

It is possible to move blocks of data by means of the movs instructions, which
copy a string of bytes or words: movsb copies a byte from one location to
another, and movsw copies a word. Preceded by the rep prefix they copy n bytes or n words, where n is the value in CX. Both
instructions take the address given by DS:SI as the
data to move and ES:DI as the new location of the data.
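
A minimal sketch of a block copy, assuming two hypothetical eight-byte areas named Source and Dest that live in the same data segment:

```asm
; Illustrative sketch; Source and Dest are hypothetical names.
.MODEL SMALL
.STACK
.DATA
Source  DB 'ABCDEFGH'
Dest    DB 8 DUP (?)
.CODE
Start:  MOV AX, @DATA
        MOV DS, AX              ; DS:SI will address the source
        MOV ES, AX              ; ES:DI will address the destination
        MOV SI, OFFSET Source
        MOV DI, OFFSET Dest
        MOV CX, 8               ; number of bytes to copy
        CLD                     ; DF = 0, so SI and DI move upwards
        REP MOVSB               ; copy CX bytes from DS:SI to ES:DI
        MOV AX, 4C00H
        INT 21H
END Start
```

Using MOVSW with CX set to 4 would copy the same eight bytes one word at a time.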

To move data there are also structures called stacks, where the data is
introduced with the push instruction and extracted with the pop instruction.

In a stack the first data to be introduced is the last one we can take out; that is,
if in our program we use these instructions:

PUSH AX
PUSH BX
PUSH CX

To return the correct values to each register at the moment of taking them from the stack it is necessary to do it in the
following order:

POP CX
POP BX
POP AX

For communication with external devices, the out instruction is used to send
information to a port and the in instruction to read the information received from a port.

The syntax of the out command is:

OUT DX,AX

Where DX contains the value of the port which will be used for the communication and AX contains the information which will be
sent.

The syntax of the in command is:

IN AX,DX

Where AX is the register where the incoming information will be kept and DX
contains the address of the port by which the information will arrive.
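
As a rough sketch of both instructions, the fragment below sends one byte and reads one byte using the byte-sized forms, which use AL instead of AX. The port numbers 3F8H and 3FDH are assumptions for the example: they are the addresses conventionally used by the first serial port (COM1) on a PC, and the right values depend on the device being addressed.

```asm
; Illustrative sketch; the port numbers are assumptions (conventional COM1).
.MODEL SMALL
.STACK
.CODE
Start:  MOV DX, 3F8H        ; assumed port: COM1 data register
        MOV AL, 'A'
        OUT DX, AL          ; send one byte to the port held in DX
        MOV DX, 3FDH        ; assumed port: COM1 line status register
        IN  AL, DX          ; read one byte from the port held in DX
        MOV AX, 4C00H
        INT 21H
END Start
```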

LOGIC AND ARITHMETIC OPERATIONS
The instructions for the logic operations are: and, not, or and xor. These work
on the bits of their operands.

To check the result of operations we turn to the cmp and test instructions.

The instructions used for the arithmetic operations are: add to add, sub to subtract,
mul to multiply and div to divide.
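
The sketch below runs through these instructions with small, arbitrarily chosen values, and the comments show the content of the registers after each step:

```asm
; Illustrative sketch; the values are arbitrary.
.MODEL SMALL
.STACK
.CODE
Start:  MOV AL, 0CCH        ; AL = 11001100b
        AND AL, 0F0H        ; AL = 11000000b (C0H), low four bits cleared
        OR  AL, 03H         ; AL = 11000011b (C3H), two low bits set
        XOR AL, 0FFH        ; AL = 00111100b (3CH), every bit inverted
        NOT AL              ; AL = 11000011b (C3H), inverted again

        MOV AX, 20
        ADD AX, 5           ; AX = 25
        SUB AX, 4           ; AX = 21
        MOV BL, 3
        MUL BL              ; AX = AL * BL = 63
        MOV BL, 10
        DIV BL              ; AX / BL: AL = 6 (quotient), AH = 3 (remainder)
        MOV AX, 4C00H
        INT 21H
END Start
```

With a byte operand, mul and div work implicitly on AL and AX, which is why only BL is named in those two instructions.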

Almost all the comparison instructions are based on the information contained in the flag register.

Normally the flags of this register which can be directly handled by the programmer are the direction flag DF, which
sets the direction of string operations, and the IF flag, which is handled by means of the sti and cli
instructions to enable and disable interrupts.
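
A short sketch of how the flags are set and used, with an arbitrary value in AL and a hypothetical label Differ. The conditional jumps JNE and JZ are used here only to show that the flags left by cmp and test decide where execution continues.

```asm
; Illustrative sketch; Differ is a hypothetical label.
.MODEL SMALL
.STACK
.CODE
Start:  MOV AL, 5
        CMP AL, 5           ; computes AL - 5 and sets the flags; AL is unchanged
        JNE Differ          ; not taken: the zero flag is set
        TEST AL, 01H        ; computes AL AND 1; only the flags change
        JZ  Differ          ; not taken: bit 0 of AL is 1
        CLI                 ; IF = 0, interrupts disabled
        STI                 ; IF = 1, interrupts enabled again
        CLD                 ; DF = 0, string operations move upwards
Differ: MOV AX, 4C00H
        INT 21H
END Start
```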

JUMPS, LOOPS AND PROCEDURES
Unconditional jumps in a program written in assembler language are given by the jmp instruction; a jump alters the flow
of execution of a
program by sending control to the indicated address.

A loop, also known as an iteration, is the repetition of a process a certain number
of times, until a condition is fulfilled. These loops are used.
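
One possible sketch of a loop and an unconditional jump uses the loop instruction, which counts with the CX register. The labels Again and Finish are hypothetical, and the program merely adds the numbers 5 down to 1.

```asm
; Illustrative sketch; Again and Finish are hypothetical labels.
.MODEL SMALL
.STACK
.CODE
Start:  MOV AX, 0           ; running total
        MOV CX, 5           ; LOOP uses CX as its counter
Again:  ADD AX, CX          ; adds 5, 4, 3, 2, 1 in turn
        LOOP Again          ; CX = CX - 1, jump to Again while CX is not 0
        JMP Finish          ; unconditional jump over the next instruction
        MOV AX, 0           ; never executed
Finish: MOV AX, 4C00H       ; AX held 15 just before this line
        INT 21H
END Start
```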
