Lesson 2. Basic Concepts.

Information in the computers.

Information units
Numeric systems
Converting binary numbers to decimal
Converting decimal numbers to binary
Hexadecimal system

Data representation Methods in a computer.

ASCII code
BCD method
Floating point representation

Working with the assembly language.

Program creation process
CPU registers
Assembler structure
Our first program
Storing and loading the programs

Information Units
For the PC to process information, that information must be held in special
cells called registers.

The registers are groups of 8 or 16 flip-flops.

A flip-flop is a device capable of storing two levels of voltage: a low one,
typically 0.5 volts, and a high one, typically 5 volts. The low level in the
flip-flop is interpreted as off or 0, and the high level as on or 1. Each of
these two-state values is known as a bit, which is the smallest unit of
information in a computer.

A group of 16 bits is known as a word; a word can be divided into groups of 8
bits called bytes, and groups of 4 bits are called nibbles.

Numeric systems
The numeric system we use daily is the decimal system, but it is not
convenient for machines, since they handle information encoded as bits that are
on or off. This encoding makes it necessary to understand positional notation,
which allows us to express a number in whatever base we need.

Any number can be represented in any base through the following formula:

N = Dn*B^n + ... + D2*B^2 + D1*B^1 + D0*B^0

where n is the position of the digit, counted from right to left and numbered
from zero, Dn is the digit at position n, and B is the numeric base in use.
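
The positional rule can be tried out with a short Python sketch. The function
name positional_value and the convention of passing the digits as a list, most
significant digit first, are assumptions made for this illustration only.

```python
def positional_value(digits, base):
    """Return the value of `digits` (most significant first) in `base`."""
    total = 0
    # Position n is counted from the right, starting at zero.
    for n, digit in enumerate(reversed(digits)):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total += digit * base ** n
    return total


print(positional_value([4, 3], 10))              # decimal 43
print(positional_value([1, 0, 1, 0, 1, 1], 2))   # binary 101011
print(positional_value([2, 11], 16))             # hexadecimal 2B
```

All three calls print 43, since they describe the same quantity written in
three different bases.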

Converting binary numbers to decimal
When working with assembly language we often need to convert numbers from the binary system, which is used by
computers, to the decimal system used by people.

The binary system is based on only two conditions or states, on (1) or
off (0); thus its base is two.

For the conversion we can use the positional value formula with base 2: each digit is multiplied by 2 raised to its
position, and the products are added.

For example, given the binary number 10011, we take each digit from right to
left and multiply it by the base raised to the power of the position it
occupies:

Digits, right to left: 1 1 0 0 1

Decimal: 1*2^0 + 1*2^1 + 0*2^2 + 0*2^3 + 1*2^4

= 1 + 2 + 0 + 0 + 16 = 19 decimal.

The ^ character is used in computing as an exponent symbol, and the * character represents multiplication.

Converting decimal numbers to binary
There are several methods for converting decimal numbers to binary, but only one will be analyzed here. Naturally, a conversion
with a scientific calculator is much easier, but one cannot always count on having one, so it is convenient to know at least one
method for doing it by hand.

The method explained here uses successive division by two, keeping
the remainder as a binary digit and the quotient as the next number to divide.

Let us take as an example the decimal number 43:

43/2=21 and its remainder is 1

21/2=10 and its remainder is 1

10/2=5 and its remainder is 0

5/2=2 and its remainder is 1

2/2=1 and its remainder is 0

1/2=0 and its remainder is 1

Reading the remainders from the bottom up, we find that the binary result is 101011.
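
The successive division method can be sketched in Python as follows. The
function name decimal_to_binary is an assumption chosen for this illustration.

```python
def decimal_to_binary(number):
    """Convert a non-negative integer to a binary string by repeated division."""
    if number < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        # The quotient becomes the next number to divide;
        # the remainder is the next binary digit.
        number, remainder = divmod(number, 2)
        remainders.append(str(remainder))
    # The first remainder is the rightmost digit, so read them in reverse.
    return "".join(reversed(remainders))


print(decimal_to_binary(43))  # 101011
```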

Hexadecimal system
The hexadecimal base has 16 digits: 0 to 9, followed by the letters A to F,
which represent the numbers 10 to 15. Thus we
count 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E, and F.

Conversion between binary and hexadecimal numbers is easy. To convert a binary number to hexadecimal, first
divide it into groups of 4 bits, working from right to left. If the last group,
the leftmost one, has fewer than 4 bits, the missing places are filled with
zeros.

Taking as an example the binary number 101011, we divide it into groups of 4 bits and are left with:

10;1011

Filling the last group (the leftmost one) with zeros:

0010;1011

Afterwards we take each group as an independent number and consider its
decimal value:

0010=2;1011=11

But since we cannot write this hexadecimal number as 211, we have to replace every value greater than 9 with its
hexadecimal digit, which gives:

2BH, where the H indicates the hexadecimal base.

To convert a hexadecimal number to binary it is only necessary to reverse the steps: each hexadecimal digit, starting with
the first, is converted to its 4-bit binary group.
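
Both directions of the conversion can be sketched in Python. The names
binary_to_hex and hex_to_binary are assumptions made for this illustration, as
is the choice to keep the trailing H marker in the text form.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    """Group a binary string into 4-bit nibbles and map each to a hex digit."""
    padding = (-len(bits)) % 4          # zeros needed to fill the leftmost group
    bits = "0" * padding + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups) + "H"


def hex_to_binary(hex_text):
    """Reverse the steps: expand every hex digit into its 4-bit group."""
    digits = hex_text.rstrip("Hh")
    return "".join(format(HEX_DIGITS.index(d.upper()), "04b") for d in digits)


print(binary_to_hex("101011"))  # 2BH
print(hex_to_binary("2BH"))     # 00101011
```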

ASCII code
ASCII is an acronym for American Standard Code for Information Interchange.

This code assigns a 7-bit binary number to each letter of the alphabet, each
decimal digit from 0 to 9, and some additional symbols, leaving the 8th bit in
its off state, or 0.

This way each letter, digit or special character occupies one byte in the
computer memory.

This method of data representation is very inefficient for numbers: in binary format one
byte is enough to represent the
numbers from 0 to 255, whereas in ASCII one byte can represent only a single digit.

Due to this inefficiency, the ASCII code is mainly used in memory to
represent text.
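
The difference in storage cost can be seen in a small Python sketch. The helper
name ascii_bits is an assumption made for this illustration.

```python
def ascii_bits(character):
    """Return the 8-bit pattern stored for one 7-bit ASCII character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "08b")   # the 8th (leftmost) bit is always 0


print(ascii_bits("A"))   # 01000001
print(ascii_bits("7"))   # 00110111

as_text = "255".encode("ascii")         # one byte per digit
as_binary = (255).to_bytes(1, "big")    # the same number in a single byte
print(len(as_text), len(as_binary))     # 3 1
```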

BCD Method
BCD is an acronym for Binary Coded Decimal.

In this notation groups of 4 bits are used to represent each decimal digit from
0 to 9. With this method we can represent two digits per byte of information.

Even though this method is much more practical than the ASCII code for representing numbers in memory, it is still less
practical than binary, since with the BCD method one byte can only represent the numbers from 0 to 99, whereas in binary format
one byte can represent all the numbers from 0 to 255.

This format is mainly used to represent very large numbers in mercantile
applications, since it makes operations easier and helps avoid mistakes.
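
A minimal Python sketch of storing two decimal digits in one byte follows. The
function names to_packed_bcd and from_packed_bcd are assumptions made for this
illustration.

```python
def to_packed_bcd(number):
    """Pack a number from 0 to 99 into one byte, one decimal digit per nibble."""
    if not 0 <= number <= 99:
        raise ValueError("one BCD byte holds only 0 to 99")
    tens, ones = divmod(number, 10)
    return (tens << 4) | ones


def from_packed_bcd(byte):
    """Recover the decimal number from a packed BCD byte."""
    return (byte >> 4) * 10 + (byte & 0x0F)


print(format(to_packed_bcd(43), "08b"))  # 01000011 (the digits 4 and 3)
print(from_packed_bcd(0x99))             # 99
```

Note that the BCD byte for 43 reads as 43 in hexadecimal, which is different
from the binary value of 43 (2BH).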

Floating point representation
This representation is based on scientific notation, that is, representing a
number in two parts: its mantissa and its exponent.

As an example, the number 1234000 can be represented as 1.234*10^6. In this notation the exponent tells us the
number of places the decimal point must be moved to the right to obtain the original number.

If the exponent were negative, it would tell us the number of
places the decimal point must be moved to the left to obtain the original
number.
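
The effect of the exponent can be checked with Python's decimal module. This is
only a sketch of decimal scientific notation as described above, not of the
format a processor uses internally; the function name from_scientific is an
assumption made for this illustration.

```python
from decimal import Decimal


def from_scientific(mantissa, exponent):
    """Return mantissa * 10**exponent by shifting the decimal point."""
    return Decimal(mantissa).scaleb(exponent)


# A positive exponent moves the point to the right.
print(from_scientific("1.234", 6) == Decimal("1234000"))    # True
# A negative exponent moves the point to the left.
print(from_scientific("1.234", -3) == Decimal("0.001234"))  # True
```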

Program creation process
Creating a program requires five steps: design of the
algorithm, coding of the algorithm, translation to machine language, testing, and
debugging of the program.

In the design stage the problem to be solved is established and the best
solution is proposed, with schematic diagrams drawn up to describe the
proposed solution.

The coding stage consists of writing the program in some
programming language, assembly language in this case, based on
the solution proposed in the prior step.

The translation to machine language is the creation of the object program,
in other words, the written program as a sequence of zeros and ones that
the processor can interpret.

The last stage is the elimination of the faults detected in the program during the
testing stage. Correcting a fault normally requires repeating all
the steps starting from the first or second.

To create a program in assembly language two options exist: the first is to use an assembler such as
MASM, the Macro Assembler by Microsoft, and the second is to use the debugger. In this first section we will use the latter,
since it is found on all PCs with MS-DOS.

Debug can only create files with a .COM extension, and because of the
characteristics of this kind of program they cannot be larger than 64 KB. They must also start at displacement (offset)
0100H within their segment.

CPU Registers
The CPU has 14 internal registers, each 16 bits wide. The first four, AX, BX,
CX, and DX, are general-purpose registers and can also be used as 8-bit registers.
When used that way they are referred to by their halves, for example AH and AL,
which are the high and low bytes of the AX register. This nomenclature also
applies to the BX, CX, and DX registers.
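
The relation between a 16-bit register and its two 8-bit halves can be sketched
in Python with shifts and masks. The function names split_register and
join_register are assumptions made for this illustration.

```python
def split_register(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit value."""
    ax &= 0xFFFF
    return ax >> 8, ax & 0xFF


def join_register(ah, al):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_register(0x1234)
print(f"AH={ah:02X} AL={al:02X}")          # AH=12 AL=34
print(f"AX={join_register(ah, al):04X}")   # AX=1234
```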

The registers and their specific names:

AX Accumulator
BX Base register
CX Count register
DX Data register
DS Data segment register
ES Extra segment register
SS Stack segment register
CS Code segment register
BP Base pointer register
SI Source index register
DI Destination index register
SP Stack pointer register
IP Next instruction pointer register
F Flag register

It is possible to view the values of the internal registers of the CPU using
the Debug program. To begin working with Debug, type the following at the prompt:

C:\>Debug [Enter]

On the next line a dash will appear; this is the Debug prompt, and
Debug commands can now be entered. Type the following command:

-r[Enter]

The contents of all the internal registers of the CPU are displayed. An
alternative is to give the "r" command, as a parameter, the name of the register whose value you want to see.
For example:

-rbx

This command displays only the content of the BX register, and the Debug
prompt changes from "-" to ":".

When the prompt is like this, the value of the displayed register can be changed
by typing the new value and pressing [Enter], or the old value can be kept by pressing [Enter] without typing
anything.

It is possible to change the value of the flag register and to use it as a control
structure in our programs, as we will see later. Each bit of the register has a
special name and meaning. The following list gives the code Debug displays for each bit,
off or on, and its relation to the operations of the processor:

Overflow

NV = there is no overflow
OV = there is an overflow

Direction

UP = forward
DN = backward

Interrupts

DI = deactivated
EI = activated

Sign

PL = positive
NG = negative

Zero

NZ = it is not zero
ZR = it is zero

Auxiliary Carry

NA = there is no auxiliary carry
AC = there is an auxiliary carry

Parity

PO = odd parity
PE = even parity

Carry

NC = there is no carry
CY = there is a carry
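
As a rough illustration of how an addition sets these bits, the following
Python sketch adds two 16-bit values and reports the arithmetic flags with the
codes listed above. The function name add16_flags is an assumption made for
this illustration; the direction and interrupt flags are left out because an
addition does not change them, and parity is computed on the low byte of the
result.

```python
def add16_flags(a, b):
    """Add two 16-bit values; return (result, flag codes in Debug's notation)."""
    a &= 0xFFFF
    b &= 0xFFFF
    full = a + b
    result = full & 0xFFFF
    # Signed overflow: both operands differ in sign from the result.
    overflow = ((a ^ result) & (b ^ result) & 0x8000) != 0
    sign = (result & 0x8000) != 0
    zero = result == 0
    # Auxiliary carry: a carry out of the low 4 bits.
    aux_carry = ((a ^ b ^ result) & 0x10) != 0
    parity_even = bin(result & 0xFF).count("1") % 2 == 0
    carry = full > 0xFFFF
    codes = [
        "OV" if overflow else "NV",
        "NG" if sign else "PL",
        "ZR" if zero else "NZ",
        "AC" if aux_carry else "NA",
        "PE" if parity_even else "PO",
        "CY" if carry else "NC",
    ]
    return result, " ".join(codes)


print(add16_flags(0xFFFF, 0x0001))  # (0, 'NV PL ZR AC PE CY')
print(add16_flags(0x7FFF, 0x0001))  # (32768, 'OV NG NZ AC PE NC')
```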

 

Assembler structure
In assembly language, code lines have two parts: the first is the name of the instruction to be executed, and the
second is the parameters of the command. For example:

add ah,bh

Here "add" is the command to be executed, in this case an addition, and "ah" and
"bh" are the parameters.

The names of the instructions in this language are made up of two, three, or four
letters. These instructions are also called mnemonics or operation codes,
since they represent a function the processor will perform.

There are some commands which do not require parameters for their operation, as well as others that require just one parameter.

Sometimes instructions are used as follows:

add al,[170]

The brackets around the second parameter indicate that we are going to work
with the content of memory cell number 170 and not with the value 170; this is known as direct addressing.
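
The difference between using a value and using the content of a memory cell can
be sketched in Python. Everything here is hypothetical: the small memory array,
the content placed in cell 170, and the function names. The address is written
as 0x170 because Debug reads the numbers it is given as hexadecimal.

```python
memory = bytearray(0x200)   # hypothetical small memory, all zeros
memory[0x170] = 0x05        # hypothetical content of cell 170


def add_immediate(al, value):
    """Like add al,value: the operand is the number itself."""
    return (al + value) & 0xFF


def add_direct(al, address):
    """Like add al,[address]: the operand is the byte stored at that address."""
    return (al + memory[address]) & 0xFF


print(add_immediate(0x10, 0x17))  # 39 (27 in hexadecimal)
print(add_direct(0x10, 0x170))    # 21 (15 in hexadecimal)
```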

Our first program
We are going to create a program that illustrates what we have been
seeing. The program will add two values that we introduce directly
into the code.

The first step is to start Debug, which only requires typing
debug[Enter] at the command prompt.

To assemble a program in Debug, the "a" (assemble) command is used. The address where
assembling should begin can be given as a parameter; if the parameter is omitted, assembling starts at the location
specified by CS:IP, usually 0100h, which is
where programs with a .COM extension must start. This is
the location we will use, since Debug can only create this type of
program.

Even though it is not necessary at this point to give the "a" command a
parameter, it is advisable to do so to avoid problems once the CS:IP
registers have been used. Therefore we type:

-a0100[Enter]

When this is done, something like 0C1B:0100 will appear on the screen, with the cursor positioned to the right of
these numbers. Note that the first four hexadecimal digits may be different, but the last four must be 0100, since that is
the address we gave as the starting point. Now we can enter the instructions:

0C1B:0100 mov ax,0002 ; puts the value 0002 in the ax register
0C1B:0103 mov bx,0004 ; puts the value 0004 in the bx register
0C1B:0106 add ax,bx ; adds the content of bx to the content of ax
0C1B:0108 INT 20 ; terminates the program
0C1B:010A

It is not necessary to type the comments that follow the ";", but they are very helpful. Once the last
command, INT 20, has been typed, press [Enter] on the empty line
to return to the Debug prompt.

The last line written is not an ordinary instruction like the others; it is a
call to an operating system interrupt (interrupts will be dealt
with in more depth in a later chapter). For the moment it is only necessary to know that they save us a great many lines and are
very useful for accessing operating
system functions.
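
To see what the four instructions do to the registers, the program can be
mimicked with a very small Python sketch. This is not how Debug or the
processor works internally; the run function, the tuple format for the
instructions, and the choice to start every register at zero are all
assumptions made for this illustration.

```python
def run(program):
    """Execute a tiny subset (mov, add, int 20) and return the registers."""
    registers = {"ax": 0, "bx": 0, "cx": 0, "dx": 0}
    for instruction in program:
        name = instruction[0]
        if name == "mov":
            _, target, value = instruction
            registers[target] = value & 0xFFFF
        elif name == "add":
            _, target, source = instruction
            registers[target] = (registers[target] + registers[source]) & 0xFFFF
        elif name == "int" and instruction[1] == 0x20:
            break                      # INT 20 ends the program
        else:
            raise ValueError(f"unsupported instruction: {instruction}")
    return registers


program = [
    ("mov", "ax", 0x0002),
    ("mov", "bx", 0x0004),
    ("add", "ax", "bx"),
    ("int", 0x20),
]
result = run(program)
print(f"AX={result['ax']:04X} BX={result['bx']:04X}")  # AX=0006 BX=0004
```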

To execute the program we wrote, the "g" command is used. When it runs we will see the message "Program terminated
normally". Naturally, with a message like this one we cannot be sure the program has done the addition, but there is a simple
way to verify it. The "r" command of Debug shows the contents of all the registers of the processor; simply type:

-r[Enter]

Each register will appear on the screen with its current value:

AX=0006 BX=0004 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=0C1B ES=0C1B SS=0C1B CS=0C1B IP=010A NV UP EI PL NZ NA PO NC
0C1B:010A 0F DB 0F

The registers may contain different values, but AX and
BX must match the ones shown, since they are the ones we just modified.

Another way to see the values while the program is executing is to give "g", as a parameter, the
address where we want execution to stop and the register values to be shown; in this case it would
be: g108. This command
executes the program, stops at address 108, and shows the contents of the registers.

What is happening in the registers can also be followed step by step by using the "t"
(trace) command. This command executes what was assembled one line at a time,
showing the contents of the registers each time.

To exit Debug use the "q" (quit) command.

Storing and loading programs
It would not seem practical to type an entire program each time it is needed.
To avoid this it is possible to store a program on disk, with the
enormous advantage that by being already assembled it will not be necessary to run Debug again to execute it.

The steps to save a program that is already stored in memory are:

1. Obtain the length of the program by subtracting the initial address from the
final address, naturally in hexadecimal. You can use Debug's H
command to do the math: -H 10A 100 gives both the
sum and the difference of these numbers.
2. Give the program a name and extension with Debug's N (name) command, such as -N program.com.
3. Put the length of the program into the CX register with -R CX, typing the length at the ":" prompt.
4. Order Debug to write the program to disk with the W (write) command:
-W

Using the program from the prior section, we will get a
clearer idea of how to take these steps.

Once the program has been assembled, the session looks like this:

0C1B:0100 mov ax,0002
0C1B:0103 mov bx,0004
0C1B:0106 add ax,bx
0C1B:0108 INT 20
0C1B:010A
-h 10a 100
020a 000a
-n test.com
-rcx
CX 0000
:000a
-w
Writing 000A bytes

To obtain the length of a program the "h" command is used, since it shows the sum and the difference of two
hexadecimal numbers. To obtain the
length of ours, we give it as parameters our program's final
address (10A) and its initial address (100). The first result the
command shows is the sum of the parameters and the second is the
difference.
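
The arithmetic performed by the "h" command can be reproduced in Python. This
is a sketch that assumes 16-bit results with wraparound and a single space
between the two values; the function name debug_h is made up for this
illustration.

```python
def debug_h(first, second):
    """Return the 16-bit sum and difference of two hexadecimal strings."""
    a = int(first, 16)
    b = int(second, 16)
    total = (a + b) & 0xFFFF
    difference = (a - b) & 0xFFFF
    return f"{total:04X} {difference:04X}"


print(debug_h("10a", "100"))  # 020A 000A
```

The second value, 000A, is the program length: ten bytes.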

The "n" command allows us to name the program.

The "rcx" command allows us to change the content of the CX register to the
value we obtained for the size of the file with "h", in this case 000a,
the result of subtracting the initial address from the final address.

Lastly, the "w" command writes our program on the disk, indicating how many
bytes it wrote.

To load an already saved file two steps are necessary:

1. Give the name of the file to be loaded.
2. Load it using the "l" (load) command.

To obtain the correct result from the following steps, the
above program must already have been created.

Inside Debug we write the following:

-n test.com
-l
-u 100 109
0C3D:0100 B80200 MOV AX,0002
0C3D:0103 BB0400 MOV BX,0004
0C3D:0106 01D8 ADD AX,BX
0C3D:0108 CD20 INT 20

The final "u" command is used to verify that the program was loaded into memory: it disassembles the code and
displays it. The parameters tell Debug where to start and where to stop disassembling.
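
The machine-code column of the listing can be examined with a few lines of
Python. The sketch below only joins the bytes shown above and reads back the
two immediate values; the variable names are assumptions made for this
illustration.

```python
# Machine-code bytes copied from the disassembly listing above.
listing = ["B80200", "BB0400", "01D8", "CD20"]
program = b"".join(bytes.fromhex(item) for item in listing)

# The 16-bit immediate values follow the opcode byte, low byte first.
ax_value = int.from_bytes(program[1:3], "little")
bx_value = int.from_bytes(program[4:6], "little")

print(len(program))         # 10, which is 0A in hexadecimal
print(ax_value, bx_value)   # 2 4
```

The total of ten bytes agrees with the length that was placed in CX before
writing the file.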

Debug always loads the programs into memory on the address 100H, unless otherwise
indicated.
