Most computer programs today are written in a high-level language, such as Java, C, C++, or FORTRAN. A programming language is considered high-level if its statements resemble English language statements. For example, all of the languages just mentioned have some form of an if statement, which says, "if some condition holds, then take some action."
Computer scientists have invented hundreds of high-level programming languages, although relatively few of these have been put to practical use. Some of the widely used languages have special features that make them suitable for one type of programming application or another. COBOL (COmmon Business-Oriented Language), for example, is still widely used in commercial applications. FORTRAN (FORmula TRANslator) is still preferred by some engineers and scientists. C and C++ are still the primary languages used by systems programmers.
In addition to having features that make them suitable for certain types of applications, high-level languages use symbols and notation that make them easily readable by humans. For example, arithmetic operations in Java make use of familiar operators such as "+", "-", and "/", so that arithmetic expressions look more or less the way they do in algebra. So, to take the average of two numbers you might use the expression:
(a + b) / 2
The problem is that computers cannot directly understand such expressions. In order for a computer to run a program, the program must first be translated into its machine language, which is the instruction set understood by its CPU or microprocessor. Each type of microprocessor has its own particular machine language. That is why a piece of software you buy typically runs either on a Macintosh, which uses the PowerPC chip, or on a Windows machine, which uses the Pentium chip, but not on both. The fact that a program can run on just one type of chip is known as platform dependence.
In general, machine languages are based on binary code, a two-valued system that is well suited for electronic devices. In a binary representation scheme everything is represented as a sequence of 1's and 0's, which corresponds closely to the computer's electronic "on" and "off" states. For example, the number 13 would be represented as 1101. Similarly, a particular address in the computer's memory might be represented as 01100011, and an instruction in the computer's instruction set might be represented as 001100.
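The conversions mentioned above can be checked with a few lines of Java. This is only an illustrative sketch: the class name `BinaryDemo` and its two helper methods are made up for this example, while `Integer.toBinaryString` and `Integer.parseInt` are standard library methods.

```java
// Illustrative sketch: converting between integers and strings of binary digits.
// BinaryDemo, toBinary, and fromBinary are hypothetical names chosen for this example.
class BinaryDemo {
    // Returns the binary digits of a non-negative integer as a string.
    static String toBinary(int value) {
        return Integer.toBinaryString(value);
    }

    // Interprets a string of 0s and 1s as a base-2 number.
    static int fromBinary(String bits) {
        return Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        System.out.println(toBinary(13));            // prints 1101
        System.out.println(fromBinary("01100011"));  // prints 99
    }
}
```

So the memory address 01100011 in the text is simply location 99 written in binary.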
The instructions that make up a computer's machine language are very simple and basic. In most cases, a single instruction carries out a single machine operation. For example, a typical machine language might include instructions for ADD, SUBTRACT, DIVIDE, and MULTIPLY, but it wouldn't contain an instruction for AVERAGE. Therefore the process of averaging two numbers would have to be broken down into two or more steps. A machine language instruction itself might have something similar to the following format, in which an opcode is followed by several operands, which refer to locations in the computer's primary memory. The following instruction says ADD the number in LOCATION1 to the number in LOCATION2 and store the result in LOCATION3:
Opcode | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
011110 | 110110 | 111100 | 111101 |
(ADD) | (LOCATION 1) | (LOCATION 2) | (LOCATION 3) |
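To make the format concrete, the following sketch splits a 24-bit instruction into its four fields. It assumes, as the table suggests, that each field is exactly six bits wide. The class name `InstructionDecoder` and the constant `FIELD_WIDTH` are hypothetical, and a real machine language would define its own layout.

```java
import java.util.Arrays;

// Illustrative sketch: splitting an instruction word into opcode and operands.
// Assumes four fields of six bits each, as in the table above.
class InstructionDecoder {
    static final int FIELD_WIDTH = 6; // assumed width of every field
    static final int FIELD_COUNT = 4; // opcode plus three operands

    // Returns {opcode, operand1, operand2, operand3} as ordinary integers.
    static int[] decode(String word) {
        if (word.length() != FIELD_WIDTH * FIELD_COUNT) {
            throw new IllegalArgumentException("Expected 24 bits, got " + word.length());
        }
        int[] fields = new int[FIELD_COUNT];
        for (int i = 0; i < FIELD_COUNT; i++) {
            String bits = word.substring(i * FIELD_WIDTH, (i + 1) * FIELD_WIDTH);
            fields[i] = Integer.parseInt(bits, 2); // parse each field as base 2
        }
        return fields;
    }

    public static void main(String[] args) {
        int[] fields = decode("011110110110111100111101");
        System.out.println(Arrays.toString(fields)); // prints [30, 54, 60, 61]
    }
}
```

Read this way, the instruction in the table says: perform operation 30 (ADD, in this made-up encoding) on the values at locations 54 and 60, and store the result at location 61.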
Given the primitive nature of machine language, an expression like the one above, (a + b)/2, would have to be translated into a sequence of several machine language instructions which, in binary code, might look as follows:
011110110110111100111101
000101000100010001001101
001000010001010101111011
In the early days of computing, before high-level languages were developed, computers had to be programmed directly in their machine languages, an extremely tedious and error-prone process. Imagine how difficult it would be to detect an error that consisted of putting a 0 in the above program where a 1 should occur!
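To see how an expression such as (a + b) / 2 breaks down into primitive steps, consider the following sketch of a toy machine. Everything in it is an assumption made for illustration: the opcode numbers, the four-cell memory, and the class name `ToyMachine` do not correspond to any real processor. Instructions are written as small integer arrays rather than binary strings so that they remain readable.

```java
// Illustrative sketch of a toy machine with no AVERAGE instruction.
// Opcode values and memory layout are hypothetical.
class ToyMachine {
    static final int ADD = 0, SUBTRACT = 1, MULTIPLY = 2, DIVIDE = 3;

    // Executes each instruction {opcode, source1, source2, destination} in order.
    static void execute(int[][] program, double[] memory) {
        for (int[] ins : program) {
            int src1 = ins[1], src2 = ins[2], dest = ins[3];
            switch (ins[0]) {
                case ADD:      memory[dest] = memory[src1] + memory[src2]; break;
                case SUBTRACT: memory[dest] = memory[src1] - memory[src2]; break;
                case MULTIPLY: memory[dest] = memory[src1] * memory[src2]; break;
                case DIVIDE:   memory[dest] = memory[src1] / memory[src2]; break;
                default: throw new IllegalArgumentException("Unknown opcode: " + ins[0]);
            }
        }
    }

    // Averages two numbers using only ADD and DIVIDE.
    static double average(double a, double b) {
        // Locations 0 and 1 hold the inputs, 2 holds the constant 2, 3 holds the result.
        double[] memory = {a, b, 2.0, 0.0};
        int[][] program = {
            {ADD, 0, 1, 3},    // memory[3] = a + b
            {DIVIDE, 3, 2, 3}, // memory[3] = memory[3] / 2
        };
        execute(program, memory);
        return memory[3];
    }

    public static void main(String[] args) {
        System.out.println(average(4.0, 10.0)); // prints 7.0
    }
}
```

Even in this readable form, a single wrong location number silently produces the wrong answer, which hints at how hard such errors are to find in raw binary.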
Fortunately we no longer have to worry about machine languages, because special programs can be used to translate a high-level or source code program into machine language or object code. In general, a program that translates source code to object code is known as a translator (Fig ). Thus, with suitable translation software for Java or C++ we can write programs as if the computer could understand Java or C++ directly.
Source code translators come in two varieties. An interpreter translates a single line of source code directly into machine language and executes it (that is, runs it on the computer) before going on to the next line of source code. A compiler translates the entire source code program into executable object code. The object code can then be run directly without further translation.
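The difference between the two varieties can be sketched with a made-up two-statement language ("add N" and "mul N") that updates a running total. This is a simplified model rather than a real translator: here "object code" is just a list of Java function objects, and the names `TranslatorSketch`, `translate`, `interpret`, `compile`, and `run` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Illustrative sketch: interpreting versus compiling a tiny made-up language.
class TranslatorSketch {
    // Translates one source line ("add N" or "mul N") into an executable operation.
    static IntUnaryOperator translate(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed statement: " + line);
        }
        int n = Integer.parseInt(parts[1]);
        switch (parts[0]) {
            case "add": return total -> total + n;
            case "mul": return total -> total * n;
            default: throw new IllegalArgumentException("Unknown statement: " + line);
        }
    }

    // Interpreter style: translate a line, execute it, then move to the next line.
    static int interpret(List<String> source, int start) {
        int total = start;
        for (String line : source) {
            total = translate(line).applyAsInt(total);
        }
        return total;
    }

    // Compiler style: translate the whole program first, executing nothing.
    static List<IntUnaryOperator> compile(List<String> source) {
        List<IntUnaryOperator> objectCode = new ArrayList<>();
        for (String line : source) {
            objectCode.add(translate(line));
        }
        return objectCode;
    }

    // Runs previously compiled object code without any further translation.
    static int run(List<IntUnaryOperator> objectCode, int start) {
        int total = start;
        for (IntUnaryOperator op : objectCode) {
            total = op.applyAsInt(total);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("add 5", "mul 3");
        System.out.println(interpret(source, 1));    // prints 18
        System.out.println(run(compile(source), 1)); // prints 18
    }
}
```

Both routes compute the same result. In this sketch, the compiled form can be run repeatedly without translating again, and a malformed statement anywhere in the program is reported by `compile` before any statement executes, whereas `interpret` would run the earlier lines first.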
There are advantages and disadvantages to both approaches. Interpreted programs generally run less efficiently than compiled programs because each line must be translated while the program is running. Once compiled, an object program is just executed without any need for further translation. It is also much easier to optimize compiled code to make it run more efficiently. But interpreters are generally easier to write and provide somewhat better error messages when things go wrong. Some languages, such as BASIC, LISP, and Perl, are mostly used in interpreted form, although compilers are also available for these languages. Programs written in COBOL, FORTRAN, C, C++, and Pascal are typically compiled. As we will see, Java programs use both compilation and interpretation in their translation process.