Section 1.3 Programming Languages, Compilers, and Assemblers
Computers and microcontrollers are useless without operating instructions, which are written in various programming languages. There is a hierarchy of language types, from machine language to high-level programming languages such as C. This hierarchy is depicted in Figure 1.3.1.
Instructions form the basis of all digital computing. A piece of binary data, called an instruction, contains information about the specific operation to be carried out. There are arithmetic operations such as addition and subtraction, branch operations such as jump to another piece of code, data transfer operations such as loading or copying data from one place to another, bit instructions such as clearing and shifting data, and control instructions such as putting the microcontroller into a sleep mode.
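Families of microcontrollers and microprocessors each have their own specific instructions that they are capable of carrying out. Each instruction is composed of a string of binary digits containing an opcode, which identifies the operation to be carried out, and any operands that the operation acts on.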
The word size (number of bits) of an instruction must be long enough to contain the opcode and all operands. To distinguish among \(n\) opcodes, \(\lceil \log_2(n)\rceil\) bits are required, assuming that expanding opcodes (described in Section 4.2) are not used. (For example, if there are 40 operations, 6 bits of the instruction are necessary to store the opcode.) Depending on the word size of an instruction (which is typically a fixed value set by the manufacturer of the microcontroller), the number of operations and the size of opcodes may be limited.
Machine language consists of each instruction written out in binary. These binary values are stored in the program memory of a microcontroller. Machine language can be very tedious to write and difficult to debug because it's written in binary.
Every instruction is parsed in hardware by an instruction decoder (Section 2.2), which then routes the appropriate control and data signals to the arithmetic and logic unit (ALU, Section 2.3).
Assembly language uses mnemonic codes to refer to each instruction and uses names for operands. Because assembly uses words instead of binary, it is much easier to both write and read. Being familiar with the allowable instructions on a microcontroller means that code can be written very efficiently and compactly; assembly code generally uses less memory and takes less time to execute than code written using higher-level languages.
Example 1.3.2. Machine and assembly language to add two values together.
The ATmega328P has an instruction that can add the contents of two general purpose registers together. (General purpose registers are used to temporarily store data to be used as operands in an instruction.) The machine language used to add the contents of general purpose registers r15 and r20, storing the sum in r20, is 0000110101001111. In assembly language, the same instruction is written add r20, r15.
Assembly instructions can be conceptualized using register-transfer language (RTL). This is used to define the modification of the hardware affected by a particular instruction with respect to each of the operands. For example, an addition instruction could be represented in register-transfer language as Rd ← Rd + Rr, where Rd is used to represent the destination of the result (and source of one of the addends) and Rr is used to represent the source of the other addend. This depiction indicates that the contents of the source and destination will be added together with the result stored into the destination.
Assembly is not a universal language; the actual instructions are dictated by the processor. 8-bit AVR microcontrollers share a set of assembly instructions, but those instructions cannot necessarily be used to program any other microcontroller in assembly. For this reason, assembly code cannot be copied and pasted directly from an AVR microcontroller to an Intel processor, for example. The ATmega328P, being an AVR microcontroller, uses the AVR instruction set. AVR assembly is discussed in more detail in Chapter 15.
High-level programming languages use functions, abstractions, and/or symbolic notation to describe the operation of the microcontroller. This is typically easier to write than assembly code, as it does not require the user to have direct knowledge of the available instruction set. For example, arithmetic operations are executed using common symbols (+ for addition, - for subtraction, and so on). A compiler converts the high-level language code into machine instructions. A good compiler will attempt to optimize the generated code, for example by using as few instructions as possible to execute each operation. Examples of high-level languages are C, C++, Python and Visual Basic.
Using high-level languages means that more time can be spent working on algorithms rather than on the specifics of which machine instruction(s) to use. High-level programs in the same language can more or less be reused from one microcontroller to another, with caveats: not all registers have the same names, some functions may be a bit different, and so on. C concepts that are pertinent to microcontroller programming are discussed in more detail in Chapter 14.
Compilers check that the high-level programming language code is correct, both in syntax and in memory allocation. Errors or warnings are usually displayed when problems are found, and code with errors is not loaded onto the microcontroller. Once the code is correct, the compiler converts it into assembly language, and from there generates a HEX file which contains all of the machine code that needs to go into memory on the microcontroller.