An assembler is a computer program that converts text written in assembly language into machine code, which a processor can execute directly. Assembly language is a programming language very close to the machine (the computer hardware), but easier for humans to read because it uses mnemonics instead of numerical values. The assembly language itself is often also called assembler, although strictly speaking that is incorrect.
To assemble means, almost literally, to put together. The task of the assembler is therefore to put together processor-executable instructions (opcodes with their operands, if any) from human-readable instructions.
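The translation from mnemonics to opcodes can be illustrated with a minimal sketch in Python. It is not how any particular assembler is implemented; it covers only a handful of real x86 encodings (nop, ret, hlt, int with an 8-bit immediate, and mov from an 8-bit immediate to an 8-bit register), and the function names assemble and assemble_line are made up for this illustration.

```python
# Toy assembler for a tiny subset of x86 (illustrative sketch only).
NO_OPERAND = {"nop": 0x90, "ret": 0xC3, "hlt": 0xF4}
# Register numbers used by the "mov r8, imm8" encoding (opcode 0xB0 + number).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def parse_imm8(text):
    """Parse an 8-bit immediate such as 33, 0x21 or 0b100001."""
    value = int(text, 0)
    if not 0 <= value <= 0xFF:
        raise ValueError(f"immediate out of range: {text}")
    return value


def assemble_line(line):
    """Translate one line of source text into machine-code bytes."""
    line = line.split(";", 1)[0].strip().lower()  # drop comment and whitespace
    if not line:
        return b""
    parts = line.split(None, 1)
    mnemonic = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in rest.split(",")] if rest else []

    if mnemonic in NO_OPERAND and not operands:
        return bytes([NO_OPERAND[mnemonic]])
    if mnemonic == "int" and len(operands) == 1:
        return bytes([0xCD, parse_imm8(operands[0])])
    if mnemonic == "mov" and len(operands) == 2 and operands[0] in REG8:
        return bytes([0xB0 + REG8[operands[0]], parse_imm8(operands[1])])
    raise ValueError(f"unsupported instruction: {line!r}")


def assemble(source):
    """Assemble a multi-line program into one byte string."""
    return b"".join(assemble_line(line) for line in source.splitlines())


if __name__ == "__main__":
    program = "mov ah, 0x4C ; select a DOS function\nint 0x21\nret"
    print(assemble(program).hex(" "))
```

A real assembler additionally handles labels, addressing modes, directives and many more instruction forms; the sketch only shows the core idea of a table lookup from mnemonic to opcode.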
Assembly and disassembly
The concepts of assembler and disassembler belong together: a disassembler performs the reverse translation, from machine code back to assembly language. The two are often combined in one tool; for example, DEBUG.EXE (supplied with MS-DOS and present in every Windows version up to Windows 7) contained both. Both are part of every package used to develop embedded systems, and many integrated development environments for high-level programming languages also include a disassembler.
Nowadays an assembler is usually written in a high-level programming language (for example C++). In the past, assemblers were generally written in the assembly language of the target machine. When a program is assembled on a machine with a different type of processor than the one it is intended for, the assembler is called a cross-assembler.
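The reverse direction can be sketched in the same spirit. The following Python fragment is a minimal, hypothetical disassembler that recognizes only a few real x86 opcodes (nop, ret, hlt, int imm8, mov r8, imm8) and prints every unknown byte as a db data directive; the function name disassemble is an assumption of this example, not part of any existing tool.

```python
# Toy disassembler for a tiny subset of x86 (illustrative sketch only).
MNEMONIC_BY_OPCODE = {0x90: "nop", 0xC3: "ret", 0xF4: "hlt"}
REG8_NAMES = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def disassemble(code):
    """Return a list of assembly-language lines for the given bytes."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        has_operand = i + 1 < len(code)
        if opcode in MNEMONIC_BY_OPCODE:
            lines.append(MNEMONIC_BY_OPCODE[opcode])
            i += 1
        elif opcode == 0xCD and has_operand:
            # int imm8: software interrupt with an 8-bit vector number
            lines.append(f"int 0x{code[i + 1]:02x}")
            i += 2
        elif 0xB0 <= opcode <= 0xB7 and has_operand:
            # mov r8, imm8: the register is encoded in the low three bits
            lines.append(f"mov {REG8_NAMES[opcode - 0xB0]}, 0x{code[i + 1]:02x}")
            i += 2
        else:
            # Unknown or truncated instruction: emit the raw byte as data
            lines.append(f"db 0x{opcode:02x}")
            i += 1
    return lines


if __name__ == "__main__":
    for line in disassemble(bytes.fromhex("b44ccd21c3")):
        print(line)
```

Because machine code carries no labels or comments, a disassembler can recover the instructions but not the names the original programmer used.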
Assemblers
Some assemblers for the Intel x86 family are:
MASM, the Microsoft Macro Assembler, tries to make assembly resemble a high-level programming language and therefore has a syntax that is sometimes odd. It was one of the first assemblers for the x86.
TASM, the Borland Turbo Assembler, has its own syntax in addition to MASM syntax. It also appeared early and is therefore often the choice of the old guard.
NASM, the Netwide Assembler, is a fairly new assembler with a very simple syntax in which everything must be stated explicitly and nothing is assumed. This open-source software suits people who want to keep everything under control. Because it gives precise control over the final result, it is well suited to writing the boot loader of an operating system.
YASM, a complete rewrite of NASM, is released under the BSD license. Key features are support for the x86-64 instruction set and the corresponding 64-bit PE/COFF and ELF64 output formats (in addition to 32-bit (PE) COFF, ELF32 and BIN).
FASM, the Flat Assembler, is also an open-source assembler and likewise supports the x86-64 instruction set.
Furthermore, every microprocessor family has its own assemblers. This applies, for example, to Microchip (PIC), Freescale (formerly Motorola, now owned by NXP Semiconductors), Renesas (formerly, among others, Mitsubishi), Atmel and ARM.
The well-known GNU Compiler Collection (GCC) always translates high-level source code (for example C) to assembly first, not directly into object code. This means that an assembler must be available for every processor for which GCC has been implemented.
Executable code
Most assemblers produce an object file. Common extensions are .obj and .o. An object file must be converted into an executable file (for example, an .exe file or an ELF executable under Unix variants) before it can be run. This is done by linking. Linking also allows two or more object files to be merged. Because compilers for high-level programming languages generally also produce object files, it is possible to create a mixed-language project. The operating system (such as Mac OS, Linux, DOS or Windows) prescribes the object format and therefore supplies the linker.
Programs for embedded processors are often generated directly in executable form. This can be a so-called image, which contains byte for byte the same code that the processor reads from memory. It can also be a readable text file that holds the content as hexadecimal codes together with further information, such as the address at which the code will be placed. The two best-known formats are the Motorola S-record format and the Intel HEX format. These formats are also generated from an object file.
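What merging object files involves can be sketched with a deliberately simplified model. In the Python fragment below, the ObjectFile layout, the link function and the 16-bit absolute relocation are all assumptions invented for this illustration; real object formats such as ELF or COFF are far richer and use several relocation types.

```python
# Toy linker: concatenates object files and patches symbol addresses.
# The object-file layout here is hypothetical, chosen only for illustration.
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    code: bytes
    # Symbols defined by this file: name -> offset within its own code
    symbols: dict = field(default_factory=dict)
    # Places to patch: (offset, symbol name); each is a 2-byte little-endian slot
    relocations: list = field(default_factory=list)


def link(objects, base_address=0):
    """Merge object files into one image and resolve all symbol references."""
    image = bytearray()
    symbol_table = {}
    pending = []
    for obj in objects:
        start = len(image)  # where this file's code lands in the image
        image += obj.code
        for name, offset in obj.symbols.items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = base_address + start + offset
        pending += [(start + offset, name) for offset, name in obj.relocations]
    for position, name in pending:
        if name not in symbol_table:
            raise ValueError(f"undefined symbol: {name}")
        image[position:position + 2] = symbol_table[name].to_bytes(2, "little")
    return bytes(image), symbol_table


if __name__ == "__main__":
    # 0xAA and 0xBB are placeholder bytes, not real instructions.
    main = ObjectFile(b"\xAA\x00\x00", {"main": 0}, [(1, "helper")])
    helper = ObjectFile(b"\xBB", {"helper": 0})
    image, table = link([main, helper], base_address=0x100)
    print(image.hex(" "), table)
```

The sketch shows why object files from an assembler and from a compiler can be mixed: the linker only needs the code bytes, the defined symbols and the list of places to patch, regardless of which tool produced them.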
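The two text formats can be illustrated with a short Python sketch that encodes a byte string as Intel HEX data records and as a Motorola S1 record. It is a minimal example, assuming 16-bit addresses only (extended-address records, S2/S3 records and header records are omitted); the function names are made up for this illustration.

```python
# Minimal encoders for Intel HEX and Motorola S-record lines (16-bit addresses).

def intel_hex_record(address, data, record_type=0x00):
    """Build one Intel HEX line: ':' count, address, type, data, checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit in 16 bits")
    body = bytes([len(data), address >> 8, address & 0xFF, record_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(image, base_address=0, row_length=16):
    """Encode a whole image as data records followed by the end-of-file record."""
    lines = [
        intel_hex_record(base_address + offset, image[offset:offset + row_length])
        for offset in range(0, len(image), row_length)
    ]
    lines.append(intel_hex_record(0, b"", record_type=0x01))  # end of file
    return "\n".join(lines)


def s1_record(address, data):
    """Build one Motorola S1 line: 'S1' count, 16-bit address, data, checksum."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit in 16 bits")
    count = len(data) + 3  # two address bytes plus one checksum byte
    body = bytes([count, address >> 8, address & 0xFF]) + bytes(data)
    checksum = (~sum(body)) & 0xFF  # ones' complement of the byte sum
    return "S1" + (body + bytes([checksum])).hex().upper()


if __name__ == "__main__":
    code = bytes.fromhex("b44ccd21c3")
    print(to_intel_hex(code, base_address=0x0100))
    print(s1_record(0x0100, code))
```

Both formats store the load address next to the data and protect each line with a checksum, which is what makes them suitable for transfer to a device programmer.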