IA-32

IA-32, sometimes generically called x86 or x86-32, is the instruction set of a family of microprocessors found in the vast majority of the world's personal computers. Compiler and build-tool directives often refer to it as i386; such a directive tells the compiler to generate code only for the IA-32 instruction set. IA-32 refers mainly to the 32-bit specifications of the full x86 architecture.

The term stands for Intel Architecture, 32-bit. This distinguishes it from the 16-bit versions of the architecture that preceded it, and from the 64-bit architecture IA-64 (which is very different, although it has an IA-32 compatibility mode). The more generic name for all 16-bit and 32-bit versions of this architecture is x86.

Intel Corporation was the inventor and is the biggest supplier of processors compatible with this instruction set, but it is not the only supplier of such processors. The second biggest supplier is Advanced Micro Devices (AMD). Many other manufacturers have also supplied IA-32-compatible processors, such as Cyrix and VIA.

IA-32 was introduced with the Intel 80386 in 1985. This instruction set was still the basis of most processors thirty-five years later, in 2020. Even though the instruction set has remained intact, successive generations of processors have become much faster and better optimized at running it. The IA-32 instruction set is usually described as a CISC (Complex Instruction Set Computer) architecture, though such classifications have become less meaningful with advances in processor design.

Memory Management
There are two memory access models that IA-32 supports. One is called Real Mode, and the other is called Protected Mode. In Real Mode, the processor is limited to accessing a total of just over 1 MiB of memory, while in Protected Mode it can access all of its memory.
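
The "just over 1 MiB" figure follows from how Real Mode forms addresses. As a minimal sketch, assuming the standard segment-times-16-plus-offset rule (not spelled out in the text above) and a processor whose A20 address line is enabled so that addresses above 1 MiB do not wrap around, the limit can be computed as follows. The function name `real_mode_address` is hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address formed in Real Mode: segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


MIB = 1 << 20

# Largest segment combined with the largest offset gives the highest address.
highest = real_mode_address(0xFFFF, 0xFFFF)

# Number of bytes that lie at or above the 1 MiB mark.
bytes_above_1_mib = highest - MIB + 1

print(hex(highest), bytes_above_1_mib)
```

The highest address is 0x10FFEF, which leaves 65,520 bytes (64 KiB minus 16 bytes) reachable above the 1 MiB boundary.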

Real Mode
The old MS-DOS operating system required Real Mode to work, while newer operating systems such as Microsoft Windows and GNU/Linux usually require Protected Mode. Upon booting, the processor starts in Real Mode and begins loading programs automatically into RAM from ROM and the hard disk. A program inserted somewhere along the boot sequence may then be used to put the processor into Protected Mode.

Protected Mode
Protected Mode offers several advantages beyond the ability to address memory above the MS-DOS 1 MiB limit. One of them is protected memory, which prevents programs from corrupting one another. Another is virtual memory, which lets programs use more memory than is physically installed on the machine. A third is task-switching (i.e. multitasking), which lets a computer juggle multiple programs so that they all appear to be running at the same time.

The size of memory in Protected Mode is usually limited to 4 GiB, the range of a 32-bit address. However, this is not the ultimate limit of the size of memory in IA-32 processors. Through extensions to the processor's paging and segmentation systems, IA-32 operating systems may be able to access more than 32 bits of address space, even without switching to 64-bit. One such extension is known as Physical Address Extension (PAE), which allows for 36-bit physical addresses.
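
The two limits mentioned here are plain powers of two. A short sketch of the arithmetic, using a hypothetical helper `addressable_gib`:

```python
GIB = 1 << 30  # bytes in one GiB


def addressable_gib(address_bits: int) -> int:
    """Bytes addressable with the given number of address bits, in GiB."""
    return (1 << address_bits) // GIB


print(addressable_gib(32))  # plain 32-bit addressing
print(addressable_gib(36))  # 36-bit addressing with PAE
```

A 32-bit address covers 4 GiB, and a 36-bit address covers 64 GiB, sixteen times as much.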

Virtual 8086 Mode
There is also a sub-mode of operation in Protected Mode, called Virtual 8086 Mode. This is a special hybrid operating mode which allows old MS-DOS programs and operating systems to run under the control of a Protected Mode supervisor operating system. This allows for a great deal of flexibility in running Protected Mode programs and MS-DOS programs simultaneously. The mode was added with the IA-32 version of Protected Mode; it did not exist in the earlier 16-bit Protected Mode of the Intel 80286.

Registers
The Intel 80386 has eight 32-bit general purpose registers for application use. There are also eight floating point stack registers. Later processors added further registers along with their various SIMD instruction sets, such as MMX, 3DNow!, and SSE.

There are also system registers, which are used mostly by operating systems and only rarely by applications. They are known as segment, control, debug, and test registers. There are six segment registers, used mainly for memory management. The number of control, debug and test registers varies from model to model.

General Purpose registers
The x86 general purpose registers are not really as general purpose as their name implies, because some highly specialized tasks can only be done with one or two specific registers. In other architectures, a general purpose register really is general purpose; that is, you can use any register you like for any purpose you like. The x86 general purpose registers further subdivide into registers specializing in data and others specializing in addressing.

In addition, many operations can be done either on a register or directly on a value in RAM, without the data having to be loaded into a register first. The 1970s heritage of this architecture shows through in this behaviour. However, with the advent of the 64-bit extensions to x86 in AMD64, this odd behaviour has been cleaned up (at least in 64-bit mode). General purpose registers are now much more truly general purpose, and they can be used far more interchangeably.

8-bit and 16-bit register subsets
8-bit and 16-bit subsets of these registers are also accessible. For example, the lower 16 bits of the 32-bit EAX register can be accessed by calling it the AX register. Some of the 16-bit registers can be further subdivided into 8-bit subsets too; for example, the upper 8-bit half of AX is called AH, and the lower half is called AL. Similarly, EBX is subdivided into BX (16-bit) and BH and BL (8-bit).
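
The aliasing between EAX, AX, AH and AL can be modelled with bit masks. The following is only an illustrative sketch; the class `Register32` and its attribute names (`full`, `low16`, `high8`, `low8`) are hypothetical and stand for EAX, AX, AH and AL respectively.

```python
class Register32:
    """Model of a 32-bit register with 16-bit and 8-bit sub-registers."""

    def __init__(self, value: int = 0):
        self.full = value & 0xFFFFFFFF  # e.g. EAX

    @property
    def low16(self) -> int:  # e.g. AX
        return self.full & 0xFFFF

    @low16.setter
    def low16(self, value: int) -> None:
        self.full = (self.full & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def high8(self) -> int:  # e.g. AH, bits 8 to 15
        return (self.full >> 8) & 0xFF

    @high8.setter
    def high8(self, value: int) -> None:
        self.full = (self.full & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def low8(self) -> int:  # e.g. AL, bits 0 to 7
        return self.full & 0xFF

    @low8.setter
    def low8(self, value: int) -> None:
        self.full = (self.full & 0xFFFFFF00) | (value & 0xFF)


eax = Register32(0x12345678)
print(hex(eax.low16), hex(eax.high8), hex(eax.low8))

eax.low8 = 0xFF   # writing AL changes only the lowest byte of EAX
print(hex(eax.full))
```

Writing to a sub-register leaves the remaining bits of the full register untouched, which is why the upper 16 bits of `eax.full` survive the write to `low8`.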

General data registers
All four of the following registers may be used as general purpose registers. However, each has some specialized purpose as well. Each of these registers also has 16-bit and 8-bit subset names.


 * EAX accumulator (with a special interpretation for arithmetic instructions; a for accumulator)
 * EBX base register (used for addressing data in the data segment)
 * ECX counter (with a special interpretation for loops, c for counter)
 * EDX data register

General address registers
Used mainly for address pointing. They have 16-bit subset names, but no 8-bit subsets.


 * EBP base pointer (holds the address of the current stack frame)
 * ESI source index (for string operations)
 * EDI destination index (for string operations)
 * ESP stack pointer (holds the top address of the stack)
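 * EIP instruction pointer (holds the address of the next instruction to execute; unlike the others, it is not a general purpose register and cannot be read or written directly by ordinary instructions)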

Floating point stack registers
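There are 8 x87 floating point registers, known as ST(0) through ST(7). Since the introduction of the Intel 80486 the floating point unit has been built into the processor; earlier processors relied on a separate coprocessor chip such as the 80387. Each register is 80 bits wide and stores numbers in the extended precision format of the IEEE floating point standard.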

These registers are not addressed by fixed numbers, but are organised as a LIFO stack. The register numbers are relative to the top of the stack; ST(0) is the top of the stack, ST(1) is the next register below the top of the stack, ST(2) is two below the top of the stack, etc. Data is always pushed onto and popped from the top of the stack, and most operations use the top of the stack as one of their operands. The other registers can be named as operands, but only by their position relative to the current top, which shifts every time a value is pushed or popped.
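
The relative numbering can be illustrated with a small model. This is a simplified sketch, not a description of the hardware: the class `X87Stack` and its method names are hypothetical, and it ignores stack overflow and underflow checks. It assumes eight physical slots and a "top" index that moves down on a push and up on a pop.

```python
class X87Stack:
    """Toy model of the eight-entry x87 register stack."""

    def __init__(self):
        self.slots = [None] * 8  # the eight physical registers
        self.top = 0             # index of the slot currently named ST(0)

    def st(self, i: int):
        """Return the value of ST(i), counted from the current top."""
        return self.slots[(self.top + i) % 8]

    def push(self, value) -> None:
        """Push a value; every existing entry moves one ST number down."""
        self.top = (self.top - 1) % 8
        self.slots[self.top] = value

    def pop(self):
        """Pop and return ST(0); every remaining entry moves one ST number up."""
        value = self.slots[self.top]
        self.slots[self.top] = None
        self.top = (self.top + 1) % 8
        return value


fpu = X87Stack()
fpu.push(1.0)
fpu.push(2.0)
print(fpu.st(0), fpu.st(1))  # the value pushed last is ST(0)
fpu.pop()
print(fpu.st(0))             # the earlier value is ST(0) again
```

The value 1.0 never moves between slots; only its name changes from ST(0) to ST(1) and back as the top of the stack shifts.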

SIMD registers
MMX, 3DNow!, and SSE also added new registers of their own to the IA-32 instruction set.

MMX registers
MMX added 8 new registers to the architecture, known as MM0 through MM7 (henceforth referred to as MMn). In reality, these new registers were just aliases for the existing x87 FPU stack registers. Hence, anything that was done to the floating point stack would also affect the MMX registers. Unlike the FP stack, these MMn registers were fixed, not relative, and therefore they were randomly accessible.

Each of the MMn registers holds a 64-bit integer. However, one of the main concepts of the MMX instruction set is that of packed data types: instead of using the whole register for a single 64-bit integer, it may hold two 32-bit integers, four 16-bit integers, or eight 8-bit integers.
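
The packed data types are simply different ways of slicing the same 64 bits. A minimal sketch using Python's standard `struct` module, with an arbitrary example value and little-endian byte order as on x86:

```python
import struct

# One 64-bit value, stored as 8 little-endian bytes.
raw = struct.pack("<Q", 0x0102030405060708)

as_dwords = struct.unpack("<2I", raw)  # two 32-bit integers
as_words = struct.unpack("<4H", raw)   # four 16-bit integers
as_bytes = struct.unpack("<8B", raw)   # eight 8-bit integers

print([hex(v) for v in as_dwords])
print([hex(v) for v in as_words])
print([hex(v) for v in as_bytes])
```

In each view the first element is the least significant part of the 64-bit value, so the eight-byte view starts with 0x08 and ends with 0x01.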

Also, because MMX's 64-bit MMn registers are aliased to the FPU stack, and each of the stack registers is 80 bits wide, the upper 16 bits of the stack registers go unused in MMX. These bits are set to all ones, which makes the contents look like NaNs or infinities in the floating point view. This makes it easier to tell whether you are working on floating point data or MMX data.
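
Why all ones in the upper 16 bits reads as a NaN or infinity can be shown with a little bit arithmetic. The sketch below assumes the usual layout of the 80-bit extended format (1 sign bit, 15 exponent bits, 64 significand bits), in which an all-ones exponent field is reserved for NaNs and infinities. The function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def x87_image_of_mmx(mmx_value: int) -> int:
    """80-bit register image after an MMX write: upper 16 bits all ones."""
    return (0xFFFF << 64) | (mmx_value & MASK64)


def exponent_field(reg80: int) -> int:
    """Extract the 15-bit exponent field (bits 64 to 78)."""
    return (reg80 >> 64) & 0x7FFF


def looks_like_nan_or_infinity(reg80: int) -> bool:
    """An all-ones exponent field marks a NaN or an infinity."""
    return exponent_field(reg80) == 0x7FFF


# Extended precision encoding of 1.0: exponent 0x3FFF, explicit leading 1 bit.
one = (0x3FFF << 64) | 0x8000000000000000

print(looks_like_nan_or_infinity(x87_image_of_mmx(42)))
print(looks_like_nan_or_infinity(one))
```

Any MMX value, whatever its lower 64 bits, produces the reserved exponent, while an ordinary floating point number such as 1.0 does not.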

3DNow! registers
3DNow! was designed to be the natural evolution of MMX from integers to floating point. As such, it uses the exact same register naming convention as MMX, that is MM0 through MM7. The only difference is that instead of packing integers ranging from bytes to quadwords into these registers, one packs two single precision floating point numbers into each register.

The advantage of aliasing registers with the FPU registers is that the same instructions and data structures used to save the state of the FPU registers can also be used to save 3DNow! register states. Thus no special modifications are required to operating systems which would otherwise not know about 3DNow!.

SSE registers
SSE discarded all legacy connections to the FPU stack. This also meant that this instruction set discarded all legacy connections to previous generations of SIMD instruction sets like MMX. But it freed the designers up, allowing them to use larger registers, not limited by the size of the FPU registers. The designers created eight 128-bit registers, named XMM0 through XMM7. (Note: in AMD64, the number of SSE XMM registers has been increased from 8 to 16.)

But the downside is that operating systems had to be aware of this new set of instructions in order to be able to save their register states. So Intel made SSE something that an operating system must switch on from within Protected Mode: an operating system that is aware of SSE sets a control register flag (OSFXSR in CR4) to enable the SSE instructions, whereas they stay disabled in regular Protected Mode. This arrangement is sometimes described as a slightly modified version of Protected Mode, called Enhanced Mode. An unaware operating system never sets the flag and so enters only regular Protected Mode.

The original SSE is a SIMD instruction set that works only on single precision floating point values, like 3DNow!. Because it has larger registers than 3DNow!, SSE can pack twice the number of single precision floats into each register. SSE2 introduced the capability to pack double precision numbers too. 3DNow! had no possibility of doing this, since a double precision number is 64 bits in size, which is the full size of a single 3DNow! MMn register. At 128 bits, the SSE XMMn registers can pack two double precision floats into one register. Thus SSE2 is much more suitable for scientific calculations than either SSE or 3DNow!, which were limited to single precision.
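
The packing capacities compared here follow directly from the register and element sizes. A small sketch of the arithmetic, using the standard `struct` module to obtain the element sizes; the helper name `lanes` is hypothetical:

```python
import struct


def lanes(register_bits: int, element_format: str) -> int:
    """How many elements of the given struct format fit in one register."""
    element_bits = 8 * struct.calcsize(element_format)
    return register_bits // element_bits


print(lanes(64, "<f"))   # single precision floats in a 3DNow! MMn register
print(lanes(128, "<f"))  # single precision floats in an SSE XMMn register
print(lanes(128, "<d"))  # double precision floats in an SSE2 XMMn register
print(lanes(64, "<d"))   # double precision floats in a 64-bit MMn register

# Four packed single precision floats occupy exactly 16 bytes (128 bits).
packed = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(len(packed))
```

A 64-bit register has room for only one double, which leaves nothing to process in parallel; a 128-bit register has room for two.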

Instructions
The original IA-32 instruction set has evolved over time through the addition of the multimedia instruction updates. The final step in this evolution is the move to 64 bits, at which point the architecture can no longer be called IA-32. The 64-bit extension is called x86-64, and its first implementation was AMD's AMD64. It cannot be called IA-64, because Intel and HP had already reserved that name for the Intel Itanium design, which is not an evolution of IA-32 in the way that AMD64 is. Intel later followed AMD's design with what it calls EM64T.

SIMD Multimedia Instruction Set updates
Successive generations of IA-32 CPUs have added several extensions to the original instruction set. Technically, they are known as SIMD instruction sets. More colloquially, they are known as multimedia instruction sets, because they were mainly used in multimedia and entertainment software.

MMX
The MMX extensions were the first major upgrade. This was a set of integer-only SIMD instructions. It was introduced by Intel in the Pentium MMX in 1997, and was supported by AMD in its K6 processor the same year. It shared its registers with the x87 FPU; therefore operating systems did not have to be modified to accept these instructions, which worked automatically if the operating system supported x87 state-saving.
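
What "integer SIMD" means in practice is that one instruction operates on several packed integers at once, with each element handled independently. The sketch below models the behaviour of a wrapping packed add on four 16-bit elements, in the manner of the MMX PADDW instruction. It is written in plain Python for illustration and does not execute any MMX instruction; the function name `packed_add_words` is hypothetical.

```python
import struct


def packed_add_words(a: int, b: int) -> int:
    """Add four 16-bit lanes of two 64-bit values; each lane wraps on overflow."""
    lanes_a = struct.unpack("<4H", struct.pack("<Q", a))
    lanes_b = struct.unpack("<4H", struct.pack("<Q", b))
    # Each lane is added separately, and no carry crosses into the next lane.
    sums = [(x + y) & 0xFFFF for x, y in zip(lanes_a, lanes_b)]
    return struct.unpack("<Q", struct.pack("<4H", *sums))[0]


a = 0x000100020003FFFF
b = 0x0001000100010001

print(hex(packed_add_words(a, b)))         # lane-wise addition
print(hex((a + b) & 0xFFFFFFFFFFFFFFFF))   # ordinary 64-bit addition
```

The lowest lane overflows from 0xFFFF to 0x0000. In the packed add that overflow is discarded, whereas in the ordinary 64-bit add it carries into the neighbouring lane, so the two results differ.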

MMX was further upgraded with floating point SIMD capabilities through the introduction of 3DNow! in 1998. Like MMX, this set shared its registers with the x87 FPU. This extension was introduced by AMD in the K6-2 processor, but it was never picked up by Intel.

SSE
SSE is a single precision floating point SIMD extension that Intel introduced in early 1999 with the Pentium III processor, codenamed Katmai. Unlike 3DNow!, it was not an extension of MMX, nor did it share its registers with the x87 FPU. Operating systems therefore required some modification to support it. This added inconvenience was made up for by the fact that SSE worked unencumbered by any of the old limitations of the x87 FPU. AMD eventually adopted this instruction set, starting with its Athlon XP processor; further extensions to SSE are likely to be adopted by AMD as well, since it no longer makes extensions to its own 3DNow! instructions.

SSE2
SSE2 was introduced in late 2000 with the Intel Pentium 4 processor, based on the Willamette core. This was a further upgrade to the original SSE, adding double precision operations to its bag of tricks. AMD introduced SSE2 support with the Clawhammer Athlon 64 core, in 2003.

SSE3
SSE3 was introduced in early 2004, in an upgraded version of the Pentium 4 codenamed Prescott. It featured some minor additions to the SSE2 extensions. AMD introduced it in April 2005, with Revision E of its Athlon 64 processor, using the Venice and San Diego cores.

Next-generation 64-bit Instruction Sets
Two new instruction sets can claim to be the 64-bit successor to IA-32. One of them builds on top of IA-32 but has a different name, while the other one discards IA-32 completely but has a similar name.

IA-64
Intel's IA-64 architecture is not directly compatible with the IA-32 instruction set. It discards all IA-32 instructions and starts from scratch with a completely different instruction set, based on Explicitly Parallel Instruction Computing (EPIC), a design derived from Very Long Instruction Word (VLIW) architectures. IA-32 programs can still be run through a compatibility layer: early Itanium processors included hardware support for IA-32 instructions, while later ones rely on an instruction emulator, basically a piece of software that translates IA-32 instructions into IA-64 instructions on the fly. However, since it was designed by Intel, the original creator of the IA-32 instruction set, it gets to keep the "IA" prefix, despite the lack of any real familial connection to the IA family up to IA-32. The IA-64 architecture relies heavily on software and compiler optimisation.

It is the instruction set used inside Intel's Itanium line of processors.

AMD64
AMD's AMD64 instruction set, also known as x86-64, is largely built on top of IA-32, and thus maintains the x86 family heritage. While extending the instruction set, AMD took the opportunity to clean up, for 64-bit mode, some of the odd behaviour that has existed since the architecture's earliest 16-bit days. AMD also doubled the number of general purpose registers from 8 to 16, and these registers are now much more truly general purpose. The number of SSE registers was likewise doubled from 8 to 16. Most of the functionality of the segment registers was deprecated, since their usage had steadily declined even during the IA-32 days.

EM64T
By February 2004, Intel had implicitly acknowledged the logic of the AMD64 instruction set by adopting it in its own products. The 64-bit Intel Xeon processors were the first Intel processors on the market to use the new instruction set (the first generation of Xeon processors, which were 32-bit, did not). Intel, however, calls it EM64T. Intel could not use the term IA-64 for this architecture, since it had already given that name to the architecture behind its Itanium processors. Intel's own white paper on EM64T left out any mention of its origins at AMD.