



x86 assembly language is the assembly language for the x86 class of processors, which includes Intel's Pentium series and AMD's Athlon series.

x86 instruction set architecture
The x86 processor and instruction set design is CISC; however, since the 1990s the internal architecture has moved towards being more of a RISC or VLIW design. Modern x86 processors translate their instructions into RISC-like micro-operations before executing them, which makes the x86 a superscalar design, since many micro-operations can be scheduled to execute at the same time. This behaviour is, however, invisible to the assembly programmer.

The modern x86 instruction set is really a series of extensions to an instruction set whose lineage goes back to the Intel 8008 microprocessor. Nearly full binary backward compatibility exists from the Intel 8086 chip through to the modern Pentium or Athlon based processors. (There are certain rare exceptions, such as the counted shift instructions, corrections to the original PUSHA instruction, a few orphaned Intel 80286 semantics, the dropped LOADALL instruction, and differences in the accuracy of some FPU results on the Pentium 4.) Each successive instruction extension has been either simply added directly, or accompanied by new execution modes of the processor.

The various kinds of instructions
In general, the modern x86 instruction set is variable length and alignment independent (encoded as little endian, as is all data in the x86 architecture). Register usage is mostly general (most integer instructions can use any combination of general purpose registers), with some implicit register usage (MUL and DIV have an implicit D:A dual output register combination, and floating point instructions always use st(0) as one of their parameters). Instructions take two operands (that is, the first register is usually used for both input and output). The instruction set supports various complex addressing modes (including immediate addressing, offset addressing, and scaled index addressing) and contains special support for atomic instructions (XCHG, CMPXCHG(8B), XADD, and integer instructions combined with the LOCK prefix). It includes floating point (operating on a stack of registers) and integer instructions, and produces conditional flags implicitly (through most integer ALU instructions) and explicitly (via the CMP instruction). Finally, it includes SIMD instructions (instructions which perform the same operation in parallel on many operands encoded in adjacent cells of wider registers).

The stack
The x86 processor also comes with a built-in execution stack mechanism. The CALL/RET and INT/IRET instructions use a properly set up stack to save and restore call-return points. Instructions such as ENTER/LEAVE, or more direct manipulation of the stack register (ESP), can be used for saving local data on the stack. The instruction architecture also includes PUSH/POP instructions for direct use of the stack for integer and address quantities. Because these are part of the instruction architecture, ABI specifications for "call stack" software support mechanisms are simpler than on RISC architectures, which must be more explicit about call stack details. The stack is such an important part of the architecture that it is always active.

Execution modes
The processor supports several modes of operation in which some instructions are available and some are not. The 16-bit subset of instructions is available in "real mode" (available since the 8086) or "v86 mode" (available since the Intel 80386). In "32-bit protected mode" (available in processors starting with the Intel 80386) or "compatibility mode" (available when the 64-bit extensions are enabled), 32-bit instructions (plus SIMD instructions) are available. In "long mode" (available since the AMD Opteron processor), 64-bit instructions are available.

The instruction set is based on similar ideas in each mode, but involves different ways of accessing memory and so employs different programming strategies.

For details on the assembly language in the various modes, see: Assembly in real mode, Assembly in protected mode, Assembly in long mode.

Real mode
Real mode is mostly 16-bit, but since the 80386 it is possible to use 32-bit registers in this mode. It is also possible to enable unofficial 32-bit addressing in real mode through a bug/feature that appears under certain conditions when switching from protected mode back to real mode. Some DOS extenders make use of this to access more than one megabyte of RAM. This bug-mode is sometimes called unreal mode by assembly programmers.

Protected mode
Protected mode (or 386 protected mode) enables full 32-bit addressing, paging, memory protection, hardware support for multitasking, some additional registers and some new instructions to manage the 32-bit addressing. The instruction set in protected mode is fully backward compatible with the one used in real mode.

16-bit protected mode also exists, but is almost never used. It was used in early operating systems that required memory protection. The mode provides 24-bit addressing, which gives a maximum capacity of 16 megabytes of memory. Some early Unix operating systems and OS/2 1.x used this mode.

Long mode
Long mode, a subset of AMD64, is a mode that enables 64-bit addressing, 64-bit extensions of most registers and a few new 64-bit registers as well. It is mostly an extension of the 32-bit instruction set, but unlike the 16 -> 32 bit transition, a number of instructions were dropped in the 64-bit mode. This does not affect actual binary backward compatibility (legacy code executes in other modes that retain support for those instructions), but it changes the way assemblers and compilers for new code have to work.

x86-compatible processors boot into real mode for backward compatibility with the older 8086 class of processors. In general, the operating system is responsible for switching to protected mode if it so wishes. Recently, long mode has been created as a 64-bit environment for x86 processors. Processors with the ability to switch into long mode are said to belong to the x86-64 family; the AMD Athlon64 is one x86-64 processor. To switch to long mode, the processor has to first switch from real mode to protected mode, and then to long mode.

The way x86 processors can change their processor mode is remotely similar to reconfigurable computing.

Itanium

Itanium processors can also execute x86 code, but with significant performance degradation. Contrary to some claims, the Itanium does not actually emulate x86 instructions in software, but rather maps x86 instructions to low level Itanium primitives, much as a modern Pentium-like chip maps its instructions to its RISC-like core. The reasons why x86 instructions execute so slowly on Itanium are that 1) the Itanium has a much lower clock rate than a Pentium and 2) the primitives in, and general architecture of, the Itanium are not a good match for x86 instructions and semantics, so instructions are typically mapped to long or suboptimal sequences of Itanium primitives. For example, the Itanium does not support register renaming or out-of-order execution, nor does it compute flags implicitly as part of integer ALU computations, whereas all modern x86 processors do.

Integer registers
The x86 in real mode has 6 general 16-bit registers (AX, BX, CX, DX, SI, DI), 2 special stack registers (BP and SP), a single 16-bit flags register (FLAGS) and four segment registers (CS, SS, DS, ES). The first four of the general registers are themselves split into top and bottom half 8-bit registers (AX = AH:AL, BX = BH:BL, CX = CH:CL, DX = DH:DL), which are independently usable in 8-bit instruction forms. The instruction pointer (IP) register exists, but is only used in an implicit manner (though its value can be stored on the stack and accessed without restriction).

Addressing is 16-bit and segmented, meaning that the actual address is given by SEGMENT * 16 + OFFSET. Segments are either implicit or made explicit via a segment override. By default the general registers are assumed to use the DS (data) segment, the stack registers are assumed to use the SS (stack) segment, and IP is assumed to use the CS (code) segment. This segmented architecture allows addressing of a little more than 1MB of memory, yet only 64K can be addressed through a given segment at any one time. It also caused some confusion around something called the "A20" line: addresses from 0x100000 to 0x10FFEF can technically be addressed, but early systems usually did not make that extra 64K available (and ended up wrapping the address, by dropping the top bit instead), while later systems did not wrap, since the x86 evolved ways of addressing more than 1MB of memory. Segmentation also created great complications for C compiler implementors, who introduced odd pointer modes such as "near", "far" and "huge" to leverage the implicit nature of the segmented architecture to different degrees.
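The address arithmetic above can be tried out with a small model. This is only an illustrative Python sketch, not x86 code: the function name `real_mode_address` and the `a20_enabled` flag are made up for the example, and the "wrap" is modelled simply as dropping bit 20 of the result.

```python
def real_mode_address(segment, offset, a20_enabled=True):
    """Linear address of a 16-bit segment:offset pair: SEGMENT * 16 + OFFSET."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # Early systems: the 21st address bit is dropped, so the address wraps.
        linear &= 0xFFFFF
    return linear

# Highest address reachable from real mode.
highest = real_mode_address(0xFFFF, 0xFFFF)
# The same pair on a system that wraps at 1MB.
wrapped = real_mode_address(0xFFFF, 0xFFFF, a20_enabled=False)
# Two different segment:offset pairs can name the same byte.
alias_a = real_mode_address(0x1234, 0x0010)
alias_b = real_mode_address(0x1235, 0x0000)
```

The aliasing in the last two lines is one reason compilers needed a normalised "huge" pointer form in addition to "near" and "far".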

Starting with the Intel 80386 processor, the x86 in 32-bit protected mode extended the 16-bit registers to 32 bits (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, EFLAGS, EIP). The older 16-bit registers are overlaid on the bottom half of the 32-bit registers and can be accessed with an instruction override prefix. There is no "high-half" 16-bit register access; instead, Intel chose to generalize the addressing so that almost any register can be used for scaled index addressing, and so that EBP can be used as a general register as well as a stack register.
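The way the narrower registers overlay the wider one can be modelled as views onto a single 32-bit value. The sketch below is a hypothetical Python model (the class `Register32` and its attribute names `e`, `x`, `h`, `l` are invented here to echo EAX, AX, AH and AL); it only illustrates the bit layout described above.

```python
class Register32:
    """One 32-bit register with its 16-bit and 8-bit sub-registers."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full register, e.g. EAX

    @property
    def x(self):                             # low 16 bits, e.g. AX
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 8..15, e.g. AH
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 0..7, e.g. AL
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Register32(0x12345678)
eax.h = 0xAB      # like "mov ah, 0ABh": only bits 8..15 change
eax.l = 0xCD      # like "mov al, 0CDh": only bits 0..7 change
```

Note that the upper 16 bits have no name of their own in this model, matching the absence of a "high-half" 16-bit register.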

The segment registers no longer perform any sort of shift-and-offset in an attempt to increase the address space; instead they hold "selectors", which refer to system-level structures. These structures contain the specific linear address ranges which offset and bound the offset address. By default total addressing is limited to 32 bits; however, there is a page extension mode (first added in the Intel Pentium Pro) which allows an extra 4 bits of addressing. As with the 16-bit segmented architecture, offset addressing is still limited to 32 bits. Unlike on the 16-bit architectures, C compiler vendors and operating system vendors did not widely support this extended addressing mode, and so it is usually not used.

Starting with the AMD Opteron processor, the x86 in 64-bit long mode (a subset of AMD64 or x86-64 mode) extended the 32-bit registers in the same way that the 32-bit registers had extended the 16-bit ones before (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP); however, AMD also added 8 additional 64-bit registers (R8, R9, ..., R15).

The addressing modes were not dramatically changed from 32-bit mode, except that addressing was extended to 64 bits, addresses must now be in sign-extended form (so the usable address space grows equally from the top and the bottom of memory), and most selector details have been dramatically reduced.

Floating point stack
Starting with the Intel 8087 floating point coprocessor (and first integrated as a standard extension of the x86 architecture in the Intel 80486DX chip), the x86 processor includes an 8-entry, 80-bit floating point stack with individually selectable entries (st(0), st(1), ..., st(7), where st(0) is always the top entry of the stack). Floating point instructions can push entries onto the stack, or pop the top entry off. As one of its two operands, a floating point instruction can choose any stack entry, but the other must be st(0). The FXCH instruction also exists as a convenience, to allow st(0) to be swapped with any other stack entry. A floating point stack entry is only valid if it has previously been pushed onto the stack. The floating point instructions read and write data in memory, using integer addressing, as floating point values that are 32-bit, 64-bit or 80-bit, or even as integer values that are 32 or 64 bit. Implicit numeric format conversions are performed as necessary.
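To make the stack-relative naming concrete, here is a minimal Python model of the register stack. It is a sketch under simplifying assumptions: the class `FpuStack` and its method names are hypothetical, values are ordinary Python floats rather than 80-bit numbers, and error conditions are reported as Python exceptions rather than through the FPU's own status mechanism.

```python
class FpuStack:
    """Toy model of the 8-entry floating point register stack."""

    DEPTH = 8

    def __init__(self):
        self._entries = []                 # index 0 is always st(0)

    def push(self, value):                 # what a load such as FLD does
        if len(self._entries) == self.DEPTH:
            raise OverflowError("all 8 stack entries are in use")
        self._entries.insert(0, float(value))

    def pop(self):                         # remove and return st(0)
        if not self._entries:
            raise IndexError("floating point stack is empty")
        return self._entries.pop(0)

    def st(self, i):                       # read st(i); must have been pushed
        return self._entries[i]

    def fxch(self, i):                     # swap st(0) with st(i)
        self._entries[0], self._entries[i] = self._entries[i], self._entries[0]


fpu = FpuStack()
for value in (1.0, 2.0, 3.0):
    fpu.push(value)                        # st(0)=3.0, st(1)=2.0, st(2)=1.0
top_before = fpu.st(0)
fpu.fxch(2)                                # st(0)=1.0, st(2)=3.0
popped = fpu.pop()                         # removes 1.0; 2.0 becomes st(0)
```

The key point is that the names st(0) to st(7) are relative to the current top, so every push or pop renumbers all the entries.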

The x86 floating point instructions can operate in one of three possible execution modes with respect to operand size: 32-bit, 64-bit or 80-bit, as well as in various rounding modes. For compatibility reasons (with external non-x86 sources, such as data generated on a RISC processor, which may often support only a 64-bit mode), the size mode is commonly set to 64-bit, and the rounding mode is set to (TBD). However, with older C compilers, and for accuracy purposes especially in Fortran compilers, 80-bit mode is sometimes used.

Note that AMD64 did not add extra entries to the floating point stack, though the extra integer registers can be used for memory addressing.

SIMD registers
Starting in the mid-90s, several SIMD instruction extensions have been introduced. They can be broken down by their marketing names.

MMX registers
These instruction sets mapped 8 64-bit SIMD registers (MM0, MM1, ..., MM7) on top of the floating point stack, but did not adopt the stack-like semantics. The reason for this mapping was so that existing operating systems could still correctly save and restore the register state when multitasking, without modification. SIMD instructions can randomly access any of the SIMD registers in any instruction. To avoid confusion with floating point instructions, these SIMD instructions must execute in blocks terminated by an EMMS (or FEMMS in 3DNow!) instruction. These instructions implicitly clear the FP stack as a side effect, so any FP entries are unusable following the instruction.

The format of the registers depends on the instructions using them. MMX instructions use them as a pair of packed 32-bit integer values, as four 16-bit integer values, or as eight 8-bit integer values. The 3DNow! instructions use them as a pair of 32-bit floating point values.

Note that AMD64 did not add extra MMX registers, though the extra integer registers can be used for memory addressing.

Streaming SIMD extensions
Starting with the Intel Pentium III, these instruction sets use eight new 128-bit registers called SSE registers (XMM0, XMM1, ..., XMM7). SIMD instructions can randomly access any of the SIMD registers in any instruction. Intel, followed by AMD, added further SIMD instruction sets, but used these same registers until AMD introduced the AMD64 long mode execution mode. AMD64 simply extends the number of registers to sixteen 128-bit registers (adding XMM8, XMM9, ..., XMM15), and extends the instructions to be able to use any of the registers.

The format of the registers depends on the instructions using them. The original SSE instruction set uses them as four packed 32-bit floating point values. SSE2 allows them to be used as two packed 64-bit floating point values, four packed 32-bit integer values, eight packed 16-bit integer values, or sixteen packed 8-bit values.
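The idea that one 128-bit register has several interpretations can be illustrated by reinterpreting the same 16 bytes in different ways. The Python sketch below uses the standard `struct` module with little-endian layouts; the variable name `xmm0` is only a label for a 16-byte value, not a real register.

```python
import struct

# 16 bytes holding four packed 32-bit floats, as the original SSE would see them.
xmm0 = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

as_floats = struct.unpack("<4f", xmm0)    # four 32-bit floating point values
as_doubles = struct.unpack("<2d", xmm0)   # two 64-bit floating point values
as_dwords = struct.unpack("<4i", xmm0)    # four 32-bit integers
as_words = struct.unpack("<8h", xmm0)     # eight 16-bit integers
as_bytes = struct.unpack("<16b", xmm0)    # sixteen 8-bit integers
```

The register itself carries no type information; `as_dwords[0]` is simply the bit pattern of the float 1.0 read as an integer.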

Instruction overview
As a CISC processor, the x86 offers a large number of instructions of varying capabilities.

Integer ALU instructions
x86 assembly has the standard mathematical operations add, sub, mul and idiv; the logical operators and, or, xor and not, together with neg for arithmetic negation; arithmetic and logical bit shifts, sal/sar and shl/shr; rotates with and without carry, rcl/rcr and rol/ror; and a complement of BCD arithmetic instructions, aaa, aad, daa and others.

Floating point instructions
x86 assembly language (since the 80486DX processor) includes a stack based floating point unit which can perform addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scaling by a power of two. The operations also include conversion instructions which can load or store a value from memory in any of the following formats: binary coded decimal, 32-bit integer, 64-bit integer, 32-bit floating point, 64-bit floating point or 80-bit floating point (upon loading, the value is converted to the currently used floating point mode). The x86 also includes a number of transcendental functions including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e.

The stack register to stack register format of the instructions is usually F(OP) st, st(*) or F(OP) st(*), st, where st is equivalent to st(0), and st(*) is one of the 8 stack registers (st(0), st(1), ..., st(7)). As with the integer instructions, the first operand is both the first source operand and the destination operand. FSUBR and FDIVR should be singled out as first swapping the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include instruction modes that pop the top of the stack after their operation is complete. So for example FADDP st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from the top of the stack, thus making the result that was in st(1) the top of the stack in st(0).
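The operand conventions in this paragraph can be checked with a tiny model in which a Python list stands for the register stack and index 0 is st(0). The helper names `faddp`, `fsub` and `fsubr` are just illustrative stand-ins for the corresponding instructions; precision, exceptions and status flags are ignored.

```python
def faddp(stack, i):
    """FADDP st(i), st: st(i) = st(i) + st(0), then pop st(0)."""
    stack[i] = stack[i] + stack[0]
    stack.pop(0)

def fsub(stack, i):
    """FSUB st, st(i): st(0) = st(0) - st(i)."""
    stack[0] = stack[0] - stack[i]

def fsubr(stack, i):
    """FSUBR st, st(i): operands swapped, st(0) = st(i) - st(0)."""
    stack[0] = stack[i] - stack[0]

regs = [2.0, 5.0, 9.0]      # st(0)=2.0, st(1)=5.0, st(2)=9.0
faddp(regs, 1)              # 5.0 + 2.0 lands in st(1), which then becomes st(0)

plain = [10.0, 3.0]
fsub(plain, 1)              # 10.0 - 3.0

reversed_ = [10.0, 3.0]
fsubr(reversed_, 1)         # 3.0 - 10.0
```

After `faddp`, the sum is at index 0, which mirrors how the popping forms leave their result on top of the stack.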

SIMD instructions
Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register. Various instruction technologies support different operations on different register sets, but taken as a complete whole (from MMX to SSE3) they include general computations on integer or floating point arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or square root). So for example, PADDW MM0, MM1 performs 4 parallel 16-bit (indicated by the W) integer adds (indicated by the PADD) of the mm0 values to mm1 and stores the result in mm0. SSE and SSE2 also include floating point modes in which only the very first value of the registers is actually modified. Some other unusual instructions have been added, including a sum of absolute differences (used for motion estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction (useful for software-based alpha-blending). The SSE3 and 3DNow! extensions include addition and subtraction instructions for treating paired floating point values like complex numbers.
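The PADDW example can be imitated in a few lines of Python. This is only a sketch of the lane-wise behaviour: the function `paddw` treats two Python integers as 64-bit registers holding four 16-bit lanes, and each lane wraps around on overflow without carrying into its neighbour.

```python
def paddw(dst, src):
    """Add four packed 16-bit lanes of two 64-bit values, wrapping per lane."""
    result = 0
    for lane in range(4):
        shift = lane * 16
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        result |= ((a + b) & 0xFFFF) << shift   # carry out of the lane is dropped
    return result

mm0 = 0x0001_0002_FFFF_7FFF
mm1 = 0x0001_0003_0001_0001
packed_sum = paddw(mm0, mm1)

# For contrast: an ordinary 64-bit add lets the carry leak into the next lane.
scalar_sum = (mm0 + mm1) & 0xFFFF_FFFF_FFFF_FFFF
```

The second lane (0xFFFF + 0x0001) wraps to 0x0000 in the packed add, whereas the scalar add carries into the third lane.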

These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting values within the registers. In addition there are instructions for moving data between the integer registers and the SSE/MMX registers.

Data manipulation instructions
The x86 processor also includes complex addressing modes for addressing memory with an immediate offset, a register, a register with an offset, a scaled register with or without an offset, and a register with an optional offset and a second scaled register. So for example, one can encode mov eax, [Table + ebx + esi*4] as a single instruction which loads 32 bits of data from the address computed as (Table + ebx + esi * 4), offset from the DS selector, and stores it in the eax register. In general, x86 processors can load and use memory matched to the size of any register they are operating on. (The SIMD instructions also include half-load instructions.)
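The address calculation in the mov example can be traced with a small Python model. Everything here is hypothetical scaffolding: `memory` is a flat byte array standing in for the data segment, `TABLE` is an arbitrary made-up address, and `effective_address` just evaluates displacement + base + index * scale the way the addressing mode does.

```python
import struct

def effective_address(displacement=0, base=0, index=0, scale=1):
    """displacement + base + index * scale, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (displacement + base + index * scale) & 0xFFFFFFFF

memory = bytearray(64)                         # toy flat data segment
TABLE = 16                                     # hypothetical address of Table
struct.pack_into("<6I", memory, TABLE, 10, 20, 30, 40, 50, 60)

ebx = 8                                        # byte offset: skip two entries
esi = 3                                        # element index, scaled by 4
address = effective_address(TABLE, ebx, esi, 4)

# mov eax, [Table + ebx + esi*4]: load 32 little-endian bits from that address
eax = struct.unpack_from("<I", memory, address)[0]
```

The scale factor of 4 matches the size of a 32-bit element, which is why an index register can be used directly as an array subscript.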

The x86 instruction set includes string load, store and move instructions (LODS, STOS, and MOVS) which perform each operation on a specified size (B for 8-bit byte, W for 16-bit word, D for 32-bit double word) and then increment (or, if the direction flag is set, decrement) the implicit address register (SI for LODS, DI for STOS, and both for MOVS). For the load and store, the implicit target/source register is the AL, AX or EAX register (depending on size). The implicit segment used is DS for the load address in SI and ES for the store address in DI. On modern x86 processors, these complex instructions typically do not offer any performance benefit over more simply implemented separate load/store and address increment instructions.
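One step of a string move can be sketched as follows. This is an illustrative Python model only: `movs` is a made-up helper, a single flat `bytearray` stands in for both the source and destination segments, and the `direction_flag` parameter models whether the address registers count up or down.

```python
def movs(memory, si, di, size, direction_flag=False):
    """One MOVS step: copy `size` bytes from [SI] to [DI], then advance both."""
    memory[di:di + size] = memory[si:si + size]
    step = -size if direction_flag else size
    return si + step, di + step

memory = bytearray(b"ABCDEFGH" + bytes(8))   # source text, then an empty buffer
si, di = 0, 8
for _ in range(4):                           # four word-sized (MOVSW-like) steps
    si, di = movs(memory, si, di, 2)
```

Because the address registers are updated as a side effect, the instruction can simply be repeated to copy a whole block.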

The stack is basically implemented as an implicitly decrementing (push) and incrementing (pop) stack pointer. In 16 bits, this implicit stack pointer is addressed as SS:[SP], in 32 bits it is SS:[ESP], and in 64 bits it is [RSP] (in 64-bit mode the SS segment base is treated as zero). The stack pointer actually points to the last value that was stored, under the assumption that its size will match the operating mode of the processor (i.e., 16, 32, or 64 bits) to match the default width of the PUSH/POP/CALL/RET instructions. Also included are the instructions ENTER and LEAVE, which reserve and remove data from the top of the stack while setting up a stack frame pointer in BP/EBP/RBP. However, direct setting of, or addition to and subtraction from, the SP/ESP/RSP register is also supported, so the ENTER/LEAVE instructions are generally unnecessary. Other instructions for manipulating the stack include PUSHF/POPF for storing and retrieving the (E)FLAGS register. The PUSHA/POPA instructions store and retrieve the entire integer register state to and from the stack.
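The push and pop behaviour can be modelled directly. The Python sketch below assumes a 32-bit operating mode and a small private memory; the class name `Stack32` and the 64-byte size are arbitrary choices for the example.

```python
import struct

class Stack32:
    """Toy 32-bit stack: PUSH decrements ESP then stores, POP loads then increments."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size                      # empty stack: ESP just past the end

    def push(self, value):
        self.esp -= 4                        # the stack grows towards lower addresses
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self):
        value = struct.unpack_from("<I", self.memory, self.esp)[0]
        self.esp += 4
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp                 # points at the last value stored
first_popped = stack.pop()
second_popped = stack.pop()
```

Adding to or subtracting from `esp` directly, without storing anything, corresponds to the manual reservation of stack space mentioned above.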

Values for a SIMD load or store are assumed to be packed in adjacent positions for the SIMD register and are laid out in sequential little-endian order. Some SSE load and store instructions require 16-byte alignment to function properly. The SIMD instruction sets also include "prefetch" instructions, which perform a load but do not target any register, and are used for cache loading. The SSE instruction sets also include non-temporal store instructions, which perform stores straight to memory without performing a cache allocate if the destination is not already cached (otherwise they behave like a regular store).

Most generic integer and floating point (but no SIMD) instructions can use one parameter as a complex address as the second source parameter. Integer instructions can also accept one memory parameter as a destination operand.

Programming flow
x86 assembly has an unconditional jump operation, jmp, which can take an immediate address, a register or an indirect address as a parameter. (Note that most RISC processors only support a link register or a short immediate displacement for jumping.)

Also supported are several conditional jumps, including je (jump on equality), jne (jump on inequality), jg (jump on greater than, signed), jl (jump on less than, signed), ja (jump on above/greater than, unsigned) and jb (jump on below/less than, unsigned). These conditional operations are based on the state of specific bits in the (E)FLAGS register. Many arithmetic and logic operations set, clear or complement these flags depending on their result. The comparison instructions cmp (compare) and test set the flags as if they had performed a subtraction or a bitwise AND operation, respectively, without altering the values of the operands. There are also instructions such as clc (clear carry flag) and cmc (complement carry flag) which work on the flags directly. Floating point comparisons are performed via the FCOM or FICOM instructions, whose results eventually have to be converted to integer flags.
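The relationship between cmp, the flags and the conditional jumps can be sketched in Python. The flag names ZF, CF, SF and OF (zero, carry, sign and overflow) are the conventional ones for the relevant (E)FLAGS bits; the helper functions `cmp_flags` and `taken` are made up for this example and model only 32-bit operands.

```python
def cmp_flags(a, b, bits=32):
    """Flags as set by "cmp a, b": a subtraction whose result is discarded."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                           # operands were equal
        "CF": a < b,                                 # unsigned borrow
        "SF": bool(result & sign),                   # result looks negative
        "OF": bool((a ^ b) & (a ^ result) & sign),   # signed overflow
    }

CONDITIONS = {
    "je":  lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jg":  lambda f: not f["ZF"] and f["SF"] == f["OF"],   # signed >
    "jl":  lambda f: f["SF"] != f["OF"],                   # signed <
    "ja":  lambda f: not f["CF"] and not f["ZF"],          # unsigned >
    "jb":  lambda f: f["CF"],                              # unsigned <
}

def taken(mnemonic, a, b):
    """Would the jump be taken right after "cmp a, b"?"""
    return CONDITIONS[mnemonic](cmp_flags(a, b))

# 0xFFFFFFFF is -1 when read as signed, but the largest value when unsigned.
signed_less = taken("jl", 0xFFFFFFFF, 1)
unsigned_above = taken("ja", 0xFFFFFFFF, 1)
```

The same bit pattern gives opposite answers for jl and jb, which is why signed and unsigned comparisons need separate jump mnemonics.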

Each jump operation has three different forms, depending on the size of the operand. A short jump uses an 8-bit signed operand, which is an offset relative to the instruction that follows the jump. In real mode or 16-bit protected mode, a near jump uses a 16-bit operand to reach any address within the current segment; in 32-bit protected mode, a near jump takes a 16-bit or 32-bit signed relative offset, similar to the short jump. A far jump is one that uses a full segment base:offset value as an absolute address. There are also indirect and indexed forms of each of these.
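How a short jump's 8-bit signed operand turns into a target address can be shown with a short Python helper. This sketch assumes the usual two-byte encoding of a short jump (one opcode byte plus one displacement byte) and that, in the actual encoding, the displacement is counted from the end of the jump instruction; the function name and the addresses are made up.

```python
def short_jump_target(instruction_address, displacement_byte, instruction_length=2):
    """Target of a short jump given its raw 8-bit displacement byte."""
    # Interpret the byte as a signed value in the range -128..127.
    if displacement_byte & 0x80:
        displacement = displacement_byte - 0x100
    else:
        displacement = displacement_byte
    return instruction_address + instruction_length + displacement

forward = short_jump_target(0x0100, 0x05)    # skips 5 bytes past the jump
backward = short_jump_target(0x0100, 0xFE)   # -2: jumps back to itself
```

A displacement byte of 0xFE therefore encodes a jump to its own address, and the reachable range is limited to roughly 128 bytes either side of the jump.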

In addition to the simple jump operations, there are the call (call a subroutine) and ret (return from subroutine) instructions. Before transferring control to the subroutine, call pushes the segment offset address of the instruction following the call onto the stack; ret pops this value off the stack and jumps to it, effectively returning the flow of control to that part of the program. In the case of a far call, the segment base is pushed before the offset.
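The call/ret mechanism reduces to pushing and popping a return address, as the following Python sketch shows. The helpers `call` and `ret`, the addresses, and the assumed 5-byte length of each call instruction are all illustrative choices; a Python list plays the role of the stack and only near calls are modelled.

```python
def call(stack, call_address, call_length, target):
    """Push the address of the instruction after the call, then jump to target."""
    stack.append(call_address + call_length)
    return target

def ret(stack):
    """Pop the saved return address and jump to it."""
    return stack.pop()

stack = []
ip = call(stack, 0x1000, 5, 0x2000)   # main calls a subroutine at 0x2000
ip = call(stack, 0x2010, 5, 0x3000)   # which calls another one at 0x3000
depth_inside = len(stack)
ip = ret(stack)                       # back into the first subroutine
inner_return = ip
ip = ret(stack)                       # back into main
outer_return = ip
```

Because return addresses are stacked, nested calls unwind in the reverse order in which they were made.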

There are also two similar instructions: int (interrupt), which saves the current (E)FLAGS register value on the stack and then performs a far call, except that instead of an address it uses an interrupt vector, an index into a table of interrupt handler addresses; and the matching return from interrupt instruction, iret, which restores the saved values after returning. Soft interrupts of the type described above are used by some operating systems for system calls, and can also be used in debugging hard interrupt handlers. Hard interrupts are triggered by external hardware events.

Atomic instructions
The x86 instruction set architecture includes a mechanism for performing system-wide atomic operations. This ensures that an entire instruction is executed without interruption of its sub-operations by other system bus mastering devices (such as graphics cards, DMA operations or other CPUs). This is an important feature that is used for multiprocessor shared resource critical sections.

The atomic mechanism can be invoked explicitly with an instruction prefix called lock. This prefix can be applied to certain integer instructions which perform a load, an ALU operation, and then a store in one instruction. The XCHG instruction with a memory operand implicitly behaves as if the lock prefix were given to it.

This can be seen as most useful for instructions such as CMPXCHG(8B), which performs a load, a comparison, and then, depending on the result of the compare, a conditional exchange operation. Another, simpler example is the BTS instruction, which performs a bit load to the carry flag, followed by a bit set store operation.
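The load, compare and conditional exchange sequence, and the retry loop typically built around it, can be modelled in Python. This is a sketch and not how the hardware works internally: the class `AtomicCell` is hypothetical, and an ordinary `threading.Lock` stands in for the atomicity that the lock prefix would provide for the single instruction.

```python
import threading

class AtomicCell:
    """A memory word with a compare-and-exchange operation."""

    def __init__(self, value=0):
        self.value = value
        self._bus_lock = threading.Lock()    # stand-in for the LOCK prefix

    def cmpxchg(self, expected, new):
        """Store `new` only if the cell still holds `expected`.

        Returns (success, observed_value).
        """
        with self._bus_lock:
            observed = self.value
            if observed == expected:
                self.value = new
                return True, observed
            return False, observed


def atomic_add(cell, amount):
    """Retry loop: recompute and try again whenever another thread got in first."""
    expected = cell.value
    while True:
        ok, observed = cell.cmpxchg(expected, expected + amount)
        if ok:
            return observed + amount
        expected = observed


counter = AtomicCell(0)

def worker():
    for _ in range(1000):
        atomic_add(counter, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Only the single compare-and-exchange step is atomic; the surrounding loop is ordinary code that may be interrupted at any point and simply retries.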

This mechanism compares favourably to that used in the PowerPC architecture. The PowerPC uses a "load reserved" instruction which locks the relevant cache line on a successful load operation, but the line must be unlocked with a subsequent "store reserved" instruction. Each PowerPC can support at most one outstanding "reserved line", and so care must be taken that pre-emptions do not interfere with the desired atomic semantics between the execution of the two instructions. For example, if an instruction between the load reserved and the store reserved causes a page fault, then the page fault handler would not be able to use the currently occupied "load reserved" mechanism itself to ensure its own atomicity while still supporting the original code path's atomicity. (Note that some PowerPC motherboards support system bus locking, but this is only a global mechanism, and does not address a nested lock condition.)

By isolating lock primitives to single complete (and CISC-like) instructions, the x86 can implement atomic operations even in user-level code, in isolation and independently from other atomic primitives in the system.

N.B.: The above statements are wrong: on exceptions such as interrupts, preemptions or page faults, the exception handler must simply kill the existing reservation and can then use reservations for its own locks; all this means is that when you return from the exception, the lock code notices that it did not succeed in locking, because it has lost the reservation, and must try again. There is no "nested locking" mechanism on Intel either; if you take a page fault in a "lock compare and exchange, loop if failed" sequence, it is basically the same thing. The lock prefix applies only to the instruction it is prefixed to, and that is all.

Fidonet's 80xxx Snippets
File site for 80x86 Assembly Language enthusiasts.

Intel Developer Home
Technical product specifications, documentation, support, tools. Almost all you need to know about Intel products.

The X (86) Files
A few articles, links, downloads.

terse
Algebraic assembly language employing prefix, infix, and postfix notation. Assembler simplified. Biggest advance in low-level programming since Macro Assembler. All the control of assembler, with the ease-of-use and look-and-feel of high-level languages. Low cost.

Iczelion's Win32 Assembly HomePage
Tutorials, documentation and examples links collection.

obock.de
This site is about Win32 assembler. Introductions, tutorials, sourcecodes and freeware.

Dolphinz Home Page
This page is devoted to Win32 programming in assembly language, and it contains tutorials, articles, tools, resources, and some code for Windows 95/98 using assembly language.

Power Assembler 32
An IDE for assembly language programming for Windows. [Shareware]

MadWizard.org
Tutorials, downloads and links about programming assembly in windows, as well as a public snippet library.

Koms Bomb Assembly World
Resource for assembly. Tools, articles, links.







© 2005 GeneralAnswers.org