6502 “Illegal” Opcodes Demystified
A closer look at the “illegal” opcodes and undocumented instructions of the MOS 6502 MPU.
The instruction table of the MOS 6502 MPU, designed by MOS Technology and introduced in 1975 (the CMOS version, the 65C02, was developed by the Western Design Center), has some obvious gaps: just 56 instructions are documented, which occupy 151 of the 256 possible opcodes in their various address modes. This leaves 105 undocumented slots, and the 6502 community has been eager to fill these gaps ever since.
Still, some mystery remains and some questions are unanswered. Were at least some of these opcodes intentional (especially since some of them are handy for block transfer, something the Z80 has dedicated instructions for), or are they all accidental? How do they behave, and why? Here, we’ll try to come up with some answers to these questions.
First, let's have a look at the instruction table as it is commonly presented, with the blank gaps filled in. (Here, for the “illegal” opcodes, we use the mnemonics used by the DASM and ACME assemblers, with the exception of “USBC” for instruction code $EB, where these use plain “SBC”.)
And here are all the 21 (more or less) “illegal” opcodes (alternative names given in parentheses) as they are commonly described:
- ALR (ASR)
  AND oper + LSR
  A AND oper, 0 -> [76543210] -> C

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | ALR #oper | 4B | 2 | 2 |

- ANC
  AND oper + set C as ASL
  A AND oper, bit(7) -> C

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | ANC #oper | 0B | 2 | 2 |

- ANC (ANC2)
  AND oper + set C as ROL
  effectively the same as instr. 0B
  A AND oper, bit(7) -> C

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | ANC #oper | 2B | 2 | 2 |

- ANE (XAA)
  * AND X + AND oper
  Highly unstable, do not use. A base value in A is determined based on the contents of A and a constant, which may typically be $00, $FF, $EE, etc. The value of this constant depends on temperature, the chip series, and maybe other factors as well.
  In order to eliminate these uncertainties from the equation, use either 0 as the operand or a value of $FF in the accumulator.
  (A OR CONST) AND X AND oper -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | ANE #oper | 8B | 2 | 2 †† |

- ARR
  AND oper + ROR
  This operation involves the adder:
  V-flag is set according to (A AND oper) + oper
  The carry is not set, but bit 7 (sign) is exchanged with the carry
  A AND oper, C -> [76543210] -> C

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | ARR #oper | 6B | 2 | 2 |

- DCP (DCM)
  DEC oper + CMP oper
  M - 1 -> M, A - M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | DCP oper | C7 | 2 | 5 |
  zeropage,X | DCP oper,X | D7 | 2 | 6 |
  absolute | DCP oper | CF | 3 | 6 |
  absolute,X | DCP oper,X | DF | 3 | 7 |
  absolute,Y | DCP oper,Y | DB | 3 | 7 |
  (indirect,X) | DCP (oper,X) | C3 | 2 | 8 |
  (indirect),Y | DCP (oper),Y | D3 | 2 | 8 |

- ISC (ISB, INS)
  INC oper + SBC oper
  M + 1 -> M, A - M - C -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | ISC oper | E7 | 2 | 5 |
  zeropage,X | ISC oper,X | F7 | 2 | 6 |
  absolute | ISC oper | EF | 3 | 6 |
  absolute,X | ISC oper,X | FF | 3 | 7 |
  absolute,Y | ISC oper,Y | FB | 3 | 7 |
  (indirect,X) | ISC (oper,X) | E3 | 2 | 8 |
  (indirect),Y | ISC (oper),Y | F3 | 2 | 8 |

- LAS (LAR)
  LDA/TSX oper
  M AND SP -> A, X, SP

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  absolute,Y | LAS oper,Y | BB | 3 | 4* |

- LAX
  LDA oper + LDX oper
  M -> A -> X

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | LAX oper | A7 | 2 | 3 |
  zeropage,Y | LAX oper,Y | B7 | 2 | 4 |
  absolute | LAX oper | AF | 3 | 4 |
  absolute,Y | LAX oper,Y | BF | 3 | 4* |
  (indirect,X) | LAX (oper,X) | A3 | 2 | 6 |
  (indirect),Y | LAX (oper),Y | B3 | 2 | 5* |

- LXA (LAX immediate)
  Store * AND oper in A and X
  Highly unstable, involves a 'magic' constant, see ANE
  (A OR CONST) AND oper -> A -> X

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | LXA #oper | AB | 2 | 2 †† |

- RLA
  ROL oper + AND oper
  M = C <- [76543210] <- C, A AND M -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | RLA oper | 27 | 2 | 5 |
  zeropage,X | RLA oper,X | 37 | 2 | 6 |
  absolute | RLA oper | 2F | 3 | 6 |
  absolute,X | RLA oper,X | 3F | 3 | 7 |
  absolute,Y | RLA oper,Y | 3B | 3 | 7 |
  (indirect,X) | RLA (oper,X) | 23 | 2 | 8 |
  (indirect),Y | RLA (oper),Y | 33 | 2 | 8 |

- RRA
  ROR oper + ADC oper
  M = C -> [76543210] -> C, A + M + C -> A, C

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | RRA oper | 67 | 2 | 5 |
  zeropage,X | RRA oper,X | 77 | 2 | 6 |
  absolute | RRA oper | 6F | 3 | 6 |
  absolute,X | RRA oper,X | 7F | 3 | 7 |
  absolute,Y | RRA oper,Y | 7B | 3 | 7 |
  (indirect,X) | RRA (oper,X) | 63 | 2 | 8 |
  (indirect),Y | RRA (oper),Y | 73 | 2 | 8 |

- SAX (AXS, AAX)
  A and X are put on the bus at the same time (resulting effectively in an AND operation) and stored in M
  A AND X -> M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | SAX oper | 87 | 2 | 3 |
  zeropage,Y | SAX oper,Y | 97 | 2 | 4 |
  absolute | SAX oper | 8F | 3 | 4 |
  (indirect,X) | SAX (oper,X) | 83 | 2 | 6 |

- SBX (AXS, SAX)
  CMP and DEX at once, sets flags like CMP
  (A AND X) - oper -> X

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | SBX #oper | CB | 2 | 2 |

- SHA (AHX, AXA)
  Stores A AND X AND (high-byte of addr. + 1) at addr.
  unstable: sometimes 'AND (H+1)' is dropped, page boundary crossings may not work (with the high-byte of the value used as the high-byte of the address)
  A AND X AND (H+1) -> M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  absolute,Y | SHA oper,Y | 9F | 3 | 5 † |
  (indirect),Y | SHA (oper),Y | 93 | 2 | 6 † |

- SHX (A11, SXA, XAS)
  Stores X AND (high-byte of addr. + 1) at addr.
  unstable: sometimes 'AND (H+1)' is dropped, page boundary crossings may not work (with the high-byte of the value used as the high-byte of the address)
  X AND (H+1) -> M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  absolute,Y | SHX oper,Y | 9E | 3 | 5 † |

- SHY (A11, SYA, SAY)
  Stores Y AND (high-byte of addr. + 1) at addr.
  unstable: sometimes 'AND (H+1)' is dropped, page boundary crossings may not work (with the high-byte of the value used as the high-byte of the address)
  Y AND (H+1) -> M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  absolute,X | SHY oper,X | 9C | 3 | 5 † |

- SLO (ASO)
  ASL oper + ORA oper
  M = C <- [76543210] <- 0, A OR M -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | SLO oper | 07 | 2 | 5 |
  zeropage,X | SLO oper,X | 17 | 2 | 6 |
  absolute | SLO oper | 0F | 3 | 6 |
  absolute,X | SLO oper,X | 1F | 3 | 7 |
  absolute,Y | SLO oper,Y | 1B | 3 | 7 |
  (indirect,X) | SLO (oper,X) | 03 | 2 | 8 |
  (indirect),Y | SLO (oper),Y | 13 | 2 | 8 |

- SRE (LSE)
  LSR oper + EOR oper
  M = 0 -> [76543210] -> C, A EOR M -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  zeropage | SRE oper | 47 | 2 | 5 |
  zeropage,X | SRE oper,X | 57 | 2 | 6 |
  absolute | SRE oper | 4F | 3 | 6 |
  absolute,X | SRE oper,X | 5F | 3 | 7 |
  absolute,Y | SRE oper,Y | 5B | 3 | 7 |
  (indirect,X) | SRE (oper,X) | 43 | 2 | 8 |
  (indirect),Y | SRE (oper),Y | 53 | 2 | 8 |

- TAS (XAS, SHS)
  Puts A AND X in SP and stores A AND X AND (high-byte of addr. + 1) at addr.
  unstable: sometimes 'AND (H+1)' is dropped, page boundary crossings may not work (with the high-byte of the value used as the high-byte of the address)
  A AND X -> SP, A AND X AND (H+1) -> M

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  absolute,Y | TAS oper,Y | 9B | 3 | 5 † |

- USBC (SBC)
  SBC oper + NOP
  effectively same as normal SBC immediate, instr. E9.
  A - M - C -> A

  addressing | assembler | opc | bytes | cycles |
  ---|---|---|---|---|
  immediate | USBC #oper | EB | 2 | 2 |

- NOPs (including DOP, TOP)
  Instructions effecting in 'no operations' in various address modes. Operands are ignored.

  opc | addressing | bytes | cycles |
  ---|---|---|---|
  1A | implied | 1 | 2 |
  3A | implied | 1 | 2 |
  5A | implied | 1 | 2 |
  7A | implied | 1 | 2 |
  DA | implied | 1 | 2 |
  FA | implied | 1 | 2 |
  80 | immediate | 2 | 2 |
  82 | immediate | 2 | 2 |
  89 | immediate | 2 | 2 |
  C2 | immediate | 2 | 2 |
  E2 | immediate | 2 | 2 |
  04 | zeropage | 2 | 3 |
  44 | zeropage | 2 | 3 |
  64 | zeropage | 2 | 3 |
  14 | zeropage,X | 2 | 4 |
  34 | zeropage,X | 2 | 4 |
  54 | zeropage,X | 2 | 4 |
  74 | zeropage,X | 2 | 4 |
  D4 | zeropage,X | 2 | 4 |
  F4 | zeropage,X | 2 | 4 |
  0C | absolute | 3 | 4 |
  1C | absolute,X | 3 | 4* |
  3C | absolute,X | 3 | 4* |
  5C | absolute,X | 3 | 4* |
  7C | absolute,X | 3 | 4* |
  DC | absolute,X | 3 | 4* |
  FC | absolute,X | 3 | 4* |

- JAM (KIL, HLT)
  These instructions freeze the CPU.
  The processor will be trapped infinitely in T1 phase with $FF on the data bus. Reset required.
  Instruction codes: 02, 12, 22, 32, 42, 52, 62, 72, 92, B2, D2, F2
Legend to markers used in the instruction details:
- * : add 1 to cycles if page boundary is crossed
- † : unstable
- †† : highly unstable
Disclaimer:
Information is provided as-is, without any guarantee of completeness or correctness.
None of these “illegal” instructions are guaranteed to work. Some are highly unstable; some may even start two asynchronous threads competing in a race condition, with the winner determined by such minuscule factors as temperature or minor differences in the production series. At other times, the outcome depends on the exact values involved and the chip series.
Use with care and at your own risk.
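To make the formulas in the list a bit more tangible, here is a minimal Python sketch of three of the stable immediate-mode instructions (ALR, ANC and SBX) as pure functions. The function names and the convention of returning a (result, carry) pair are our own assumptions for illustration; the N and Z flags and all timing aspects are left out.

```python
def alr(a: int, oper: int) -> tuple[int, int]:
    """ALR #oper: A AND oper, then LSR. Returns (new A, carry)."""
    t = a & oper & 0xFF
    return t >> 1, t & 1


def anc(a: int, oper: int) -> tuple[int, int]:
    """ANC #oper: A AND oper, bit 7 of the result copied to carry."""
    t = a & oper & 0xFF
    return t, t >> 7


def sbx(a: int, x: int, oper: int) -> tuple[int, int]:
    """SBX #oper: (A AND X) - oper -> X, carry set as by CMP (no borrow-in)."""
    diff = (a & x & 0xFF) - (oper & 0xFF)
    return diff & 0xFF, int(diff >= 0)


print(alr(0xFF, 0x0F))        # (7, 1)
print(anc(0xF0, 0x80))        # (128, 1)
print(sbx(0xFF, 0x0F, 0x01))  # (14, 1)
```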
Well, this is all fine and good, but we do not really learn much about what they are and why they are the way they are.
Let’s risk another look at the instruction layout, as it ought to be viewed.
Another Look at the Instruction Layout
The 6502 instruction table is laid out according to a pattern a-b-c, where a and b are octal numbers, followed by a group of two binary digits c, as in the bit-vector “aaabbbcc”.

bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
field | a | a | a | b | b | b | c | c |
range | (0…7) | | | (0…7) | | | (0…3) | |
Example:
All ROR instructions share a = 3 and c = 2 (3b2), with the address mode in b.
At the same time, all instructions addressing the zero-page share b = 1 (a1c).
abc = 312=> ( 3 << 5 | 1 << 2 | 2 ) = %011.001.10 = $66 “ROR zpg”.
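The same arithmetic can be written as a pair of small Python helpers. This is just a sketch; the function names split_opcode and join_opcode are our own and not part of any established tool.

```python
def split_opcode(opcode: int) -> tuple[int, int, int]:
    """Split an 8-bit opcode into the fields (a, b, c) of the pattern aaabbbcc."""
    a = (opcode >> 5) & 0b111  # bits 7..5
    b = (opcode >> 2) & 0b111  # bits 4..2
    c = opcode & 0b11          # bits 1..0
    return a, b, c


def join_opcode(a: int, b: int, c: int) -> int:
    """Combine the fields a, b and c back into an opcode."""
    return (a << 5) | (b << 2) | c


print(split_opcode(0x66))              # (3, 1, 2), i.e. "ROR zpg"
print(f"${join_opcode(3, 1, 2):02X}")  # $66
```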
If we arrange the instruction table by components c, a and b, we find them all neatly lined up per address mode in the vertical columns (with the notable exception of instructions related to the X register, which show up with their respective Y counterpart for address modes involving an index by X). Notably, all the “illegals” adhere strictly to this scheme.
Moreover, all the instructions internal to the CPU and its flow of control are listed in the top quarter at c=0, while the bottom quarter at c=3, where we find the majority of “illegal” opcodes, is completely unpopulated by official opcodes. Further, for sections where c=1 or c=2, we see opcodes of a kind sharing the same row (with the notable outliers of the two stack transfer instructions “TXS” and “TSX”).
While this is certainly informative, it still doesn’t give away a systemic aspect of the unimplemented instructions, nor does this view tell us what they really are.
So let’s give this another try, this time arranging the instruction layout by components a, c and b:
Well, this is better, much better.
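The arrangement by a, c and b is easy to reproduce in a few lines of Python. This is only a rough sketch that prints the bare opcode values, one row per (a, c) pair and one column per b, without any mnemonics; the function name is our own.

```python
def layout_by_a_c_b() -> list[list[int]]:
    """Return 32 rows (a = 0..7, then c = 0..3), each with 8 columns (b = 0..7)."""
    rows = []
    for a in range(8):
        for c in range(4):
            rows.append([(a << 5) | (b << 2) | c for b in range(8)])
    return rows


for row in layout_by_a_c_b():
    print(" ".join(f"{op:02X}" for op in row))
```

In this ordering, the rows for c=1, c=2 and c=3 of the same a are directly adjacent, e.g., $8D, $8E and $8F end up in the same column of three consecutive rows.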
NOPs
First, we learn what the additional NOPs really are. By comparing opcodes by row and address modes by column, we can clearly see what these ought to be.
E.g., $80 (a=4, c=0, b=0) is clearly “STY immediate”, attempting to store the contents of the Y register in the literal operand.
Generally speaking, these additional NOPs are instructions with non-functional or nonsensical address modes, which do execute, but without any external effects.
JAMs
However, instructions of this group which involve indirect addressing fail entirely, with the CPU infinitely trapped in T1 phase, resulting in a “JAM” (or KIL), rendering the CPU unresponsive and requiring a reset.
Instructions at ‘C = 3’
This is the really interesting part, the meat of the “illegal opcodes”.
Generally, we may observe that any of the instructions at c=3 are really inheriting their behavior from those at c=1 and c=2 in the same slot, found in the rows immediately above, same column, using the address mode of the instruction at c=1. Mind that in binary 3 is the composite of 1 and 2, with bits 0 and 1 set.
In other words, any instruction xxxxxx11 will execute the instructions at xxxxxx01 and xxxxxx10 at once, using the address mode of the instruction at xxxxxx01. (However, the general rule regarding X and Y register specific indexed address modes still applies.)
E.g., “SAX abs” ($8F, a=4,c=3,b=3) is the composite of “STA abs” ($8D, a=4,c=1,b=3) and “STX abs” ($8E, a=4,c=2,b=3).
E.g., “LAX X,ind” ($A3, a=5,c=3,b=0) is the composite of “LDA X,ind” ($A1, a=5,c=1,b=0) and “LDX imm” ($A2, a=5,c=2,b=0).
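Finding the two constituents of any c=3 opcode is a matter of simple bit masking. The following Python sketch illustrates this; the function name constituents and the small, deliberately partial NAMES lookup are our own assumptions, added just to label the examples.

```python
def constituents(opcode: int) -> tuple[int, int]:
    """For an opcode xxxxxx11, return the opcodes at xxxxxx01 and xxxxxx10."""
    if opcode & 0b11 != 0b11:
        raise ValueError("not an opcode at c=3")
    base = opcode & 0b11111100
    return base | 0b01, base | 0b10


# Partial lookup, only covering the examples printed below.
NAMES = {
    0x8D: "STA abs", 0x8E: "STX abs",
    0xA1: "LDA X,ind", 0xA2: "LDX imm",
    0x05: "ORA zpg", 0x06: "ASL zpg",
}

for op in (0x8F, 0xA3, 0x07):  # SAX abs, LAX X,ind, SLO zpg
    c1, c2 = constituents(op)
    print(f"${op:02X} = ${c1:02X} ({NAMES[c1]}) + ${c2:02X} ({NAMES[c2]})")
```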
The “Magic” Constant
Let’s have a closer look at the two highly unstable instructions “ANE” (XAA) and “LXA” (LAX immediate) involving a “magic constant” (typically $00, $FF, $EE, etc.), which are both combinations of an accumulator operation and an inter-register transfer between the accumulator and the X register:
$8B (a=4,c=3,b=2): ANE imm = STA imm (NOP) + TXA
(A OR CONST) AND X AND oper -> A
$AB (a=5,c=3,b=2): LXA imm = LDA imm + TAX
(A OR CONST) AND oper -> A -> X
In the case of “ANE”, the contents of the accumulator are put on the internal data lines at the same time as the contents of the X register, while there's also the operand read for the immediate operation, with the result transferred to the accumulator.
In the case of “LXA”, the immediate operand and the contents of the accumulator are competing for the input lines, while the result will be transferred to both the accumulator and the X register.
The outcome of these competing, noisy conditions depends on the production series of the chip, and maybe even on environmental conditions. This results in an OR-ing of the accumulator with the “magic constant”, combined with an AND-ing of the competing inputs. The final transfer to the target register(s) then seems to work as may be expected.
(We may note that all the instructions involved in these two opcodes complete in 2 cycles, the shortest sequence available on the 6502, meaning everything is virtually happening “at once”.)
This AND-ing of competing output values suggests that the 6502 is working internally in active low logic, where all data lines are first set to high and then cleared for any zero bits. This also suggests that the “magic constant” stands merely for a partial transfer of the contents of the accumulator.
(Mind that this is not a qualified statement about the internals of the 6502 hardware, but merely an observation on its external effects.)
Much of this also applies to “TAS” (XAS, SHS), $9B, but here the extra cycles for indexed addressing seem to contribute to the conflict being resolved without this “magic constant”. However, “TAS” is still unstable.
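The two formulas for “ANE” and “LXA” can be tried out with a small Python model. Mind that this is merely a sketch of the formulas as given, not of the hardware: the function names are our own, and the constant is passed in as a parameter, since its real value is unknown. The sketch also shows why an operand of 0 or an accumulator value of $FF takes the constant out of the equation for “ANE”.

```python
def ane(a: int, x: int, oper: int, const: int) -> int:
    """ANE #oper: (A OR CONST) AND X AND oper -> A."""
    return (a | const) & x & oper & 0xFF


def lxa(a: int, oper: int, const: int) -> int:
    """LXA #oper: (A OR CONST) AND oper -> A -> X."""
    return (a | const) & oper & 0xFF


CONSTS = (0x00, 0xFF, 0xEE)  # typical values mentioned for the "magic constant"

# In general, the result varies with the constant ...
print(sorted(f"${ane(0x12, 0xFF, 0xFF, k):02X}" for k in CONSTS))
# ... but not with an operand of 0 or with $FF in the accumulator.
print({ane(0x12, 0xFF, 0x00, k) for k in CONSTS})  # {0}
print({ane(0xFF, 0x3C, 0x0F, k) for k in CONSTS})  # {12}, i.e. X AND oper
```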
The ‘H+1’ Group
There are four instructions which add the peculiar term ‘high-byte of provided address + 1’ to the equation. These are:
SHA (AHX, AXA)       A AND X AND (H+1) -> M                $9F  SHA abs,Y  (5)
SHX (A11, SXA, XAS)  X AND (H+1) -> M                      $9E  SHX abs,Y  (5)
SHY (A11, SYA, SAY)  Y AND (H+1) -> M                      $9C  SHY abs,X  (5)
TAS (XAS, SHS)       A AND X -> SP, A AND X AND (H+1) -> M $9B  TAS abs,Y  (5)
We may already see where this comes from: as the calculation of the effective address involves the ALU, a partial result for the high-byte adds to the conflicting output values. However, depending on minor timing discrepancies, this term may also be dropped (meaning, become overridden).
We may also discern why the effective high-address may be replaced by the output value altogether in case a page boundary is crossed, since this provides just the extra amount of timing required to allow the output value to stabilize and to override the address high-byte. Again, these instructions are unstable.
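As an illustration, here is a Python sketch of the behavior described for “SHX” and “SHY”. It assumes the idealized case, where the ‘AND (H+1)’ term is never dropped and where a page crossing always replaces the address high-byte by the stored value; real chips may deviate from this. The function name sh_store is our own.

```python
def sh_store(reg: int, base: int, index: int) -> tuple[int, int]:
    """Idealized SHX/SHY: return (target address, stored value)."""
    high = (base >> 8) & 0xFF
    value = reg & ((high + 1) & 0xFF)   # register AND (H+1)
    low_sum = (base & 0xFF) + index
    if low_sum > 0xFF:
        # Page boundary crossed: the stored value ends up as address high-byte.
        addr = (value << 8) | (low_sum & 0xFF)
    else:
        addr = (high << 8) | low_sum
    return addr, value


addr, value = sh_store(0xFF, 0x1200, 0x10)  # e.g. SHX $1200,Y with X=$FF, Y=$10
print(f"${addr:04X} <- ${value:02X}")       # $1210 <- $13
addr, value = sh_store(0x01, 0x12F0, 0x20)  # same instruction, crossing a page
print(f"${addr:04X} <- ${value:02X}")       # $0110 <- $01
```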
The Outliers
We may note that “SHY” and “SHX” are not part of the c=3 group, but rather the unimplemented instructions “STY abs,X” (c=0) and “STX abs,Y” (c=2), respectively. Both are apparently falling back to the implementation of “STA abs,X” with the extra quirk of the ‘H+1’ term.
“SHA abs,Y”, finally, is the composite instruction adhering to the c=3 rule that we have already established, executing “STA abs,X” and “SHX abs,Y” at once. (Notably, this flips the address mode to “abs,Y”, where “abs,X” may be expected, which suggests that this adjustment for indexed instructions concerning any X register transfers is implemented as an additional stage.)
“SHA ind,Y” ($93), however, is the composite of “STA ind,Y” ($91) and “SHX ind,Y” ($92), which JAMs on its own.
It’s a rather interesting decision by the MOS Technology design team around Chuck Peddle to not implement the instructions “STX abs,Y” and “STY abs,X”, while the instruction decoding would have easily provided for this.
Was this just to keep the instruction set simple by arbitrarily limiting what could be done with the X and Y registers? Or was there a more serious conflict, like with the mechanism identifying flag operations, which may be responsible for the slots at (c=0/b=5) and (c=0/b=7) typically found empty, thus making the implementation of “STY abs,X” rather expensive, for which “STX abs,Y” was dropped as well? We may presume it might be about the access of the X and Y registers using the same internal data lanes, but this is contradicted by the very existence of “SHX” and “SHY”, which successfully access both registers, albeit at subsequent stages.
It could be simply a carry-over from the Motorola 6800 processor (from which the 6502 originated as a simplified, cost-reduced version), which had just a single index register and thus lacked such an option anyway.
Mysterious NOPs
As mentioned earlier, we are able to figure out what most of the NOPs and JAM instructions are, just from their disposition on the layout. But there is a group of 12 NOP instructions (all at c=0 and a≤3 and odd values of b), which seem to be truly empty slots. Namely, these are the instructions at:
$04 (a=0, c=0, b=1)  $0C (a=0, c=0, b=3)  $14 (a=0, c=0, b=5)  $1C (a=0, c=0, b=7)
$34 (a=1, c=0, b=5)  $3C (a=1, c=0, b=7)
$44 (a=2, c=0, b=1)  $54 (a=2, c=0, b=5)  $5C (a=2, c=0, b=7)
$64 (a=3, c=0, b=1)  $74 (a=3, c=0, b=5)  $7C (a=3, c=0, b=7)
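This list can be cross-checked with a short Python sketch: the twelve slots are what remains of the 16 positions with c=0, a from 0 to 3 and odd b, once the four documented instructions in that range are taken out. The DOCUMENTED table below is hand-entered for just this range and is not a complete opcode table.

```python
# Documented instructions at c=0, a<=3 and odd b (hand-entered for this range).
DOCUMENTED = {0x24: "BIT zpg", 0x2C: "BIT abs", 0x4C: "JMP abs", 0x6C: "JMP ind"}

# All 16 slots with c=0, a in 0..3 and odd b.
slots = [(a << 5) | (b << 2) for a in range(4) for b in (1, 3, 5, 7)]
empty = [op for op in slots if op not in DOCUMENTED]

print(len(empty))                           # 12
print(" ".join(f"${op:02X}" for op in empty))
```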
From their very position on the instruction layout, we may infer that these should be instructions internal to the CPU. Typically, instructions at (a=0/c=0) have a counterpart at (a=1/c=0) in the respective b position, as is also true for (a=2/c=0) and (a=3/c=0). E.g., PHP & PLP, BPL & BMI, CLC & SEC, and so on.
Here, however, the counterparts are missing as well. (Only $04 and $0C have a counterpart in “BIT”, but we may have a hard time figuring out what the counterpart of “BIT” may actually be.) For all we know, these instructions are simply unimplemented, and it’s a small wonder that the timing sequence for these instructions does resolve without a JAM. But these instructions are still interesting, as they direct our attention towards how the internal instructions which are implemented are systematically arranged on the decoding matrix.
The same pattern, BTW, may be observed for most instructions, so that we may think of even and consecutive odd values of a and same values for c and b as “opposing” or “complementary” slots, where we find in one slot the store instruction for a given register and in the other one the load instruction, both in the address mode defined by b, or a shift in one direction and the opposing shift in the other direction.
(U)SBC
Here, we also find an answer to the nagging question why the instruction for subtraction, “SBC”, isn’t found among the other ALU instructions, e.g., near its reverse operation “ADC”, but amid the compare instructions. Now, “CMP” is simply the same as “SBC”, but without the final transfer of the result to the accumulator. So it makes sense to have “SBC abs” as the counterpart or complementary instruction to “CMP abs”, etc., once with the final transfer, and once without. However, unlike addition, compare instructions address various registers and not just the accumulator, thus claiming a considerable section of the instruction table. Therefore, we find the arithmetic instruction “SBC”, $E9, “displaced” among the register instructions, rather than the other way round.
This provides SBC’s second incarnation, “USBC”, $EB, in combination with the official “NOP” instruction at $EA (which is the nonsensical instruction “INC impl”).
Mind that the instructions at a = 6 and a = 7, occupying the bottom part of our third table, typically involve ALU operations in combination with various registers and/or memory locations.
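The kinship of “SBC” and “CMP” may be sketched in Python as two wrappers around the same subtraction. This is a simplified model under a few assumptions of our own: binary mode only, the V flag left out, and “CMP” always subtracting as if the carry were set (which is how the documented instruction behaves). The function names are ours.

```python
def alu_subtract(a: int, m: int, carry: int) -> tuple[int, int]:
    """Compute A - M - (1 - C) as A + NOT M + C. Returns (result, carry out)."""
    total = a + (m ^ 0xFF) + carry
    return total & 0xFF, total >> 8


def sbc(a: int, m: int, carry: int) -> tuple[int, int]:
    """SBC: the result is transferred to the accumulator. Returns (new A, carry)."""
    return alu_subtract(a, m, carry)


def cmp(a: int, m: int) -> tuple[int, int, int]:
    """CMP: same subtraction, but A is kept. Returns (A, carry, zero)."""
    result, carry_out = alu_subtract(a, m, 1)
    return a, carry_out, int(result == 0)


print(sbc(0x50, 0x10, 1))  # (64, 1)
print(cmp(0x50, 0x10))     # (80, 1, 0)
print(cmp(0x10, 0x10))     # (16, 1, 1)
```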
Conclusions
What we have observed here is really a text-book example of undefined behavior for undefined input patterns. For any instruction with the two least significant bits set at once (c=3), the two instructions in the respective slot with c=1 and c=2 are started in parallel, as asynchronous threads with competing output values AND-ed. Minor implementation details and environmental factors may contribute to the outcome of some of these instructions and how the timing eventually stabilizes.
Notably, there are no NOPs or jamming instructions at c=3, meaning it doesn't matter if any of the two threads JAMs, if the timing for one of them resolves successfully (thus advancing the internal phase).
At c=0, c=1 and c=2 we find either undocumented instructions with ineffective address modes, or undocumented instructions that fail entirely over unresolved timing issues, resulting in a “JAM”. There are just two exceptions to this rule, namely “SHY” and “SHX”, which, while unstable, may be somewhat usable.
So is any of this intentional? Hardly. It’s just undefined behavior: orderly chaos, as provided by the decoding matrix. However, we may learn something from this about the internals of the 6502 and its various close cousins, which is at least something.
Mind that there is much more competent commentary on the 6502, which is based on analysis of the actual hardware, especially at visual6502.org. But, maybe, you found this “hermeneutic” approach, trying to reveal the systematic aspects of what may be observed externally, interesting, as well.
PS: All the tables in this post are SVG images. You may download and use them (mind the “open in a new tab” links), but please give reference to https://www.masswerk.at/6502/6502_instruction_set.html, where you can find the original tables.
Norbert Landsteiner,
Vienna, 2021-06-05