An Update to the Virtual 6502 Suite
Improving one of the oldest 6502 tool sets on the web.
The Virtual 6502 suite may be the oldest 6502 tool set on the web still in existence. It may also be one of the earliest: at least, when I was building it, I hadn’t found another one. Reason enough to give these venerable web pages a bit of an update.
The Virtual 6502 suite consists of three tools, which are accompanied by documentation of the 6502 instruction set, namely:
- the Virtual 6502 MPU Emulator
- the Virtual 6502 Assembler
- the Virtual 6502 Disassembler
- documentation of the 6502 Instruction Set.
Generally, this is about the JavaScript side of things and the HTML code is still the same as in its 2005 backward compatible glory, including table layout and “bgcolor” and “link” attributes (update: fixed this). The JS code has been moderately modernized and now sports features previously unheard of, like source code encapsulation. Still, while some MSIE 4 related features (regarding missing string methods) have been stripped, the tools should be compatible with most legacy browsers. And, as a bonus, all the apps have received nicer buttons and dialogs.
Conversely, we also take care of some of the more annoying features of modern HTML, at least, annoying for our purpose, like automatic translation, spell-checking, auto-correction, auto-capitalize, auto-complete, or telephone number detection (and automatic conversion into links — have you ever felt like ringing up some 6502 machine language code?).
As another general upgrade, all of these tools have received the addition of a file upload button and now support drag & drop operations for their respective code input fields (see below for details).
Virtual 6502 Assembler
Most of the more serious upgrades concern the assembler.
The assembler was originally written with the intent to replicate those of the 1970s and early 1980s, as I couldn’t find any doing so at that time. Thanks to the upgrade, it should now support just about any notation you may think of. However, it is still centered around early 6502 notation and should be fully compatible with the syntax and formats of the original MOS cross-assembler, used by Commodore and other vendors.
On the I/O side, the assembler now supports drag & drop operations for its source input field, as well as pasting and in-place editing. Output is considerably faster, since intermediate updates of the log, which modern browsers don’t display anyway, are now skipped. Also, there’s now an option for the code output to include addresses for the individual lines of bytes, just like a hex-dump.
Syntax
Generally, the assembler is case-insensitive (with the notable exception of character literals and strings). Expressions (like “$4000+7”) and macros are still unsupported (supporting them would imply a major rewrite of the principal parsing mechanism), but otherwise, you should be able to throw just about anything at it for a quick assembly session:
Addressing
- INX
  implied addressing
- ROL A
  accumulator
- ROL
  accumulator (alternative, implied notation)
- LDA #BB
  immediate
- LDA HHLL
  absolute
- LDA HHLL,X
  absolute, X-indexed
- LDA HHLL,Y
  absolute, Y-indexed
- LDA (LL,X)
  X-indexed, indirect
- LDA (LL),Y
  indirect, Y-indexed
- LDA (LL)Y
  indirect, Y-indexed (compatible format)
- LDA *LL
  zeropage (explicit, see below)
- LDA.B LL
  zeropage / forced byte mode (alternative format, see below)
- LDA.W LL
  forced word mode (default, see below)
- LDA @LL,X
  zeropage, X-indexed (also “LDA.B LL,X”)
- LDA @LL,Y
  zeropage, Y-indexed (also “LDA.B LL,Y”)
- JMP (HHLL)
  indirect
- BNE HHLL
  relative (operand is the branch target!)
- ;comment
  comments are ignored, but are included in the log in a new line.
where
- A
  literal “A” (accumulator)
- BB
  byte value
- LL
  single byte address (low-byte only)
- HHLL
  word address, high-low notation (little endian)
Generally, there are no regulations regarding white space, but there must be a separating white space between opcodes and operands, while the operands themselves must not contain any white space.
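Since the operand of a branch instruction is written as the branch target, an assembler has to convert it to the signed one-byte offset the 6502 expects. The following is a minimal Python sketch of that conversion, not the code used by the Virtual 6502 Assembler; the function name `branch_offset` and the example addresses are made up for illustration.

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Return the one-byte operand for a relative branch located at branch_addr."""
    # The 6502 adds the signed offset to the address of the *next* instruction,
    # i.e. the branch address plus the two bytes of the branch itself.
    delta = target - (branch_addr + 2)
    if not -128 <= delta <= 127:
        raise ValueError(f"branch target out of range: {delta:+d}")
    return delta & 0xFF  # two's complement encoding of the offset


# Illustrative addresses only (hypothetical, not taken from a real listing).
print(f"{branch_offset(0xC009, 0xC002):02X}")  # backward branch
print(f"{branch_offset(0xC000, 0xC010):02X}")  # forward branch
```

A target more than 127 bytes ahead of or 128 bytes behind the following instruction cannot be encoded, which is why the sketch raises an error in that case.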
Zeropage Addressing and Addressing Defaults
Where ambiguous, the assembler, just like its older predecessors, defaults to word-size addressing, making explicit zeropage address notation a requirement, either by the address prefix “*” (as in “LDA *$20”) or by the opcode extension “.B” (as in “LDA.b $20”).
However, this behavior may be switched by the pragma “.OPT ZPGA”:
Using this, address modes are evaluated automatically depending on the opcode and the value of the address operand. Any byte-sized values (≤ 0xFF) will be regarded as zeropage addresses, with the notable exception of any addresses or labels/identifiers defined by more than two hex-digits (see below). Moreover, using a byte modifier like “<” or “>” (see below) will result in zeropage addressing as well.
(Labels must have been defined before being used for this to work and will otherwise default to word-size values, since this is still a two-pass assembler.)
To force word address modes with automatic zeropage detection, use the opcode extension “.W” (as in “LDA.w $20”). To opt out of automatic zeropage detection entirely, use the pragma “.OPT WORDA”.
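The rules above can be summed up in a small decision function. This is only a Python sketch of the behavior as described here, under simplifying assumptions: it handles hex operands only, ignores labels, and does not check whether the opcode actually has a zeropage form. The names `hex_digits` and `operand_size` are hypothetical and not part of the assembler.

```python
def hex_digits(operand: str) -> str:
    """Strip a hex prefix ($, & or 0x) and return the remaining digits."""
    text = operand.upper()
    for prefix in ("0X", "$", "&"):
        if text.startswith(prefix):
            return text[len(prefix):]
    raise ValueError(f"this sketch only handles hex operands: {operand}")


def operand_size(operand: str, ext: str = "", zpga: bool = False) -> str:
    """Return "byte" (zeropage) or "word" for an address operand."""
    ext = ext.upper()
    if ext == ".W":                      # forced word mode always wins
        return "word"
    if ext == ".B" or operand.startswith("*"):
        return "byte"                    # explicit zeropage notation
    if not zpga:
        return "word"                    # traditional default
    if operand[0] in "<>":               # byte selectors yield a byte value
        return "byte"
    digits = hex_digits(operand)
    # More than two hex digits means the value was written as a word.
    if len(digits) <= 2 and int(digits, 16) <= 0xFF:
        return "byte"
    return "word"


for args in [("$20",), ("*$20",), ("$20", ".b"), ("$20", "", True),
             ("$0040", "", True), ("$20", ".w", True)]:
    print(args, "->", operand_size(*args))
```

Note that “$0040” stays word-sized even with detection enabled, because it is written with more than two hex digits.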
Values
The assembler accepts quite a wide variety of number notations (case-insensitive), namely:
- $[0-9A-F]
  hexadecimal
- &[0-9A-F]
  hexadecimal (BBC-style)
- @[0-7]
  octal
- 0[0-7]
  octal
- %[10]
  binary
- [1-9][0-9]
  decimal
- 0x[0-9A-F]
  hexadecimal
- 0o[0-7]
  octal
- 0b[10]
  binary
- 'A
  character literal in current encoding (see below), closing quote mark optional
- "A
  as above
- <
  low-byte selector (prefix), e.g., “LDA #<LABEL1”
- >
  high-byte selector (prefix), e.g., “LDA #>$C020”, same as “LDA #$C0”
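To illustrate how these notations map to values, here is a rough Python sketch of a literal parser. It is an assumption-laden illustration rather than the assembler's actual parser: the name `parse_value` and the prefix table are invented, character literals are taken as plain ASCII, and labels are not handled.

```python
# Two-character prefixes come first so "0b101" is not mistaken for octal.
PREFIXES = [("0X", 16), ("0O", 8), ("0B", 2),
            ("$", 16), ("&", 16), ("@", 8), ("%", 2)]


def parse_value(text: str) -> int:
    """Parse one number literal, optionally prefixed by a byte selector."""
    if text[0] == "<":                       # low-byte selector
        return parse_value(text[1:]) & 0xFF
    if text[0] == ">":                       # high-byte selector
        return (parse_value(text[1:]) >> 8) & 0xFF
    if text[0] in "'\"":                     # character literal (ASCII assumed)
        return ord(text[1])
    upper = text.upper()                     # number notations ignore case
    for prefix, base in PREFIXES:
        if upper.startswith(prefix):
            return int(upper[len(prefix):], base)
    if len(upper) > 1 and upper[0] == "0":   # leading zero: octal
        return int(upper[1:], 8)
    return int(upper, 10)                    # plain decimal


for literal in ["$C020", "&FF", "@17", "017", "%1010", "42",
                "0x1f", "'A", "<$C020", ">$C020"]:
    print(literal, "=", parse_value(literal))
```

The selectors simply wrap the parsed value, which is why “>$C020” and “$C0” end up as the same byte.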
Labels and Identifiers
Labels must begin with a letter and may consist of any letters, digits, and the underscore ([A-Z0-9_]). In order to be consistent with older assemblers, only the first 6 characters are significant.
Labels may be used anywhere you may use a value or operand, but, when used as an identifier, they must themselves be defined by a number literal. E.g.,
- HERE
  as the only symbol on a line: the label “HERE” is defined with the current value of PC (program counter, i.e. the address of the next instruction)
- HERE = *
  same as above, but using an explicit assignment of the value of PC (“*”)
- LOOP LDA #0
  again, label “LOOP” is defined with the current address (value of PC)
- LOOP: LDA #0
  optionally, a label declaration may be followed immediately by a colon (“:”)
- VAL1 = $C000
  symbol “VAL1” is defined with value 0xC000
- VAL2=$C000
  separating white space is optional
- BNE LOOP
  use anywhere a number or address is expected
Mind the difference in the following declarations, when using “.OPT ZPGA”:
- ADDR1=$0040
  results in word-size addressing, when used as an address
- ADDR2=$40
  results in zeropage addressing, when used as an address
- ADDR3=0x0040
  results in word-size addressing, when used as an address
- ADDR4=0x40
  results in zeropage addressing, when used as an address
Labels and identifiers may be declared only once and any attempts at reassigning will throw an error.
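Two of these rules, six significant characters and single assignment, interact in a way that is easy to trip over: two long names sharing their first six characters count as the same symbol. A small Python sketch of such a symbol table follows. It is a hypothetical illustration (the class `SymbolTable` and its methods are invented names), not the assembler's own data structure.

```python
class SymbolTable:
    """Case-insensitive symbol table with 6 significant characters."""

    SIGNIFICANT = 6

    def __init__(self) -> None:
        self._symbols: dict[str, int] = {}

    def _key(self, name: str) -> str:
        # Fold case and keep only the significant leading characters.
        return name.upper()[: self.SIGNIFICANT]

    def define(self, name: str, value: int) -> None:
        key = self._key(name)
        if key in self._symbols:
            raise ValueError(f"symbol already defined: {name}")
        self._symbols[key] = value

    def lookup(self, name: str) -> int:
        return self._symbols[self._key(name)]


table = SymbolTable()
table.define("LOOP", 0xC002)
table.define("COUNTER1", 0x40)
print(hex(table.lookup("loop")))      # same symbol, different case
print(hex(table.lookup("COUNTE")))    # only the first 6 characters matter
try:
    table.define("COUNTER2", 0x41)    # clashes with COUNTER1
except ValueError as err:
    print(err)
```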
Pragmas and Directives
The assembler supports a few pragmas (assembler directives). A pragma must be the only instruction on a given line.
- * = $C800
  the program counter is set to 0xC800
- .ORG $C800
  same as above, using the “origin” directive
- .ORG = $C800
  a hybrid version is accepted as well
- .RORG $C800
  “.RORG” may be used synonymously to “.ORG”
- .END
  end of assembly (optional), rest ignored
- .BYTE $01
  include a byte value
- .BYTE 1, 2
  multiple values may be used for “.BYTE” and similar directives and are separated by white space and/or commas (However, with “.BYTE”, the first value must not be a character literal, as it will be interpreted as a text string.)
- .BYTE #<$2021
  values for “.BYTE” and similar directives may come with a byte selector (“<”, “>”) and/or a literal value prefix (“#”, optional)
- .BYT $01
  compatible version of “.BYTE”
- .DBYTE $01EF
  include a double-byte (big endian), accepts multiple values
- .DBYT $01EF
  compatible version of “.DBYTE”
- .WORD $EF01
  include a word (little endian), accepts multiple values
- .TEXT 'Abc'
  include a text string (in current encoding) as a series of character literals, closing quote may be omitted
- .TEXT "Abc"
  strings may be wrapped in double quotes, as well
- .BYTE 'Abc'
  strings may be included using “.BYTE” or “.BYT”, synonymous to “.TEXT”
- .ASCII 'Abc
  include a string in ASCII encoding
- .PETSCI 'Abc
  include a string in Commodore PETSCII encoding (synonym “.PETSCII”)
- .PETSRC 'Abc
  include a string in Commodore 8-bit screen character encoding (synonym “.C64SCR”)
- .OPT <option>
  set an option: see below
- .SKIP
  includes a blank line in the listing (for compatibility only)
- .PAGE
  includes a blank line in the listing (for compatibility only)
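The difference between “.DBYTE” (big endian) and “.WORD” (little endian) is just the byte order in the output. As a quick Python sketch (the helper names `dbyte` and `word` are invented for this illustration), the two example values from the list above come out as the same pair of bytes:

```python
def dbyte(value: int) -> bytes:
    """Big endian: high byte first, as described for .DBYTE."""
    return bytes([(value >> 8) & 0xFF, value & 0xFF])


def word(value: int) -> bytes:
    """Little endian: low byte first, as described for .WORD."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


print(dbyte(0x01EF).hex())  # bytes emitted for ".DBYTE $01EF"
print(word(0xEF01).hex())   # bytes emitted for ".WORD $EF01"
```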
The following options (assembler modes) may be set with the pragma “.OPT”:
- .OPT ZPGA
  use automatic zeropage detection for determining address modes
- .OPT WORDA
  reset to word addressing default
- .OPT ASCII
  use ASCII encoding for strings and character literals (default)
- .OPT PETSCI
  or “.OPT PETSCII”, use PETSCII encoding for strings and character literals
- .OPT PETSCR
  or “.OPT C64SCR”, use Commodore 8-bit screen codes for strings and character literals
The following options are allowed for compatibility, but are otherwise ignored: XREF, NOXREF, COUNT, NOCOUNT, CNT, NOCNT, LIST, NOLIST, MEMORY, NOMEMORY, GENERATE, NOGENERATE.
Old, But Still of Value
Thanks to the extended compatibility, the assembler should still be of value, as it’s pretty much syntax agnostic as far as simple assembler code is concerned. (You may have to set “.OPT ZPGA” or may have to insert a leading dot to a directive, but this is pretty much all that may be required to assemble some random source code.)
E.g., both of the following source codes are valid and produce identical results:
;traditional
* = $C000
        LDY *$20
LOOP    LDA $80,Y
        ROL A
        STA ($C0)Y
        DEY
        BNE LOOP
        RTS
.END

;modern style
.org 0xC000
        ldy.b 0x20
loop:   lda.w 0x80,y
        rol
        sta (0xC0),y
        dey
        bne loop
        rts
.end
So it should still be a convenient tool for a quick editing session or to give some legacy source code a try.
Virtual 6502 Disassembler
The disassembler is mostly as it was before; the main improvement is the speed-up of the output.
Further, the output is now more similar to the native format of the assembler: just check the new "show assembler code only" option (visible, once you’ve generated an output) and you’re ready to copy and paste the code over to the assembler. (That is, you may have to add “.OPT ZPGA” or deal with zeropage addressing otherwise, compare the notes on the assembler above.)
Of course, this also works the other way round and the disassembler accepts any of the output formats generated by the assembler.
Generally speaking, the disassembler accepts the following input:
- Binary files (via drag & drop or via the file upload button) or hex-dumps (conforming to the rules below).
- A stream of hex-digits, like “A2008A9D0004A901…”.
  White space may be included at any byte boundary.
- A stream of white space separated hex-digits (up to a pair per byte), as in
  “A2 00 8A 9D 00 04 A9 01…”.
  As any non-hex-characters are ignored, this also includes comma-separated text, as in:
  “A2, 00, 8A, 9D, 00, 04, A9, 01…”
  Again, any white space may occur at a byte boundary.
- Any characters in an input line including and following a semicolon (;) are ignored as comments.
- As with many hex-dumps, a line may start with a line number or address either prefixed or suffixed by a colon (:) and such a line-prefix will be ignored. E.g.,
  “:C000 A2 00 8A 9D 00 04 A9 01 ;first line”, or
  “C008: 9D 00 D8 E8 D0 F4 60 00 ;second line”.
(You may see how this works with monitors and hex-dumps with addresses at the beginning of a line and any ASCII-literals as a comment.)
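The input rules above are simple enough to sketch in a few lines of Python. This is an approximation written for illustration, not the disassembler's own code: the function name `parse_hex_line` and both regular expressions are assumptions, and edge cases such as an address glued directly to the data are not considered.

```python
import re

# An address or line number with a leading or trailing colon, e.g. ":C000" or "C008:".
_PREFIX = re.compile(r"^\s*(?::[0-9A-Fa-f]+|[0-9A-Fa-f]+:)")
_HEX_RUN = re.compile(r"[0-9A-Fa-f]+")


def parse_hex_line(line: str) -> bytes:
    """Extract the code bytes from one line of hex-dump style input."""
    line = line.split(";", 1)[0]            # drop the comment, if any
    line = _PREFIX.sub("", line, count=1)   # drop the address prefix, if any
    out = bytearray()
    for run in _HEX_RUN.findall(line):      # everything non-hex is a separator
        for i in range(0, len(run), 2):     # up to a pair of digits per byte
            out.append(int(run[i:i + 2], 16))
    return bytes(out)


for sample in [":C000 A2 00 8A 9D 00 04 A9 01 ;first line",
               "C008: 9D 00 D8 E8 D0 F4 60 00 ;second line",
               "A2, 00, 8A, 9D, 00, 04, A9, 01",
               "A2008A9D0004A901"]:
    print(parse_hex_line(sample).hex())
```

Treating every non-hex character as a separator is what makes the comma-separated form work without any special handling.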
Virtual 6502 Emulator
Like the disassembler, the emulator is mostly the same. It now accepts the same input methods and formats as already described for the disassembler: binary files may be loaded by drag & drop or via the newly included file upload button (no data is sent to the server, everything is processed inside the browser). There are now nicer buttons and dialogs, and there are also some minor improvements to the run-cycle.
Moreover, the ROM-loading mechanism has been revised and there’s now an additional option to load the ROMs of the PET 2001 (rev. 2.0).
Update:
The emulator also received a watchdog to check breakpoints and other conditions. And there’s now a separate, protected ROM space.
6502 Instruction Set Sheet
The documentation for the instruction set was already updated a few months earlier and features modern, semantic markup and accessibility support.
That’s it. :-)
As with any major update, in spite of extensive testing, errors may have made their way into the code. This is especially true for the assembler, where the extended compatibility added a few layers of complexity. Also, some error messages issued by the assembler may not be as selective or as informative as they used to be. So, if you come across an error or issue, don’t hesitate to contact me. Bug reports are welcome!
BTW, while speaking of updates, since the number of posts is steadily growing (55 and counting), the blog has received a new feature, as well, namely a new list mode, providing a compact overview over the entire list of posts.
Norbert Landsteiner,
Vienna, 2021-05-20