virtual 6502 / Assembler

src: object code:
load
i
listing:

Instructions

Description

This is a simple 2-pass MOS 6502 assembler optimized for compatibility and to accept a broad variety of syntax styles. The general idea is that it should work with any code of somewhat conventional and/or sane format.

A first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
In the listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves and lists this in a normalized format, which is close to the original MOS notation.

A special "BBC Micro mode" activates additional features to provide compatibility with the syntax used for embedded assembler code in BBC BASIC, while it also applies some restriction to the general format.

Basic Syntax

The assembler supports common 6502 assembler syntax styles. Mind that there must be a seperating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, instruction labels, or complex expressions. Compare the 6502 Instruction Set for instruction details and addressing modes.

Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for a single-byte addresses, and "BB" for any other byte-sized operands. (In actuality, these may be any simple or complex expressions.)

CLC
immediate, no operand.
ROR A
instruction with accumulator as the operand.
ROR
same as above. "A" is optional and may be omitted.
LDA #BB
immediate mode, loading the literal value.
LDA HHLL
absolute, loads the value from the provided memory address.
LDA HHLL,X
absolute, X-indexed.
LDA HHLL,Y
absolute, Y-indexed.
LDA LL
zero-page address mode (with automatic address mode detection).
LDA *LL
forced zero-page address mode in the style of the original MOS assembler.
LDA.b LL
forced zero-page address mode, modern byte-size notation.
LDA.byte LL
as above.
LDA.w LL
forced absolute address mode, modern word-size notation.
LDA.word LL
as above.
LDA LL,X
zero-page, X-indexed.
LDA LL,Y
zero-page, Y-indexed.
LDA (LL,X)
X-indexed, indirect.
LDA (LL),Y
indirect, Y-indexed.
LDA (LL)Y
indirect, Y-indexed, old MOS format (no comma).
JMP (HHLL)
indirect address.
BEQ HHLL
relative addresses (-127 ≤ offset ≤ +127) are computed from absolute target addresses.
;comment
comments start with a semicolon and extend to the end of the line.
\comment
in BBC mode, comments inside assembler blocks start with a backslash and extend to the end of the line or to the next colon (":").

Further, in BBC mode, immediate addressing may be denoted by a "@" prefix as in Acorn Atom BASIC:

LDA @BB
alternative immediate adressing mode in Acorn Atom syntax (BBC mode only).

The assembler is generally case-insensitive, with the exception of strings and character literals.

Generally, there may be just a single instruction on each line. (In BBC mode, however, there may be multiple instructions on a line, separated by the BASIC colon oporator.)

Values and Numeric Representations

The assembler supports a variety of number formats:

$12EF
hexadecimal [0-9A-F].
&12EF
hexadecimal.
0x12EF
hexadecimal.
1289
decimal [0-9].
0d1289
decimal.
@1267
octal [0-7].
0o1267
octal.
01267
octal.
%1010101
binary [01].
0b1010101
binary.
'A
character value of "A" ($41 in ASCII)

(In BBC mode, "$" is reserved for the BASIC indirection operator and "&" denotes any hex-numbers.)

Value Expressions

Anywhere a value mayn occur this may be a complex expression as well. Expressions may include addition, subtraction, multiplication, divisions, and unary minus (+-*/ and -).

There are also the two special unary byte operators "<" and ">":

<$12EF
low-byte value ($EF).
>$12EF
high-byte value ($12).

Please mind that "<" and ">" are just mathematical operators, like "+" and "", and do not imply anything regarding address modes (which are are still to be defined by a prefix, if applicable.)

Expressions are evaluated strictly from left to right, without precedence, but may be grouped using round or square brackets ((…), […]).
The use of square brackets is recommended, though, as round brackets can be ambiguous in the context of certain 6502 instructions and their syntax.

1+2
3
2*3
6
1+2*3
9   (1+2 => 3, 3*3 => 9)
1+[2*3]
7   ([2*3] => 6, 1+6 => 7)
1+(2*3)
same as above

Expressions may include defined symbols and instruction labels.
Mind that there must not be any white space inside an expression.

BBC mode adds an emulation of the string related BASIC functions "ASC(<string-expr.>)" and "LENGTH(<string-expr.>)" outside of assembler blocks, and the construct 'ASC"<character>"' inside assembler blocks.

The Program Counter

The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address, where the next instruction will be inserted. There are several ways to address the program counter:

* = $1234
the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
BEQ *+2
the asterisk may be used in expressions as well.
* = *+4 $EA
when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with NOP instructions ($EA).
P% = P%+2
the BBC BASIC-style symbol "P%" may be used synonymously to the asterisk anywhere the former may occur.
.ORG $1234
the more modern-style directive ".ORG" may be used for setting PC, as well. (However, you can't use ist in an expression.)
.ORG = $1234
you may use .ORG in assignment style, as well.
.ORG EQU $1234
generally, "EQU" may be used as the assignment operator, as well.
(Mind that there must be white-space around "EQU" in order for it to be recognized as a token, which is not a requirement with "=".)
.RORG $1234
synonym to ".ORG" (in many assemblers you are not allowed alter the origin set by ".ORG" and this is meant to provide compatibility.)
BEQ .+2
in expressions, a dot (.) may be used synonymously for the asterisk. However, you can not assign to it. (Strictly speaking, this is local context, but, while the assembler doesn't implement macros, it's the same anyway.)

In order to ensure compatibility, in BBC mode only "P%" is available.

Labels and Symbols

Instruction labels and defined symbols start with a letter character or underscore and may contain, letters, digits, or the underscore. Only the first 12 characters are significant. (In BBC mode, a symbol may optionally end with a percent sign, "%", like a BASIC integer variable.)

Instruction labels may precede an instruction or may be the only entity on a line. They may be optionally end in a trailing colon. Labels may be used anywhere in an expression:

LOOP LDA A,X
declares the instruction label LOOP.
LOOP: LDA A,X
labels may end in a colon (optional).
BEQ LOOP
using a label as an address value.
.LOOP LDA,X
In BBC mode, labels are declared with a preceding dot. However, there is no dot, when used as a value (compare the above example).

Optional "@" prefix for further compatibility (not applicable in BBC mode):

@LOOP LDA,X
Unless in BBC mode, labels may be declared an optional "@" prefix.
@LOOP: LDA,X
Same as above, but using a trailing colon.
BNE @LOOP
Unless in BBC mode, labels may be referred to using an optional "@" prefix.

Symbols are declared by an assignment and may be used as values anywhere. (In BBC mode, you may not declare a symbol/variable inside an assembler block.)

TEST = $2000
declares the symbol TEST.
TEST EQU $2000
EQU may be used synonymously.
C = *+[TEST*2]
assignments may be complex expressions.

Mind that — like with most assemblers — you may not redefine or reuse any symbols or labels per default. However, you may change this behavior by setting option "REDEF" (see below). In BBC mode, you may reassign symbols (BASIC variables) outside of an assembler block out of the box.

Note on hexadecimal values and automatic zero-page mode

Any numeric values provided by at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will effect absolute address modes, when used in ambiguuos context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "C" by "C = 0x0002" and using this in "LDA C+2" will result in a word-sized, absolute instruction, while the effective value is well inside single-byte range. Defining C as "0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)

Anonymous (Temporary) Labels

The assembler also supports anonymous labels for temporary branch and jump targets:
Just mark an instruction by "!" or ":" (empty label) and refer to this mark by either "!+" (or ":+") for the next anonymous label as a target or by "!-" (or ":-") for the previous one. You may refer to a target further away by repeating "+" or "-".  E.g., "BNE !--" branches to the second anonymous label before the insertion point. Mind that this counts anonymous labels and not addresses.

Example:

! START  LDA #0        ;first anonymous label
                       ;anonymous labels may precede a normal label
         LDX #0
!                      ;just mark this address
:        STA $1000,X   ;third label (same address), we may use ":" as well
         INX
         BNE !-        ;select the closest previous anonymous label
         JMP :---      ;jump back 3 anonymous labels (same as START)
                       ;again, ":" and "!" are synonymous

This will assemble to (with anonymous labels listed in a column of their own):

LOC   CODE         LABEL     INSTRUCTION

0800  A9 00      ! START     LDA #$00
0802  A2 00                  LDX #$00
0804             !
0804  9D 00 10   !           STA $1000,X
0807  E8                     INX
0808  D0 FA                  BNE $0804
080A  4C 00 08               JMP $0800

Restrictions:
This feature is not available in BBC mode and is only supported for branch instructions and absolute jump targets. An anonymous target must be the sole operand and cannot be used in an arithmetic expression.

Note: Anonymous labels are not listed in symbol tables.

Pragmas and Directives

Pragmas and directives start generally with a dot.

Directives for embedding data:

.BYTE 1, $02
embeds a single byte or a list of bytes at the current location. Lists are sperated by white-space and/or commas. (An optional "#", preceding any values, is ignored.) Values may be complex expressions, as well.
.DBYTE $12EF
embeds a double byte given in LLHH memory order (little-endian). This inserts the bytes $12 and $EF at the current location. ".DBYTE" takes a list of values, as well.
.WORD $12EF
embeds a word given in HHLL order (human readable, big-endian). This inserts the bytes $EF and $12 at the current location. (Also, use this when using previously defined labels and symbols in an expression.)
Again, values and expressions may be also provided as a list, as well.
.BYT $01
synonym for ".BYTE" as used by some assemblers.
.DB $02
another synonym for ".BYTE" (Define Byte).
.DCB $03
another synonym for ".BYTE" (Define Constant Byte).
.DBYT $12EF
synonym for ".DBYTE" as used by some assemblers.
.TEXT "Abc"
embeds a text literal (case-sensitive) using the current encoding (see below).
.ASCII "Abc"
embeds a text literal (case-sensitive) using ASCII encoding.
.PETSCII "Abc"
embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
.PETSCR "Abc"
embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
.C64SCR "Abc"
as above (synonym).

Directives for aligning code or filling space:

.ALIGN $100
advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled by zero. If no argument is provided ".ALIGN" aligns to the next even memory location.
.ALIGN $100 $EA
an optional second byte may specify a byte value to be used to fill any gaps (here $EA, "NOP", as used by most Commodre 8-bit machines).
.FILL $20 $EA
fill the next n bytes using the value provided by the second argument. If no second argument is providing, zero will be used as the fill-byte.
.REPEAT n
repeats the instruction or directive following this directive on the same line n times. An optional "STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default 1). The repeat-counter is accessibly as "R%".
E.g.,
.REPEAT 26 .BYTE 'A+R%
will fill the next 26 memory locations with the letters of the alphabet.
ODD_NUMS ;generate list of odd numbers
.REPEAT 5 STEP 2 .BYTE 1+R%
will fill the next 5 memory locations with the odd number series 1,3,5,7,9.

And this will fill the next 6 bytes by the sequence 0x00, 0x00, 0x02, 0x02, 0x04, 0x04:
.REPEAT 3 STEP 2 *=*+2 R% ;PC += 2, fill-byte R%

Other directives:

.END
ends the source code, any remaining text is ignored. (optional)
.NOLIST
switches listing output off (e.g, for data sections. This is also available as an option.)
.LIST
switches listing output on (default, also available as an option).
.SKIP
inserts a blank line in the listing (pass #2). This is mostly for compatibility.
.PAGE
inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
.DATA
any such directive is ignored in order to ensure compatibility with symbol tables used by the companion disassembler.

Special directives for Commodore BASIC:

.PETSTART
Generates a short BASIC program, consisting of optional REM-lines and a line with a "SYS" command, jumping to the next available address immediately following this BASIC text (which starts at 0x0401, the BASIC start address off the Commodore PET). The program counter will be advanced to this start address automatically.
Without any arguments, just a line with the SYS command will be generated, using the current year as the line number:
.PETSTART
> 2021 SYS 1038
If a first, numeric argument is provided, this will be used as a line number for the line holding the SYS statement:
.PETSTART 10
> 10 SYS 1038
If a string argument is provided, the assembler will generate a heading line with line number "0" and a REM statement using this string. If a list of strings (separated by white-space and optionally commas) is provided, or a string contains a line-break ("\n"), multiple REM lines will be generated:
.PETSTART 2001 "*** a program ***", "(c) example.com"
> 0 REM *** A PROGRAM ***
> 1 REM (c) EXAMPLE.COM
> 2001 SYS 1084
(Mind that lower case letters will appear as upper-case and upper-case letters as graphics characters in standard PETSCII upper-case/graphics mode.)
.C64START
Same as ".PETSTART" (see above), but arranging for a BASIC start address of 0x0801, suitable for the C64.

In BBC mode, directives do not have a preceding dot and are available only inside assembler blocks. The following directives are available in BBC mode (these are actually additions to the original BBC BASIC and were introduced with Level II, 1982 issue):

EQUB &01, &02
inserts a byte value (may be a comma-separated list).
EQUW &12EF
inserts a word or a comma-separated list of words.
EQUD &11223344
inserts a series of 4 bytes, starting at the highest significant one. Here, we insert the byte series 0x11, 0x22, 0x33, 0x44, starting at the current memory location.
EQUS "Abc"
inserts a text literal (case-sensitive). This must be a single item (not a list), but may be a string concatenation, including the BASIC function "CHR$()".
E.g.,
EQUS CHR$(34)+"foo"+"bar"+CHR$(34)
will insert the character sequence '"foobar"'.
ALIGN
in BBC mode "ALIGN" does not take any arguments and aligns the location counter "P%" to the next multiple of 4. Any gaps are filled with 0xFF. (This is an even later addition to the standard.)

Options

Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (.).

.OPT WORDA
switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("*") or the byte extension (".b"). Use this for fine grain control and/or compatibility with old sources.
.OPT ZPGA
switches automatic zero-page detection to on (default).
.OPT ZPA
synonym to option "ZPGA".
.OPT ILLEGALS
enables support for “illegal” op-codes (see below).
.OPT LEGALS
disables support for “illegal” op-codes (default).
.OPT NOILLEGALS
synonym to option "LEGALS".
.OPT REDEF
allows symbols and labels to be redefined / reused.
.OPT NOREDEF
reuse of symbols is not allowed and will throw an error (default).
.OPT ASCII
set character encoding for .TEXT-directives and character literals to ASCII (default).
.OPT PETSCII
set the default character encoding to PETSCII.
.OPT PETSCR
set the default character encoding to Coomodore 8-bit screen characters.
.OPT C64SCR
synonym to option "PETSCR".
.OPT NOLIST
switches listing output off.
.OPT LIST
witches listing output on (default).

Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored: XREF, NOXREF, COUNT, NOCOUNT, CNT, NOCNT, MEMORY, NOMEMORY, GENERATE, NOGENERATE.

In BBC mode, options evaluating to numeric values (BBC BASIC reprorting levels) are ignored. Other, you may use any of the above options, but whitout the leading dot (e.g., "OPT ILLEGALS").

Compatibility

This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw about any style of source code at it.

E.g., the following examples are semantically identical and produce the same object code:

;MOS/traditional

* = $4000
TARGET = $20

       LDY *$20
LOOP   LDA $0080,Y
       ROL A
       STA (TARGET)Y
       DEY
       BNE LOOP
       RTS
.END
;modern style

.ORG 0x4000
TARGET EQU 0xC0

       LDY.b 0x20
LOOP:  LDA.w 0x80,Y
       ROL
       STA (TARGET),Y
       DEY
       BNE LOOP
       RTS
.END
\BBC Micro mode

P% = &4000
TARGET = &C0
[
       LDY &20
.LOOP  LDA &80,Y
       ROL
       STA (TARGET),Y
       DEY:BNE LOOP
       RTS
]
END

Processing Example

Here is an example for a complete assembly of a short source: [ try it ]

Source code:

;fill a page with bytes,
;preserve program

*=$800

start
      ldx #offset
loop  txa
      sta start,x
      inx
      bne loop
      brk

;insert bytes here
offset=*-start
.end



Object code:

0800: A2 0A 8A 9D 00 08 E8 D0
0808: F9 00



Note:
Press "Show Memory" and activate "live update" in the emulator.
Listing:

pass 1

LINE  LOC          LABEL     PICT

   1               ;fill a page with bytes,
   2               ;preserve program

   4  0800                   * = $800
   6  0800         START
   7  0800                   LDX #OFFSET
   8  0802         LOOP      TXA
   9  0803                   STA START,X
  10  0806                   INX
  11  0807                   BNE LOOP
  12  0809                   BRK
  14                         ;insert bytes here
  15                         OFFSET = *-START
  16                         .END

symbols
 LOOP       $0802
 OFFSET       $0A
 START      $0800

pass 2

LOC   CODE         LABEL     INSTRUCTION

                   ;fill a page with bytes,
                   ;preserve program

0800                         * = $0800
0800               START
0800  A2 0A                  LDX #$0A
0802  8A           LOOP      TXA
0803  9D 00 08               STA $0800,X
0806  E8                     INX
0807  D0 F9                  BNE $0802
0809  00                     BRK
                             ;insert bytes here
                             OFFSET = $0A
                             .END

done (code: 0800..0809).

About BBC Micro Compatibility Mode

The BBC Micro computers had a quite powerful and versatile implementation of BASIC (BBC BASIC), which in turn built on the BASIC implementation of the Acorn Atom computer. One of its features was a built-in assembler, allowing users to embed assembler code directly in their BASIC sources. "BBC Micro mode" (enabled by the equally named checkbox) adds basic compatibility for the syntax of this embedded assembler of BBC BASIC.

This is not an attempt at a complete or strict emulation. You may even use some of the features of standard mode, where these do not conflict with the BBC syntax, e.g., "0x..." style number formats or character literals like "'A". Moreover, you may use the assembler even without assembly explicit blocks ("[…]") for a quick session. (However, some features, like assembler directives, are restricted to assembler blocks.)

BASIC line numbers are recognized, but simply ignored and are not evaluated for sequence. Similarly, FOR-NEXT loops are ignored (the loop variable will be registered, but will always evaluate to zero), as are DIM, CALL, DEF, ENDPROC, and PRINT statements. Options evaluating to a numeric argument (as used for reporting levels) are generally ignored, but will list a warning. REM statements are allowed outside assembler blocks and the colon (":") may be used as a statement/instruction separator anywhere in the code. (As opposed to standard mode, there may be multiple statements or instructions on a line.)

There is also support for BBC BASIC indirection operators, BBC BASIC's way of accessing memory, outside of assembler blocks. These are "?M" (query), "$M" (dollar), "!M" (pling) with their variations "M?N" and "M$N" for the left-hand side of assignements. These will be mostly used with "P%" for the purpose of embedding data. "$P%" is also supported for the right-hand side, including an emulation of the BASIC function "LENGTH($P%)".

There is basic support for string concatenations and the string related BASIC functions "ASC()", "CHR$()", "LENGTH()", and "STRING$()". E.g, you may things like,

L = (LEN($P%+"foo"+"bar"+CHR$(33))+4)*2+ASC("A")

(These string and expressions related features are implemented as preprocessors. While errors will be reported in detail, the listing of a successful assembly will onyl include values as returned by these preprocessors.)

Besides the support of BASIC indirection operators, there's also support for the assembler directives used in BBC BASIC Level II for embeding literal values — EQUB, EQUW, EQUD, and EQUS — inside assembler block, as well as for ASC"<char>". (So you may embed any data using your preferred method.)

Moreover, you may reassign any symbols (BASIC variables) outside of an assembler block, even without explicitly setting "OPT REDEF".

Since assembler code was often wrapped in FOR loops for various passes with logging levels passed in OPT statements, these are recognized, as well, but ignored. (A variable defined in a FOR loop will be lexically known, but its value will be zero.) This is also true for a few other BASIC statements often found in the context of an assembler program, like DIM (as used to allocate some space for data), PRINT statements, and DEF and ENDPROC keywords, which may wrap the entire assembler source, as well as any CALL statements for executing the resulting machine code.

However, there are some unsupported features, as well,

Still, there's sufficient support to successfully assemble a source like this:

100 REM THIS IS BASIC
110 P% = &4000 : REM SET LOCATION COUNTER
120 TARGET = &4100 : REM A VARIABLE
130 FOR C=0 TO 2 STEP 2 : REM PASS LOOP (ignored)
140 [OPT C              \now in asm mode
150     LDX #&20        \copy 32 bytes
160 .LOOP  LDA SOURCE,X
170     STA TARGET,X
180     DEX:BNE LOOP    \two in a row!
190 .SOURCE
200 ]NEXT : REM BACK TO BASIC
210 $P%=":-) "+STRING$(28,"*")
220 P%=P%+LEN($P%) : REM UPDATE LOCATION
230 END

Please refer to a manual for the BBC Micro for further details.

Illegal Opcodes

Support for "illegal" opcodes (undefined instructions) is enabled by the pragma ".OPT ILLEGALS".

The following mnemonics are implemented (supported synonyms in parentheses):

opc (synonyms) imp imm abs abX abY zpg zpX zpY inX inY
 
ALR (ASR) | 4B |
ANC | 0B |
ANC2 | 2B |
ANE (XAA) | 8B |
ARR | 6B |
DCP (DCM) | CF DF DB C7 D7 C3 D3 |
ISC (ISB, INS) | EF FF FB E7 F7 E3 F3 |
LAS (LAR, LAE) | BB |
LAX (ATX) | AB AF BF A7 B7 A3 B3 |
LXA (LAX imm) | AB |
RLA | 2F 3F 3B 27 37 23 33 |
RRA | 6F 7F 7B 67 77 63 73 |
SAX (AXS, AAX) | 8F 87 97 83 |
SBX | CB |
SHA (AXA, AHX) | 9F 93 |
SHX | 9E |
SHY (SAY, SYA) | 9C |
SLO (ASO) | 0F 1F 1B 07 17 03 13 |
SRE (LSE) | 4F 5F 5B 47 57 43 53 |
TAS (SHS, XAS) | 9B |
USBC | EB |
NOP | EA 80 0C 1C 04 14 |
DOP (SKB) | 80 04 14 |
TOP (SKW) | 0C 1C |
JAM (HLT, KIL) | 02 |

Notes:

Disclaimer

This application is provided for free and AS IS, therefore without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. Use at own risk.

This application uses either Web Storage technology or, if this is not available, a cookie to store your choice for the preferred color mode for the virtual 6502 suite of applications. (Either the word "light" or the word "dark" is stored.)

Fun Fact

This assembler evolved from a Commodore BASIC program written in the 1980s, which was ported to HTML/JavaScript in 2005 and has since been substantially enhanced. Extended functionally, BBC Micro compatibility, and new GUI added in 2021.

Other Resources

You may be also interested in…

Data Transfer Using BASIC

An 8-bit machine usually comes with BASIC on board. You may use this to transfer the object code to memory using DATA statements. See this little tool to generate such code from the output of the assembler (copy & paste the object code): bytes2basic.html.

Related Links

© Norbert Landsteiner 2005–2023, mass:werk