Guide to the 6502 Assembler of the PET 2001 Emulator

General Description
Theory of Operation
Basic Syntax
Value Expressions
The Program Counter
Relative Offset Literals
Labels and Symbols
Anonymous (Temporary) Labels
Pragmas and Directives
Options
Compatibility
Illegal Opcodes
Combining Assembler with BASIC
Rationale — General Considerations
Some Useful Addresses

General Description

The PET 2001 Emulator features a built-in assembler that is modelled closely after the original MOS 6502 assembler from the 1970s, but has been extended to support a variety of common syntax flavors. Still, it should accept and successfully process any original MOS source. It is based on the assembler found here and adjusted for the needs and requirements of the PET. Moreover, it provides facilities to compile stand-alone programs that can be loaded, started, and distributed on their own (see below, there are also a few sample files to start playing around with).

Please mind that this is still a relatively simple assembler, which comes with a few restrictions and limitations:

There is no support for macros or conditional assembly.
As the principal lexer is based on symbols, there must be some separating white space between symbols, but there must be no white-space in any expressions. Otherwise, they will not be recognized as symbols or operand values.
As the assembler uses its own encoding engine, there is (currently) no support for PETSCII markups as they are supported for BASIC sources.

Still, it should be good for some quick experiments, without having to worry about another syntax flavor. When it comes to complex projects, you will probably prefer an assembler and/or IDE of your own choice anyway.

The assembler is fully compatible with the 6502 online assembler found at www.masswerk.at/6502/assembler.html and any source code generated by the associated disassembler. However, there is no support for external symbol tables. Please provide any symbol definitions in your source. (Some useful definitions can be found at the very end of this page.)

A general description of 6502 opcodes and what these do can be found here: 6502 Instruction Set.

If you are already familiar with this particular assembler, there are just a few changes:

Encoding formats have been adjusted for the PET, so there are really just two formats, PETSCII and screen codes. While other encoding options are still present, they always refer to either PETSCII or PET screen codes.
There are additional, synonymous pragmas and options for PET screen encoding, namely SCR and SCREEN (since any sources apply to the PET only.)
Similarly, there is .BASICSTART as a maybe more memorable and telling synonym for .PETSTART.
Any special cases related to the BBC Micro have been stripped.

Operations

The assembler is invoked by either mounting an assembler source via the mount dialog or by dropping an assembler source on the emulated screen. An assembler source is a text file with a file name extension of either ".asm", ".a65", ".a", ".src", or just ".s".

(The assembler may be also invoked from within a BASIC source, which is covered below in the section on integration with BASIC.)

The assembler is a simple 2-pass assembler, where a first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
Once the assembler has processed the supplied source, it will present a listing in a special dialog. In this listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves and lists this in a normalized format, which is close to the original MOS notation.

The assembler will always fail on the first error, since — it’s the 1970s! :-)

If the assembly succeeds, you will be presented with some options below the listing, regarding how to proceed:

Figure: Options provided by the PET 2001 emulator for a successful assembly.

“Load Code As Program” will load the resulting code just as if it were a binary PRG file. The associated checkbox gives a choice, whether or not to reset the virtual machine before loading the program. (Somewhat self-referentially, any other information or state of the emulated PET will be lost, if you choose to reset.)
“Inject Code Into Memory” will load the resulting code directly into memory and leave the machine in the same state as it is now. The associated checkbox provides an option to additionally adjust the BASIC system pointers to the start and end of the code. Mind that any variables will be lost, when doing so. (This option will be disabled, if the emulator detects that the object code is outside the range of safe user RAM, from 0x041 to the top of RAM.)
“Export Code as PRG File” does exactly what it suggests, namely generating a link with the object code as a binary file that you can download to your local computer. For this, it will automatically prepend the start address as the first two bytes of this file, as is required by the PRG file format common to all Commodore 8-bit machines.
Finally, “Do Nothing” does exactly this, namely closing the dialog without further action. Maybe you just wanted to test something or review the assembly process?
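
The PRG layout mentioned for the export option can be sketched in a few lines of Python. This is only an illustration of the file format (a 16-bit load address, low byte first, followed by the object code), not the emulator's own export routine; the function name "make_prg" is made up for this example.

```python
def make_prg(start_address: int, code: bytes) -> bytes:
    """Prepend the 16-bit load address (low byte first) to the object code."""
    if not 0 <= start_address <= 0xFFFF:
        raise ValueError("start address must fit in 16 bits")
    header = bytes([start_address & 0xFF, start_address >> 8])
    return header + code


# Object code of a short routine assembled at $0800 (sample input).
code = bytes.fromhex("A2 0A 8A 9D 00 08 E8 D0 F9 00")
prg = make_prg(0x0800, code)
print(prg.hex(" ").upper())
```

The first two bytes of the result are $00 and $08, i.e., the start address $0800 in the usual low-byte-first order of the 6502.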

To further interact with your code, you may refer to the debugger of the emulator, available by its icon on the top right of the window. Unlock the lock icon to the right of the register display to edit registers and flags. (The debugger is only interactive when the emulated PET is halted.)

Mind that the PET also comes with a built-in machine monitor in ROM, which is normally hooked to the interrupt vector (IRQ). Therefore, executing any zero-byte (a BRK in 6502 machine code) will invoke the monitor. Such a zero-byte should be always present at address 0x400, just before the start of any BASIC text, and can be called by the memorable BASIC command "SYS 1024".

Figure: The built-in machine monitor of the PET 2001 (ROM2 and ROM4).
$E62E (IRQ) points to machine monitor in ROM2/3,
$E455 to the monitor as it is implemented in ROM4.

Basic Syntax

The assembler supports a variety of common 6502 assembler syntax styles. Mind that there must be a separating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, labels, or complex expressions.
Compare the 6502 Instruction Set for instruction details and addressing modes.

Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for single-byte addresses, and "BB" for any other byte-sized operands. (In actuality, these may be any simple or complex expressions.)

CLC: implied, no operand.
ROR A: instruction with accumulator as the operand.
ROR: same as above. "A" is optional and may be omitted.
LDA #BB: immediate mode, loading the literal value.
LDA HHLL: absolute, loads the value from the provided memory address.
LDA HHLL,X: absolute, X-indexed.
LDA HHLL,Y: absolute, Y-indexed.
LDA LL: zero-page address mode (with automatic address mode detection).
LDA *LL: forced zero-page address mode in the style of the original MOS assembler.
LDA.b LL: forced zero-page address mode, modern byte-size notation.
LDA.w LL: forced absolute address mode, modern word-size notation.
LDA LL,X: zero-page, X-indexed.
LDA LL,Y: zero-page, Y-indexed.
LDA (LL,X): X-indexed, indirect.
LDA (LL),Y: indirect, Y-indexed.
LDA (LL)Y: indirect, Y-indexed, old MOS format (no comma).
JMP (HHLL): indirect address.
BEQ HHLL: relative addresses (-127 ≤ offset ≤ +127) are computed from absolute target addresses.

Supported synonyms:

LDA.byte LL: forced zero-page address mode.
LDA.by LL: as above.
LDA.word LL: forced absolute address mode, word-size.
LDA.wo LL: as above.
LDA+1 LL: forced zero-page address mode (like ACME assembler).
LDA+2 LL: forced absolute address mode, word-size (like ACME assembler).

Comments:

;comment: comments start with a semicolon and extend to the end of the line.

The assembler is generally case-insensitive, with the exception of strings and character literals.

Generally, there may be just a single instruction on each line.

Values and Numeric Representations

The assembler supports a variety of number formats:

$12EF: hexadecimal [0-9A-F].
&12EF: hexadecimal.
0x12EF: hexadecimal.
1289: decimal [0-9].
0d1289: decimal.
@1267: octal [0-7].
0o1267: octal.
01267: octal.
%1010101: binary [01].
0b1010101: binary.
'A: character value of "A" ($41 in ASCII)
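
The notations listed above can be summarized by a small Python model. This is a sketch only: the function name "parse_number" and the order in which the prefixes are tested are assumptions, and the character literal is simply mapped by its ASCII code.

```python
def parse_number(text: str) -> int:
    """Return the value of one numeric literal in any of the listed notations."""
    t = text.strip()
    if t.startswith("'"):                  # character literal, e.g. 'A
        return ord(t[1])
    lower = t.lower()
    for prefix, base in (("0x", 16), ("0d", 10), ("0o", 8), ("0b", 2)):
        if lower.startswith(prefix):
            return int(t[2:], base)
    for prefix, base in (("$", 16), ("&", 16), ("@", 8), ("%", 2)):
        if t.startswith(prefix):
            return int(t[1:], base)
    if len(t) > 1 and t.startswith("0"):   # a leading zero selects octal
        return int(t, 8)
    return int(t, 10)


for literal in ("$12EF", "0x12EF", "1289", "@1267", "01267", "%1010101", "'A"):
    print(literal, "=", parse_number(literal))
```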

Value Expressions

Anywhere a value may occur, this may be a complex expression as well. Expressions may include addition, subtraction, multiplication, division, and unary minus (+-*/ and -).

There are also the two special unary byte operators "<" ">":

<$12EF: low-byte value ($EF).
>$12EF: high-byte value ($12).

Expressions are evaluated strictly from left to right, without precedence, but may be grouped using round or square brackets ((…), […]).
The use of square brackets is recommended, though, as round brackets can be ambiguous in the context of certain 6502 instructions and their syntax.

1+2: 3
2*3: 6
1+2*3: 9 (1+2 => 3, 3*3 => 9)
1+[2*3]: 7 ([2*3] => 6, 1+6 => 7)
1+(2*3): same as above

Expressions may include defined symbols and instruction labels.
There must not be any white space in an expression!
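
To illustrate the evaluation order, here is a minimal Python model of such an evaluator. It is a sketch under a few assumptions that the text above does not settle: only decimal and "$" hexadecimal numbers are handled, "<" and ">" are taken to apply to the operand that follows them, division is integer division, and matching bracket types are not enforced. The name "evaluate" is made up for this example.

```python
import re

TOKEN = re.compile(r"\$[0-9A-Fa-f]+|\d+|[-+*/<>()\[\]]")


def evaluate(expr: str) -> int:
    """Evaluate strictly left to right, without operator precedence."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError("white space or unsupported character in expression")
    pos = 0

    def operand() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "-":                      # unary minus
            return -operand()
        if tok == "<":                      # low byte
            return operand() & 0xFF
        if tok == ">":                      # high byte
            return (operand() >> 8) & 0xFF
        if tok in ("(", "["):               # grouped sub-expression
            value = sequence()
            if pos >= len(tokens) or tokens[pos] not in (")", "]"):
                raise ValueError("missing closing bracket")
            pos += 1
            return value
        if tok.startswith("$"):
            return int(tok[1:], 16)
        return int(tok)

    def sequence() -> int:
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in ("+", "-", "*", "/"):
            op = tokens[pos]
            pos += 1
            right = operand()
            if op == "+":
                value += right
            elif op == "-":
                value -= right
            elif op == "*":
                value *= right
            else:
                value //= right
        return value

    result = sequence()
    if pos != len(tokens):
        raise ValueError("unexpected token")
    return result


print(evaluate("1+2*3"), evaluate("1+[2*3]"))
```

Each operator is applied as soon as its right-hand operand is known, which is why "1+2*3" yields 9, while the bracketed form yields 7.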

The Program Counter

The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address, where the next instruction will be inserted. There are several ways to address the program counter:

* = $1234: the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
BEQ *+2: the asterisk may be used in expressions as well.
* = *+4 $EA: when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with NOP instructions ($EA).
P% = P%+2: the symbol "P%" may be used synonymously to the asterisk anywhere the former may occur.
.ORG $1234: the more modern-style directive ".ORG" may be used for setting PC, as well. (However, you can't use it in an expression.)
.ORG = $1234: you may use .ORG in assignment style, as well.
.ORG EQU $1234: generally, "EQU" may be used as the assignment operator, as well.
(Mind that there must be white-space around "EQU" in order for it to be recognized as a token, which is not a requirement with "=".)
.RORG $1234: synonym to ".ORG" (in many assemblers you are not allowed to alter the origin set by ".ORG" and this is meant to provide compatibility.)
BEQ .+2: in expressions, a dot (.) may be used synonymously for the asterisk. However, you can not assign to it. (Strictly speaking, this is local context, but, while the assembler doesn't implement macros, it's the same anyway.)

Relative Offset Literals

As an extension to the standard syntax the assembler also allows relative offset literals for branch instructions (the relative offset to PC+2 as in machine code, instead of the usual target address) with the "#" prefix (same as immediate mode):

BCS #0: equivalent to "BCS *+2", results in "B0 00" ($B0: instruction code for BCS).
BCS #4: equivalent to "BCS *+6", results in "B0 04".
BCS #-4: equivalent to "BCS *-2", results in "B0 FC" ($FC: -4 in two's complement).
BCS #$FC: as above, results in "B0 FC".
BCS #6-(2*5): expressions allowed, equivalent to "BCS #-4", results in "B0 FC".

Relative offset literals are automatically constrained to single-byte values in the range of $00…$FF:

BCS #$104: results in "B0 04".
BCS #$1FC: results in "B0 FC".
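
The relation between the two operand styles can be modelled in Python as follows. This is a sketch with made-up function names. The range check uses the limits of a signed byte; the assembler's own check may be stricter.

```python
def branch_operand(pc: int, target: int) -> int:
    """Operand byte for a branch located at pc with an absolute target address."""
    offset = target - (pc + 2)          # relative to the following instruction
    if not -128 <= offset <= 127:       # limits of a signed byte (assumption)
        raise ValueError("branch target out of range")
    return offset & 0xFF


def literal_operand(value: int) -> int:
    """Operand byte for a '#' offset literal: constrained to $00..$FF."""
    return value & 0xFF


print(f"{literal_operand(-4):02X}")              # BCS #-4
print(f"{literal_operand(0x104):02X}")           # BCS #$104
print(f"{branch_operand(0x1000, 0x1006):02X}")   # BCS *+6 at $1000
```

With "*" standing for the address of the branch instruction, "BCS *+6" and "BCS #4" produce the same operand byte.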

Labels and Symbols

Instruction labels and defined symbols start with a letter or underscore and may contain letters, digits, or the underscore. Please mind that, for compatibility with older and historic sources, only the first 12 characters are significant. Use option "LONGNAMES" (see below) to disable this default.

Instruction labels may precede an instruction or may be the only entity on a line. They may optionally end in a trailing colon. Labels may be used anywhere in an expression:

LOOP LDA A,X: declares the instruction label LOOP.
LOOP: LDA A,X: labels may end in a colon (optional).
BEQ LOOP: using a label as an address value.

Optional "@" prefix for further compatibility:

@LOOP LDA A,X: Labels may be declared with an optional "@" prefix.
@LOOP: LDA A,X: Same as above, but using a trailing colon.
BNE @LOOP: Labels may be referred to using an optional "@" prefix.

Symbols are declared by an assignment and may be used as values anywhere.

TEST = $2000: declares the symbol “TEST”.
TEST EQU $2000: “EQU” may be used synonymously.
C = *+[TEST*2]: assignments may be complex expressions.

Mind that — like with most assemblers — you may not redefine or reuse any symbols or labels per default. However, you may change this behavior by setting option "REDEF" (see below).

Note on hexadecimal values and automatic zero-page mode

Any numeric value given by at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will result in an absolute address mode, when used in an ambiguous context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "C" by "C = 0x0002" and using this in "LDA C+2" will result in a word-sized, absolute instruction, while the effective value is well inside single-byte range. Defining C as "0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)
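
The rule for hexadecimal literals may be expressed as a small Python check. This is a sketch of the rule as described above, with a made-up function name; how the assembler propagates the "taint" through expressions is not modelled.

```python
def hex_literal_forces_word(literal: str) -> bool:
    """True if a hexadecimal literal is written so that it is treated as word-sized."""
    for prefix in ("0x", "0X", "$", "&"):
        if literal.startswith(prefix):
            digits = literal[len(prefix):]
            # at least 4 digits, the two leading ones being zeros
            return len(digits) >= 4 and digits.startswith("00")
    return False    # not a hexadecimal literal


print(hex_literal_forces_word("0x0002"), hex_literal_forces_word("0x02"))
```

Values above $FF (like "$1234") are word-sized by their value anyway, so the check only matters for values that would otherwise fit into a single byte.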

Anonymous (Temporary) Labels

The assembler also supports anonymous labels for temporary branch and jump targets:
Just mark an instruction by "!" or ":" (empty label) and refer to this mark by either "!+" (or ":+") for the next anonymous label as a target or by "!-" (or ":-") for the previous one. You may refer to a target further away by repeating "+" or "-". E.g., "BNE !--" branches to the second anonymous label before the insertion point. Mind that this counts anonymous labels and not addresses.

Example:

! START  LDA #0        ;first anonymous label
                       ;anonymous labels may precede a normal label
         LDX #0
!                      ;just mark this address
:        STA $1000,X   ;third label (same address), we may use ":" as well
         INX
         BNE !-        ;select the closest previous anonymous label
         JMP :---      ;jump back 3 anonymous labels (same as START)
                       ;again, ":" and "!" are synonymous

This will assemble to (with anonymous labels listed in a column of their own):

LOC   CODE         LABEL     INSTRUCTION

0800  A9 00      ! START     LDA #$00
0802  A2 00                  LDX #$00
0804             !
0804  9D 00 10   !           STA $1000,X
0807  E8                     INX
0808  D0 FA                  BNE $0804
080A  4C 00 08               JMP $0800
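
The counting rule can be modelled in Python as shown below. This is a sketch with made-up names: "labels" holds the addresses of all anonymous labels in source order and "seen" is the number of anonymous labels declared before the referring instruction.

```python
def resolve_anonymous(labels: list, seen: int, reference: str) -> int:
    """Resolve a reference like '!-', ':---' or '!+' to an address."""
    marks = reference.lstrip("!:")
    if not marks or marks not in ("+" * len(marks), "-" * len(marks)):
        raise ValueError("malformed anonymous reference")
    count = len(marks)
    index = seen - count if marks[0] == "-" else seen + count - 1
    if not 0 <= index < len(labels):
        raise ValueError("no such anonymous label")
    return labels[index]


# The three anonymous labels of the example above, in source order.
labels = [0x0800, 0x0804, 0x0804]
print(hex(resolve_anonymous(labels, 3, "!-")))     # BNE !-
print(hex(resolve_anonymous(labels, 3, ":---")))   # JMP :---
```

Mind that the second and third label share the same address, but still count as two labels.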

There is also support for an alternative grammar for anonymous targets, marking forward and backward references separately (like it's used by the ACME cross-assembler.)
Here, instructions used for forward references are marked by "+" and those to be used for backward references are marked by "-", each contributing to a dedicated list of anonymous labels. These are then referred to as a target address as above, but without any leading "!" or ":". This is an important difference! Please mind that this is not just an alternative syntax, but comes with its own semantics.
(Hence, these targets are managed in separate lists. While not recommended, you could mix both grammars in a single source.)

* = $800
         BCS +         ;branch to exit
         LDY #3
         LDA $3000
-        CLC           ;outer loop
         ADC #5
         LDX #5
-        STA $1000,x   ;inner loop
         DEX
         BNE -
         DEY
         BNE --
+        RTS           ;forward target

LOC   CODE         LABEL         INSTRUCTION

0800                             * = $0800
0800  B0 13                      BCS $0815   ;branch to exit
0802  A0 03                      LDY #$03
0804  AD 00 30                   LDA $3000
0807  18         -               CLC         ;outer loop
0808  69 05                      ADC #$05
080A  A2 05                      LDX #$05
080C  9D 00 10   -               STA $1000,X ;inner loop
080F  CA                         DEX
0810  D0 FA                      BNE $080C
0812  88                         DEY
0813  D0 F2                      BNE $0807
0815  60         +               RTS         ;forward target

Restrictions:
This feature is only supported for branch instructions and absolute jump targets. An anonymous target must be the sole operand and cannot be used in an arithmetic expression.

Note: Anonymous labels are not listed in symbol tables.

Pragmas and Directives

Pragmas and directives start generally with a dot. For enhanced compatibility, an exclamation mark ("!") may be used as well, but will be normalized and show up as a dot in the listing of pass 2. The following examples use the dot for a general/canonical notation.

Directives for embedding data:

.BYTE 1, $02: embeds a single byte or a list of bytes at the current location. Lists are separated by white-space and/or commas. (An optional "#", preceding any values, is ignored.) Values may be complex expressions, as well.
.DBYTE $12EF: embeds a double byte given in HHLL order (human readable, big-endian). This inserts the bytes $12 and $EF at the current location. Again, complex expressions are allowed and ".DBYTE" takes a list of values, as well.
.WORD $12EF: embeds a word given in LLHH memory order (little-endian). This inserts the bytes $EF and $12 at the current location. (Also, use this when using previously defined labels and symbols in an expression.)
Again, values and expressions may be provided as a list, as well.
.TEXT "Abc": embeds a text literal (case-sensitive) using the current encoding, here always PETSCII.
.PETSCII "Abc": embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
.PETSCR "Abc": embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
.IMAGE "X..XX.X.": embeds a byte represented by an image string.
Characters "X", "x" or "#" are considered as binary 1, any others as binary 0.
E.g., "X..XX.X." will be interpreted as %10011010 or $9A.
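
The mapping of an image string to a byte can be sketched in Python. The function name is made up, and it is an assumption of this sketch that an image string always consists of exactly 8 characters.

```python
def image_byte(picture: str) -> int:
    """Convert an image string to a byte: 'X', 'x' or '#' are 1, all else 0."""
    if len(picture) != 8:
        raise ValueError("expected 8 characters (assumption of this sketch)")
    value = 0
    for ch in picture:
        value = (value << 1) | (1 if ch in "Xx#" else 0)
    return value


print(f"${image_byte('X..XX.X.'):02X}")
```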

Supported synonyms:

.WO $12EF: synonym for ".WORD".
.BYT $01: synonym for ".BYTE".
.BY $01: synonym for ".BYTE".
.DB $02: synonym for ".BYTE" (Define Byte).
.DCB $03: synonym for ".BYTE" (Define Constant Byte).
.DBYT $12EF: synonym for ".DBYTE".
.PET "Abc": synonym for ".PETSCII".
.SCREEN "Abc": synonym for ".PETSCR".
.SCR "Abc": synonym for ".PETSCR".
.TX "Abc": synonym for ".TEXT".
.ASCII "Abc": here the same as ".PETSCII".
.IMG "X..XX.X.": synonym for ".IMAGE".

Directives for aligning code or filling space:

.ALIGN $100

advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled by zero. If no argument is provided ".ALIGN" aligns to the next even memory location.

.ALIGN $100 $EA

an optional second argument may specify a byte value to be used to fill any gaps (here $EA, "NOP", as used by most Commodore 8-bit machines).
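
The arithmetic behind ".ALIGN" can be sketched in Python. The function name is made up, and it is an assumption of this sketch that the program counter stays put when it already sits on a multiple of the given value.

```python
def align(pc: int, boundary: int = 2, fill: int = 0x00):
    """Return the aligned program counter and the bytes filling the gap."""
    gap = -pc % boundary                # distance to the next multiple
    return pc + gap, bytes([fill]) * gap


new_pc, padding = align(0x0813, 0x100, 0xEA)
print(hex(new_pc), len(padding))
```

Starting at $0813, aligning to $100 moves the program counter to $0900 and emits $ED fill-bytes.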

.FILL $20 $EA

fills the next n bytes using the value provided by the second argument. If no second argument is provided, zero will be used as the fill-byte.

.REPEAT n

repeats the instruction or directive following this directive on the same line n times. An optional "STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default 1). The repeat-counter is accessible as "R%".
E.g.,

.REPEAT 26 .BYTE 'A+R%

will fill the next 26 memory locations with the letters of the alphabet.

ODD_NUMS ;generate list of odd numbers
.REPEAT 5 STEP 2 .BYTE 1+R%

will fill the next 5 memory locations with the odd number series 1,3,5,7,9.

And this will fill the next 6 bytes by the sequence 0x00, 0x00, 0x02, 0x02, 0x04, 0x04:

.REPEAT 3 STEP 2 *=*+2 R% ;PC += 2, fill-byte R%
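
The three examples above can be reproduced by a small Python model of the repeat-counter. The generator name is made up. The counter is taken to start at zero, which matches the sequences given above.

```python
def repeat_values(count: int, step: int = 1):
    """Yield the value of the repeat-counter R% for each iteration."""
    for i in range(count):
        yield i * step


alphabet = bytes(ord("A") + r for r in repeat_values(26))        # .BYTE 'A+R%
odd_nums = [1 + r for r in repeat_values(5, 2)]                  # .BYTE 1+R%
filled = [r for r in repeat_values(3, 2) for _ in range(2)]      # *=*+2 R%
print(alphabet.decode("ascii"), odd_nums, filled)
```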

Other directives:

.END: ends the source code, any remaining text is ignored. (optional)
.NOLIST: switches listing output off (e.g., for data sections). This is also available as an option.
.LIST: switches listing output on (default, also available as an option).
.SKIP: inserts a blank line in the listing (pass #2). This is mostly for compatibility.
.PAGE: inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
.DATA: any such directive is ignored (this merely exists to ensure compatibility with symbol tables used by this stand-alone disassembler).

Special directives for Commodore BASIC:

.BASICSTART

Generates a short BASIC program, consisting of optional REM-lines and a line with a "SYS" command, jumping to the next available address immediately following this BASIC text (which starts at 0x0401, the BASIC start address of the Commodore PET). The program counter will be advanced to this start address automatically.
Without any arguments, just a line with the SYS command will be generated, using the current year as the line number:

.BASICSTART
> 2023 SYS 1038

If a first, numeric argument is provided, this will be used as a line number for the line holding the SYS statement:

.BASICSTART 10
> 10 SYS 1038

If a string argument is provided, the assembler will generate a heading line with line number "0" and a REM statement using this string. If a list of strings (separated by white-space and optionally commas) is provided, or a string contains a line-break ("\n"), multiple REM lines will be generated:

.BASICSTART 2001 "*** a program ***", "(c) example.com"
> 0 REM *** A PROGRAM ***
> 1 REM (c) EXAMPLE.COM
> 2001 SYS 1084

(Mind that lower case letters will appear as upper-case and upper-case letters as graphics characters in standard PETSCII upper-case/graphics mode.)

.PETSTART

Same as ".BASICSTART" (see above).

Options

Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (.) or an exclamation mark (!).

.OPT WORDA: switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("*") or the byte extension (".b"). Use this for fine grain control and/or compatibility with old sources.
.OPT ZPGA: switches automatic zero-page detection to on (default).
.OPT ZPA: synonym to option "ZPGA".
.OPT ILLEGALS: enables support for “illegal” op-codes (see below).
.OPT LEGALS: disables support for “illegal” op-codes (default).
.OPT NOILLEGALS: synonym to option "LEGALS".
.OPT REDEF: allows symbols and labels to be redefined / reused.
.OPT NOREDEF: reuse of symbols is not allowed and will throw an error (default).
.OPT ASCII: set character encoding for “.TEXT”-directives and character literals.
Here, this is only included for compatibility reasons and the encoding always defaults to PETSCII.
.OPT PETSCII: set the default character encoding to PETSCII.
.OPT PETSCR: set the default character encoding to Commodore 8-bit screen characters.
.OPT SCREEN: synonym to option "PETSCR".
.OPT SCR: same as above.
.OPT NOLIST: switches listing output off.
.OPT LIST: switches listing output on (default).
.OPT LONGNAMES: disables the default limit of 12 significant characters for labels and identifiers, allowing names of unlimited length.

Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored: XREF, NOXREF, COUNT, NOCOUNT, CNT, NOCNT, MEMORY, NOMEMORY, GENERATE, NOGENERATE.

Compatibility

This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw about any style of source code at it.

E.g., the following examples are semantically identical and produce the same object code:

;MOS/traditional

* = $4000
TARGET = $C0

       LDY *$20
LOOP   LDA $0080,Y
       ROL A
       STA (TARGET)Y
       DEY
       BNE LOOP
       RTS
.END

;modern style

.ORG 0x4000
TARGET EQU 0xC0

       LDY.b 0x20
LOOP:  LDA.w 0x80,Y
       ROL
       STA (TARGET),Y
       DEY
       BNE LOOP
       RTS
.END

Processing Example

Here is an example for a complete assembly of a short source:

Source code:

;fill a page with bytes,
;preserve program

*=$800

start
      ldx #offset
loop  txa
      sta start,x
      inx
      bne loop
      brk

;insert bytes here
offset=*-start
.end

Resulting object code:

0800: A2 0A 8A 9D 00 08 E8 D0
0808: F9 00

Listing:

pass 1

LINE  LOC          LABEL     PICT

   1               ;fill a page with bytes,
   2               ;preserve program

   4  0800                   * = $800
   6  0800         START
   7  0800                   LDX #OFFSET
   8  0802         LOOP      TXA
   9  0803                   STA START,X
  10  0806                   INX
  11  0807                   BNE LOOP
  12  0809                   BRK
  14                         ;insert bytes here
  15                         OFFSET = *-START
  16                         .END

symbols
 LOOP       $0802
 OFFSET       $0A
 START      $0800

pass 2

LOC   CODE         LABEL     INSTRUCTION

                   ;fill a page with bytes,
                   ;preserve program

0800                         * = $0800
0800               START
0800  A2 0A                  LDX #$0A
0802  8A           LOOP      TXA
0803  9D 00 08               STA $0800,X
0806  E8                     INX
0807  D0 F9                  BNE $0802
0809  00                     BRK
                             ;insert bytes here
                             OFFSET = $0A
                             .END

done (code: 0800..0809).

Illegal Opcodes

Support for "illegal" opcodes (undefined instructions) is enabled by the pragma ".OPT ILLEGALS".

The following mnemonics are implemented (supported synonyms given in parentheses):

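opc (synonyms)   imp  imm  abs  abX  abY  zpg  zpX  zpY  inX  inY

ALR (ASR)        --   4B   --   --   --   --   --   --   --   --
ANC              --   0B   --   --   --   --   --   --   --   --
ANC2             --   2B   --   --   --   --   --   --   --   --
ANE (XAA)        --   8B   --   --   --   --   --   --   --   --
ARR              --   6B   --   --   --   --   --   --   --   --
DCP (DCM)        --   --   CF   DF   DB   C7   D7   --   C3   D3
ISC (ISB, INS)   --   --   EF   FF   FB   E7   F7   --   E3   F3
LAS (LAR, LAE)   --   --   --   --   BB   --   --   --   --   --
LAX (ATX)        --   AB   AF   --   BF   A7   --   B7   A3   B3
LXA (LAX imm)    --   AB   --   --   --   --   --   --   --   --
RLA              --   --   2F   3F   3B   27   37   --   23   33
RRA              --   --   6F   7F   7B   67   77   --   63   73
SAX (AXS, AAX)   --   --   8F   --   --   87   --   97   83   --
SBX              --   CB   --   --   --   --   --   --   --   --
SHA (AXA, AHX)   --   --   --   --   9F   --   --   --   --   93
SHX              --   --   --   --   9E   --   --   --   --   --
SHY (SAY, SYA)   --   --   --   9C   --   --   --   --   --   --
SLO (ASO)        --   --   0F   1F   1B   07   17   --   03   13
SRE (LSE)        --   --   4F   5F   5B   47   57   --   43   53
TAS (SHS, XAS)   --   --   --   --   9B   --   --   --   --   --
USBC             --   EB   --   --   --   --   --   --   --   --
NOP              EA   80   0C   1C   --   04   14   --   --   --
DOP (SKB)        --   80   --   --   --   04   14   --   --   --
TOP (SKW)        --   --   0C   1C   --   --   --   --   --   --
JAM (HLT, KIL)   02   --   --   --   --   --   --   --   --   --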

Notes:

NOP: There are several NOP instructions, but these are the ones commonly used.
DOP: "double NOP" (single-byte operand/address, 2 bytes total).
TOP: "triple NOP" (word-address, 3 bytes total).
JAM: freezes the CPU (again, there are several equivalent instructions).

Combining Assembler with BASIC

There are two general approaches to combining BASIC text with assembler code, one from the assembler side and another, probably more versatile one from the BASIC side of things. An important thing to note is that with both approaches you do not set the program counter, as its value will be determined by the assembler. Both methods produce a stand-alone program that can be saved or exported by other means as a single binary file.

BASIC Preambles in Assembler Sources

The first one is the pragma ".BASICSTART" already described above. You can add an optional line number for the final SYS statement and any number of strings, which will be prepended in REM statements starting at line number 0. The assembler will generate the required tokenized BASIC code with a SYS statement pointing to immediately after this short BASIC program and set the program counter for the assembly accordingly.

Example

Source (assembler text “proud-demo.asm”)

.BASICSTART 100 "*** my great program ***","(c) by me and me alone"

RTS ; that's it

Assembler Listing

pass 1

LINE  LOC          LABEL     PICT

   1  0401                   .BASICSTART 100 "*** my great program ***\n(c) by me and me alone"
   3  044A                   RTS ; that's it

pass 2

LOC   CODE         LABEL     INSTRUCTION

0401                         .BASICSTART 100 "*** my great program ***\n(c) by me and me alone"
>>>>  COMPILING BASIC PREAMBLE...
0401  20 04                  $0420 ;LINE LINK
0403  00 00                  $0000 ;LINE NO. ("0")
0405  8F 20                  ;"REM "
0407  2A 2A 2A               ;TEXT "***"
040A  20 4D 59               ;TEXT " MY"
040D  20 47 52               ;TEXT " GR"
0410  45 41 54               ;TEXT "EAT"
0413  20 50 52               ;TEXT " PR"
0416  4F 47 52               ;TEXT "OGR"
0419  41 4D 20               ;TEXT "AM "
041C  2A 2A 2A               ;TEXT "***"
041F  00                     $00   ;EOL
0420  3D 04                  $043D ;LINE LINK
0422  01 00                  $0001 ;LINE NO. ("1")
0424  8F 20                  ;TOKEN REM, " "
0426  28 43 29               ;TEXT "(C)"
0429  20 42 59               ;TEXT " BY"
042C  20 4D 45               ;TEXT " ME"
042F  20 41 4E               ;TEXT " AN"
0432  44 20 4D               ;TEXT "D M"
0435  45 20 41               ;TEXT "E A"
0438  4C 4F 4E               ;TEXT "LON"
043B  45                     ;TEXT "E"
043C  00                     $00   ;EOL
043D  48 04                  $0448 ;LINE LINK
043F  64 00                  $0064 ;LINE NO. ("100")
0441  9E 20                  ;TOKEN SYS, " "
0443  31 30 39               ;TEXT "109"
0446  38                     ;TEXT "8"
0447  00                     $00   ;EOL
0448  00 00                  $0000 ;END OF BASIC TEXT (EMPTY LINK)
>>>>  START OF ASSEMBLY AT $044A ("SYS 1098")
044A  60                     RTS ; that's it

done (code: 0401..044A).

Listing in BASIC (0x044A = dec. 1098)

LIST

 0 REM *** MY GREAT PROGRAM ***
 1 REM (C) BY ME AND ME ALONE
 100 SYS 1098
READY.
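
The layout of the preamble, as shown in the listing above, can be reproduced by a short Python sketch. This is a model of the byte layout only, not the assembler's own code: the function names are made up, lower-case input is simply mapped to upper-case character codes, and upper-case input or line-breaks inside strings are not handled.

```python
BASIC_START = 0x0401                 # start of BASIC text on the PET
TOKEN_REM, TOKEN_SYS = 0x8F, 0x9E    # BASIC tokens as shown in the listing


def basic_line(address: int, number: int, token: int, text: str) -> bytes:
    """One line: link to the next line, line number, token, space, text, EOL."""
    body = bytes([token, 0x20]) + text.upper().encode("ascii") + b"\x00"
    link = address + 4 + len(body)
    head = bytes([link & 0xFF, link >> 8, number & 0xFF, number >> 8])
    return head + body


def basic_preamble(sys_line: int, remarks=()):
    """Return the preamble bytes and the entry address of the machine code."""
    digits = 4                       # first guess for the width of the address
    while True:
        out = b""
        for number, remark in enumerate(remarks):
            out += basic_line(BASIC_START + len(out), number, TOKEN_REM, remark)
        # SYS line (4 + 2 + digits + 1 bytes), then the 2-byte end marker
        entry = BASIC_START + len(out) + 4 + 2 + digits + 1 + 2
        if len(str(entry)) == digits:
            break
        digits = len(str(entry))
    out += basic_line(BASIC_START + len(out), sys_line, TOKEN_SYS, str(entry))
    return out + b"\x00\x00", entry


data, entry = basic_preamble(100, ["*** my great program ***",
                                   "(c) by me and me alone"])
print(entry, data[:4].hex(" ").upper())
```

Since the SYS address is part of the line it is stored in, the sketch first guesses the number of digits and repeats the computation if the guess turns out to be wrong.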

Here’s another example (probably a bit more healthy and stable),

hearts.asm

.basicstart 100 "with love..."

       lda #211      ;screen code inverted heart
       ldx #0
loop:  sta $8000,x   ;fill the screen
       sta $8100,x
       sta $8200,x
       sta $8300,x
       dex
       bne loop
       rts           ;done

Assembler Code Appended to BASIC Sources

The other, probably more capable approach is appending an assembler source to a BASIC source file.

This is achieved by the special BASIC source tag "{ASM START}" (also "{ASM_START}" or "{ASMSTART}"). This will immediately terminate the process of tokenizing the BASIC program, with the rest of the line ignored, and will insert an ASCII sequence for the memory address following immediately after the BASIC program that is currently generated. Any source text following this will be assumed to be 6502 assembler code. (If no such code is found, a simple RTS instruction will be appended.)

The general idea is that you put this behind a "SYS" command to form a final statement that leads to the execution of the following machine language program.

Example

hearts-demo-1.txt

100 REM HEARTS DEMO 1
110 PRINT "READY FOR SOME HEARTS?"
120 GET K$:IF K$="" GOTO 130
130 SYS {ASMSTART}

;routine to fill the screen with hearts
       lda #83      ;screen code for heart character
       ldx #0
loop:  sta $8000,x
       sta $8100,x
       sta $8200,x
       sta $8300,x
       dex
       bne loop
       rts          ;return to BASIC

BASIC Listing

LIST

 100 REM HEARTS DEMO 1
 110 PRINT "READY FOR SOME HEARTS?"
 120 GET K$:IF K$="" GOTO 130
 130 SYS 1112
READY.

Here, the emulator behaves as it usually does whenever we drop a BASIC source file onto it: the code will be transformed and loaded seamlessly, but you will be asked whether you want to review the assembler listing or not. In case there is an error and the assembly fails, the listing will be presented anyway.
Mind that this also allows you to delegate any configuration, etc., to dialogs written in BASIC, where this may be easier to handle than in assembler.

Notably, this can also be used without this assembler: just omit the assembler part, export the resulting program either as a binary or as a hex-dump, and use it with the assembler of your choice, replacing the final RTS (0x60) instruction with your code.

Another way of using this is by appending the assembler code to a final DATA statement, from where we can read the jump address to be used from anywhere in the BASIC program. Here, rather than using BASIC as a means to start our assembler program, we use the assembler to provide some fast routine(s) for BASIC.

Example
(Here, we deposit the screen code to be used for filling the screen at address 255 before calling our routine. This is either screen code 83 for a heart character or 211 for an inverted heart character. We use this to blink the screen five times (the loop runs from 0 to 4) and finally clear it by filling it with space characters. Mind how the call address is read into variable A from the final DATA statement.)

hearts-demo-2.txt (click for the source file)

100 REM HEARTS DEMO 2
110 READ A: REM READ CALL ADDRESS
120 PRINT "READY FOR SOME HEARTS?"
130 GET K$:IF K$="" GOTO 130
140 FOR I=0 TO 4
150 POKE 255, 83:SYS A
160 FOR D=0 TO 300:NEXT D
170 POKE 255,211:SYS A
180 FOR D=0 TO 300:NEXT D
190 NEXT I
200 POKE 255,32:SYS A
210 PRINT "THIS WAS FUN!"
220 DATA {ASMSTART}

;routine to fill the screen with char in $ff
       lda $ff ;unused zeropage addr
       ldx #0
loop:  sta $8000,x
       sta $8100,x
       sta $8200,x
       sta $8300,x
       dex
       bne loop
       rts

Using “Disassemble Program” from the emulator’s “Utils/Export” menu, we get the result of our combined BASIC and assembler efforts as in memory:

Program Disassembly ($0401-$0511)

                         .[tokenized BASIC text]

0401  15 04               link: $0415
0403  64 00               line# 100
0405  8F                  token REM
0406  20 48 45 41 52 54   ascii « HEART»
040C  53 20 44 45 4D 4F   ascii «S DEMO»
0412  20 32               ascii « 2»
0414  00                  -EOL-
0415  32 04               link: $0432
0417  6E 00               line# 110
0419  87                  token READ
041A  20 41 3A 20         ascii « A: »
041E  8F                  token REM
041F  20 52 45 41 44 20   ascii « READ »
0425  43 41 4C 4C 20 41   ascii «CALL A»
042B  44 44 52 45 53 53   ascii «DDRESS»
0431  00                  -EOL-
0432  51 04               link: $0451
0434  78 00               line# 120
0436  99                  token PRINT
0437  20 22 52 45 41 44   ascii « "READ»
043D  59 20 46 4F 52 20   ascii «Y FOR »
0443  53 4F 4D 45 20 48   ascii «SOME H»
0449  45 41 52 54 53 3F   ascii «EARTS?»
044F  22                  ascii «"»
0450  00                  -EOL-
0451  68 04               link: $0468
0453  82 00               line# 130
0455  A1                  token GET
0456  20 4B 24 3A         ascii « K$:»
045A  8B                  token IF
045B  20 4B 24            ascii « K$»
045E  B2                  token =
045F  22 22 20            ascii «"" »
0462  89                  token GOTO
0463  20 31 33 30         ascii « 130»
0467  00                  -EOL-
0468  76 04               link: $0476
046A  8C 00               line# 140
046C  81                  token FOR
046D  20 49               ascii « I»
046F  B2                  token =
0470  30 20               ascii «0 »
0472  A4                  token TO
0473  20 34               ascii « 4»
0475  00                  -EOL-
0476  88 04               link: $0488
0478  96 00               line# 150
047A  97                  token POKE
047B  20 32 35 35 2C 20   ascii « 255, »
0481  38 33 3A            ascii «83:»
0484  9E                  token SYS
0485  20 41               ascii « A»
0487  00                  -EOL-
0488  9C 04               link: $049C
048A  A0 00               line# 160
048C  81                  token FOR
048D  20 44               ascii « D»
048F  B2                  token =
0490  30 20               ascii «0 »
0492  A4                  token TO
0493  20 33 30 30 3A      ascii « 300:»
0498  82                  token NEXT
0499  20 44               ascii « D»
049B  00                  -EOL-
049C  AE 04               link: $04AE
049E  AA 00               line# 170
04A0  97                  token POKE
04A1  20 32 35 35 2C 32   ascii « 255,2»
04A7  31 31 3A            ascii «11:»
04AA  9E                  token SYS
04AB  20 41               ascii « A»
04AD  00                  -EOL-
04AE  C2 04               link: $04C2
04B0  B4 00               line# 180
04B2  81                  token FOR
04B3  20 44               ascii « D»
04B5  B2                  token =
04B6  30 20               ascii «0 »
04B8  A4                  token TO
04B9  20 33 30 30 3A      ascii « 300:»
04BE  82                  token NEXT
04BF  20 44               ascii « D»
04C1  00                  -EOL-
04C2  CA 04               link: $04CA
04C4  BE 00               line# 190
04C6  82                  token NEXT
04C7  20 49               ascii « I»
04C9  00                  -EOL-
04CA  DB 04               link: $04DB
04CC  C8 00               line# 200
04CE  97                  token POKE
04CF  20 32 35 35 2C 33   ascii « 255,3»
04D5  32 3A               ascii «2:»
04D7  9E                  token SYS
04D8  20 41               ascii « A»
04DA  00                  -EOL-
04DB  F1 04               link: $04F1
04DD  D2 00               line# 210
04DF  99                  token PRINT
04E0  20 22 54 48 49 53   ascii « "THIS»
04E6  20 57 41 53 20 46   ascii « WAS F»
04EC  55 4E 21 22         ascii «UN!"»
04F0  00                  -EOL-
04F1  FC 04               link: $04FC
04F3  DC 00               line# 220
04F5  83                  token DATA
04F6  20 31 32 37 38      ascii « 1278»
04FB  00                  -EOL-
04FC  00 00               -EOP- (link = null)

                         .[end of BASIC text]

                         * = $04FE
04FE  A5 FF              LDA $FF
0500  A2 00              LDX #$00
0502  9D 00 80   L0502   STA $8000,X
0505  9D 00 81           STA $8100,X
0508  9D 00 82           STA $8200,X
050B  9D 00 83           STA $8300,X
050E  CA                 DEX
050F  D0 F1              BNE L0502
0511  60                 RTS
                         .end

This mechanism can be used to integrate multiple machine language routines, but you will have to add any offset to the base address returned in the final DATA statement on your own.
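
One way to keep such offsets easy to manage is a small jump table at the very start of the machine language part. The following is only a sketch, not something prescribed by the emulator: the labels "fill", "clear" and "store" are made up for illustration, and it assumes that forward references to labels are resolved by the assembler. Since each JMP instruction occupies three bytes, the entry points sit at the base address, the base address plus 3, and so on, no matter how long the routines themselves become.

```asm
;jump table: fixed entry points at the start of the code
       jmp fill     ;entry 0: base address
       jmp clear    ;entry 1: base address + 3

;fill the screen with the screen code found in $ff
fill:  lda $ff
       jmp store
;clear the screen by filling it with spaces
clear: lda #32      ;screen code for space
store: ldx #0
loop:  sta $8000,x
       sta $8100,x
       sta $8200,x
       sta $8300,x
       dex
       bne loop
       rts          ;return to BASIC
```

With the base address read into variable A, as in the example above, the BASIC side would call "POKE 255,83:SYS A" to fill the screen and "SYS A+3" to clear it.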

Note: I’m not aware that this has been done before, so this could well be a genuine invention, since it requires some kind of engine capable of handling both BASIC source text and assembler code at once.

Fixed Start Addresses with BASIC Sources

In case you really want to use a fixed start address for your routine, you may put the "{ASMSTART}" tag either behind a dummy command or, preferably, in a "REM" statement. (Mind that hiding it in a string won’t work, as any such text is ignored by the parser.)

In the following example, the space between $0475, the end of the BASIC program, and $0480 (decimal 1152), the explicitly provided start of the 6502 code, will be filled by zero-bytes:

hearts-demo-3.txt (click for the source file)

100 REM HEARTS DEMO 3
110 PRINT "READY FOR SOME HEARTS?"
120 GET K$:IF K$="" GOTO 130
130 SYS 1152
140 REM ML RANGE STARTS AT {ASMSTART}

* = $0480

;routine to fill the screen with hearts
       lda #83      ;screen code for heart character
       ldx #0
loop:  sta $8000,x
       sta $8100,x
       sta $8200,x
       sta $8300,x
       dex
       bne loop
       rts          ;return to BASIC

This will result in the following hex-dump (“Utils/Export” → “Hex-Dump Program”):

0400: .. 15 04 64 00 8F 20 48   ..... H
0408: 45 41 52 54 53 20 44 45  EARTS DE
0410: 4D 4F 20 33 00 34 04 6E  MO 3.4..
0418: 00 99 20 22 52 45 41 44  .. "READ
0420: 59 20 46 4F 52 20 53 4F  Y FOR SO
0428: 4D 45 20 48 45 41 52 54  ME HEART
0430: 53 3F 22 00 4B 04 78 00  S?".K...
0438: A1 20 4B 24 3A 8B 20 4B  . K$:. K
0440: 24 B2 22 22 20 89 20 31  $."" . 1
0448: 33 30 00 56 04 82 00 9E  30.V....
0450: 20 31 31 35 32 00 74 04   1152...
0458: 8C 00 8F 20 4D 4C 20 52  ... ML R
0460: 41 4E 47 45 20 53 54 41  ANGE STA
0468: 52 54 53 20 41 54 20 31  RTS AT 1
0470: 31 34 32 00 00 00 00 00  142.....
0478: 00 00 00 00 00 00 00 00  ........
0480: A9 53 A2 00 9D 00 80 9D  .S......
0488: 00 81 9D 00 82 9D 00 83  ........
0490: CA D0 F1 60              ...`

(The BASIC text ends at $0475; the bytes from $0476 to $047F are zero-filled by the assembler up to the start of the 6502 code at $0480.)

The various parts still form a homogeneous program as indicated by the system pointers TXTTAB (start of BASIC text) = $0401 and VARTAB (start of BASIC variables) = $0494.
(Use “Utils/Export” → “Show BASIC System Pointers” to view these pointers.)
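
The same pointers can also be inspected from machine language. The following is a minimal sketch, meant to be called from other machine language code, which computes the length of the program range as VARTAB minus TXTTAB. It assumes that the assembler accepts an expression like "VARTAB+1" (written without white space) as a zero-page operand; the symbol definitions are those from the list at the end of this page.

```asm
;length of the program range (BASIC text plus appended code)
;as VARTAB - TXTTAB, returned in X (lo) and A (hi)
TXTTAB = $28        ;pointer: start of BASIC text (lo, hi)
VARTAB = $2A        ;pointer: end of BASIC, start of variables

       sec          ;prepare 16-bit subtraction
       lda VARTAB   ;lo-byte of end address
       sbc TXTTAB   ;minus lo-byte of start address
       tax          ;keep lo-byte of the result in X
       lda VARTAB+1 ;hi-byte of end address
       sbc TXTTAB+1 ;minus hi-byte (and borrow, if any)
       rts          ;X = lo, A = hi
```

For the pointer values given above ($0401 and $0494), this yields $0093, or 147 bytes.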

Rationale — General Considerations

I’ve always looked with respectful envy at those BASIC dialects featuring in-line assembly, like BBC BASIC. Could we have something similar for Commodore BASIC? I’d argue that this isn’t the way to go about it on the Commodore 8-bits, since the BASIC runtime shuffles variables around in memory as new variables are encountered. This is especially true for subscripted variables (arrays), which are often used for a scheme like this, and there is no such thing as a stable location in memory.

The Commodore way of doing this — at least for me — is appending any machine language code to the BASIC program, but including it in the program range as set by the two system pointers TXTTAB and VARTAB, the former holding the start address of the tokenized BASIC text in memory (usually 0x401 on the PET), the latter providing the start of the memory available for variables, just after the last byte of BASIC text. This way, the machine language part is still an integral part of the program and won’t be affected by the runtime.

However, mind that, should your machine language routine(s) make use of any tables, you’d better reserve the space required as part of your program. If you merely address some space beyond your program blindly, this potentially clashes with the variables managed by the BASIC runtime.
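
A minimal sketch of reserving such space inside the program range follows. It assumes that the assembler accepts a ".byte" directive with a comma-separated list of values (as known from original MOS sources) and that a forward reference to the label is assembled as an absolute address; the labels "copy" and "table" and the table size are hypothetical. As the table is emitted as part of the code, it lies below VARTAB and is not touched by BASIC variables.

```asm
;copy the first 8 screen codes into a table of our own
       ldx #7
copy:  lda $8000,x  ;read from video RAM
       sta table,x  ;store in the reserved table
       dex
       bpl copy     ;repeat for x = 7..0
       rts

;8 bytes reserved as part of the program (assumes ".byte")
table: .byte 0,0,0,0,0,0,0,0
```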

— ❦ —

Finally, some useful addresses (new ROM / BASIC 2.0)

; PET 2001 system addresses (ROM 2.0)

USRPOK   = $00  ;$4C constant (JMP instruction)
USRADD   = $01  ;USR function addr. lo, hi ($02)
COUNT    = $05  ;BASIC input buffer pointer ("#" subscript)
VALTYP   = $07  ;variable flag, type: $FF=string, $00=numeric
INTFLG   = $08  ;integer flag: $80=integer, $00=floating point
GARBFL   = $09  ;flag for DATA, LIST quote, memory
SUBFLG   = $0A  ;flag for subscript, FNx
INPFLG   = $0B  ;input/read flag: $00=input, $40=get, $98=read
TANSGN   = $0C  ;flag ATN sign, comparison evaluation
LINNUM   = $11  ;BASIC integer address for SYS, GOTO, etc (lo, hi)
INDEX    = $1F  ;pointer for number transfer (lo, hi)
RESHO    = $23  ;product staging area for multiplication
TXTTAB   = $28  ;pointer: start of BASIC text in memory
VARTAB   = $2A  ;pointer: end of BASIC, start of variables
ARYTAB   = $2C  ;pointer: end of variables, start of arrays
STREND   = $2E  ;pointer: end of arrays
FRETOP   = $30  ;pointer: top of memory, bottom of strings
FRESPC   = $32  ;utility string pointer
MEMSIZ   = $34  ;pointer: limit of BASIC memory
CURLIN   = $36  ;current BASIC line number
OLDLIN   = $38  ;previous BASIC line number
OLDTXT   = $3A  ;pointer to BASIC statement for CONT
DATLIN   = $3C  ;line number, current DATA item
DATPTR   = $3E  ;pointer to current DATA item
INPPTR   = $40  ;input vector
VARNAM   = $42  ;current variable name
VARPNT   = $44  ;current variable address
FORPNT   = $46  ;variable pointer for FOR/NEXT
TEMPF1   = $54  ;misc numeric storage area
TEMPF2   = $59  ;misc numeric storage area
FACEXP   = $5E  ;floating point accumulator 1: exponent
FACHO    = $5F  ;floating point accumulator 1: mantissa (4 bytes)
FACSGN   = $63  ;floating point accumulator 1: sign
SGNFLG   = $64  ;series evaluation constant pointer
BITS     = $65  ;accumulator hi-order propagation word
ARGEXP   = $66  ;floating point accumulator 2: exponent
ARGHO    = $67  ;floating point accumulator 2: mantissa (4 bytes)
ARGSGN   = $6B  ;floating point accumulator 2: sign
ARISGN   = $6C  ;sign comparison (primary vs. secondary)
FACOV    = $6D  ;low-order rounding byte for FAC #1
FBUFPT   = $6E  ;cassette buffer length/series pointer
CHRGET   = $70  ;subroutine to get the next character
CHRGOT   = $76  ;character found by CHRGET
TXTPTR   = $77  ;pointer to source text for CHRGET
RNDX     = $88  ;round storage and work area
TIME     = $8D  ;jiffy clock in 1/60 sec for TI and TI$ (lo, hi)
CINV     = $90  ;IRQ vector (lo, hi), hardware interrupt
CBINV    = $92  ;BRK interrupt vector (lo, hi)
NMINV    = $94  ;NMI interrupt vector (lo, hi)
STATUS   = $96  ;status word ST
LSTX     = $97  ;which key? matrix coordinates of last key down: row/col, $FF=no key
SFDX     = $98  ;shift key: 1=pressed
STKEY    = $9B  ;last read from keyboard scan: STOP and RVS flags
SVXT     = $9C  ;timing constant buffer
VERCK    = $9D  ;flag: LOAD=0, VERIFY=1
NDX      = $9E  ;index into keyboard buffer
RVS      = $9F  ;screen reverse flag
C3PO     = $A0  ;IEEE output flag: $FF=character waiting
INDX     = $A1  ;pointer: end-of-line for input
LXSP     = $A3  ;cursor log (row, col)
BSOUR    = $A5  ;IEEE output character buffer
BLNSW    = $A7  ;flag: 0=flashing cursor, else no cursor
BLNCT    = $A8  ;countdown for cursor timing
GDBLN    = $A9  ;character under cursor
BLNON    = $AA  ;cursor blink flag
SYNO     = $AB  ;EOT bit received
NXTBIT   = $AB  ;-- " --
CRSW     = $AC  ;input from screen/input from keyboard
LDTND    = $AE  ;number of open files, pointer into file table
DFLTN    = $AF  ;input device (normally 0)
DFLTO    = $B0  ;output CMD device (normally 3)
PRTY     = $B1  ;tape character parity
DPSW     = $B2  ;byte received flag
BUFPNT   = $BB  ;tape buffer #1 count ($BC: tape buffer #2 count)
INBIT    = $BD  ;write leader count, read pass 1/pass 2
BITCI    = $BE  ;write new byte, read error flag
RINONE   = $BF  ;write start bit, read bit seq error
FNMIDX   = $C0  ;pass 1 error log pointer
PTR1     = $C0  ;-- " --
PTR2     = $C1  ;pass 2 error correction pointer
RIDATA   = $C2  ;current function: 0=scan, $01-$0F=count, $40=load, $80=end
RIPRTY   = $C3  ;read checksum, write leader length
PNT      = $C4  ;pointer to screen line (lo, hi)
PNTR     = $C6  ;column position of cursor on above line
SAL      = $C7  ;utility pointer for tape buffer, scrolling
EAL      = $C9  ;tape end address / end of current program
QTSW     = $CD  ;flag for quote mode: 0=direct mode, else programmed cursor
BITTS    = $CE  ;timer 1 enabled for tape read, 0=disabled
FNLEN    = $D1  ;number of characters in file name
LA       = $D2  ;current logical file number
SA       = $D3  ;current secondary address, or R/W command
FA       = $D4  ;current device number
LNMX     = $D5  ;line length (39 or 79) for screen
TAPE1    = $D6  ;start of tape buffer (address lo, hi)
TBLX     = $D8  ;current line with cursor
DATAX    = $D9  ;last key input, buffer checksum, bit buffer
FNADR    = $DA  ;pointer to current file name
INSRT    = $DC  ;number of keyboard INSERTs outstanding
ROPRTY   = $DD  ;write shift word / receive input character
FSBLK    = $DE  ;number of blocks remaining for read/write
MYCH     = $DF  ;serial buffer word
LDTB1    = $E0  ;screen line table, hi order addr. and line wrap
CAS1     = $F9  ;interrupt driver flag for cassette #1 status switch
CAS2     = $FA  ;interrupt driver flag for cassette #2 status switch
STAL     = $FB  ;tape start address (lo, hi)
MEMUSS   = $FD  ;pointer for monitor (MLM)

BAD      = $0100  ;start of processor stack, tape error log
BUF      = $0200  ;MLM area
TBUFFR   = $027A  ;tape (cassette) buffer
TIMOUT   = $03FC  ;

; kernal addresses
OPEN     = $FFC0
CLOSE    = $FFC3
CHKIN    = $FFC6  ;set input device
CHKOUT   = $FFC9  ;set output device
CLRCHN   = $FFCC  ;restore I/O
CHRIN    = $FFCF  ;read a byte from input
CHROUT   = $FFD2  ;write a byte to output
LOAD     = $FFD5
SAVE     = $FFD8
VERIFY   = $FFDB
SYS      = $FFDE
STOP     = $FFE1  ;check STOP key (affects A only, zero-flag set: STOP pressed)
GETIN    = $FFE4  ;get a character
CLALL    = $FFE7  ;abort all I/O
INCTIME  = $FFEA  ;update clock, scan and store key

; hardware addresses
VIDEO    = $8000

PIA1_PA  = $E810
PIA1_CRA = $E811
PIA1_PB  = $E812
PIA1_CRB = $E813
PIA2_PA  = $E820
PIA2_CRA = $E821
PIA2_PB  = $E822
PIA2_CRB = $E823

VIA_DRB  = $E840
VIA_DRA  = $E841
VIA_DDRB = $E842
VIA_DDRA = $E843
VIA_T1CL = $E844
VIA_T1CH = $E845
VIA_T1LL = $E846
VIA_T1LH = $E847
VIA_T2CL = $E848
VIA_T2CH = $E849
VIA_SR   = $E84A
VIA_ACR  = $E84B
VIA_PCR  = $E84C
VIA_IFR  = $E84D
VIA_IER  = $E84E
VIA_ANH  = $E84F
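
As a usage sketch for these definitions, the following routine prints "HI" followed by a carriage return by way of the CHROUT kernal call. Only the definition actually used is repeated here; the character codes assume the default upper case/graphics character set.

```asm
CHROUT = $FFD2      ;write a byte to output

       lda #$48     ;PETSCII "H"
       jsr CHROUT
       lda #$49     ;PETSCII "I"
       jsr CHROUT
       lda #$0D     ;carriage return
       jsr CHROUT
       rts          ;return to caller (or BASIC)
```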

— ❦ —

Norbert Landsteiner
Feb. 2023
www.masswerk.at