Guide to the 6502 Assembler of the PET 2001 Emulator

General Description

The PET 2001 Emulator features a built-in assembler that is modelled closely after the original MOS 6502 assembler from the 1970s, but has been extended to support a variety of common syntax flavors. Still, it should accept and successfully process any original MOS source. It is based on the assembler found here and adjusted for the needs and requirements of the PET. Moreover, it provides facilities to compile stand-alone programs that can be loaded, started, and distributed on their own (see below, there are also a few sample files to start playing around with).

Please mind that this is still a relatively simple assembler, which comes with a few restrictions and limitations:

Still, it should be good for some quick experiments, without having to worry about another syntax flavor. When it comes to complex projects, you will probably prefer an assembler and/or IDE of your own choice anyway.

The assembler is fully compatible with the 6502 online assembler found at www.masswerk.at/6502/assembler.html and any source code generated by the associated disassembler. However, there is no support for external symbol tables. Please provide any symbol definitions in your source. (Some useful definitions can be found at the very end of this page.)

A general description of 6502 opcodes and what these do can be found here: 6502 Instruction Set.

If you are already familiar with this particular assembler, there are just a few changes:

Operations

The assembler is invoked by either mounting an assembler source via the mount dialog or by dropping an assembler source on the emulated screen. An assembler source is a text file with a file name extension of either ".asm", ".a65", ".a", ".src", or just ".s".

(The assembler may also be invoked from within a BASIC source, which is covered below in the section on integration with BASIC.)

The assembler is a simple 2-pass assembler, where a first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
Once the assembler has processed the supplied source, it will present a listing in a special dialog. In this listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves and lists this in a normalized format, which is close to the original MOS notation.

The assembler will always fail on the first error, since — it’s the 1970s! :-)

If the assembly succeeds, you will be presented with some options below the listing, regarding how to proceed:

Options provided by the PET 2001 emulator for a successful assembly.

To further interact with your code, you may refer to the debugger of the emulator, available by its icon on the top right of the window. Unlock the lock icon to the right of the register display to edit registers and flags. (The debugger is only interactive when the emulated PET is halted.)

The debugger of the PET 2001 emulator.

Mind that the PET also comes with a built-in machine monitor in ROM, which is normally hooked to the interrupt vector (IRQ). Therefore, executing any zero-byte (a BRK in 6502 machine code) will invoke the monitor. Such a zero-byte should always be present at address 0x400, just before the start of any BASIC text, and can be called by the memorable BASIC command "SYS 1024".

The built-in machine monitor of the PET 2001 (ROM2 and ROM4).
$E62E (IRQ) points to machine monitor in ROM2/3,
$E455 to the monitor as it is implemented in ROM4.

Basic Syntax

The assembler supports a variety of common 6502 assembler syntax styles. Mind that there must be separating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, labels, or complex expressions.
Compare the 6502 Instruction Set for instruction details and addressing modes.

Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for single-byte addresses, and "BB" for any other byte-sized operands. (In actuality, these may be any simple or complex expressions.)

CLC
implied, no operand.
ROR A
instruction with accumulator as the operand.
ROR
same as above. "A" is optional and may be omitted.
LDA #BB
immediate mode, loading the literal value.
LDA HHLL
absolute, loads the value from the provided memory address.
LDA HHLL,X
absolute, X-indexed.
LDA HHLL,Y
absolute, Y-indexed.
LDA LL
zero-page address mode (with automatic address mode detection).
LDA *LL
forced zero-page address mode in the style of the original MOS assembler.
LDA.b LL
forced zero-page address mode, modern byte-size notation.
LDA.w LL
forced absolute address mode, modern word-size notation.
LDA LL,X
zero-page, X-indexed.
LDA LL,Y
zero-page, Y-indexed.
LDA (LL,X)
X-indexed, indirect.
LDA (LL),Y
indirect, Y-indexed.
LDA (LL)Y
indirect, Y-indexed, old MOS format (no comma).
JMP (HHLL)
indirect address.
BEQ HHLL
relative addresses (-128 ≤ offset ≤ +127) are computed from absolute target addresses.

Supported synonyms:

LDA.byte LL
forced zero-page address mode.
LDA.by LL
as above.
LDA.word LL
forced absolute address mode, word-size.
LDA.wo LL
as above.
LDA+1 LL
forced zero-page address mode (like ACME assembler).
LDA+2 LL
forced absolute address mode, word-size (like ACME assembler).

Comments:

;comment
comments start with a semicolon and extend to the end of the line.

The assembler is generally case-insensitive, with the exception of strings and character literals.

Generally, there may be just a single instruction on each line.

Values and Numeric Representations

The assembler supports a variety of number formats:

$12EF
hexadecimal [0-9A-F].
&12EF
hexadecimal.
0x12EF
hexadecimal.
1289
decimal [0-9].
0d1289
decimal.
@1267
octal [0-7].
0o1267
octal.
01267
octal.
%1010101
binary [01].
0b1010101
binary.
'A
character value of "A" ($41 in ASCII)
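
The prefix rules listed above can be summarized in a short Python sketch. This is only an illustration of the notation, not the emulator's actual parser; the function name parse_number is hypothetical, and the sketch does no error handling.

```python
def parse_number(literal):
    """Return the integer value of one numeric literal (illustrative sketch)."""
    text = literal.strip()
    if text.startswith("'"):                      # 'A -> character value
        return ord(text[1])
    lower = text.lower()
    for prefix, base in (("$", 16), ("&", 16), ("0x", 16),
                         ("0d", 10),
                         ("@", 8), ("0o", 8),
                         ("%", 2), ("0b", 2)):
        if lower.startswith(prefix):
            return int(lower[len(prefix):], base)
    if len(lower) > 1 and lower.startswith("0"):  # 01267 -> octal
        return int(lower, 8)
    return int(lower, 10)                         # plain decimal


for sample in ("$12EF", "&12EF", "0x12EF", "1289", "0d1289",
               "@1267", "0o1267", "01267", "%1010101", "0b1010101", "'A"):
    print(sample, "=", parse_number(sample))
```

Note that the two-character prefixes ("0x", "0d", "0o", "0b") have to be checked before the bare leading zero, otherwise "0x12EF" would be mistaken for an octal number.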

Value Expressions

Anywhere a value may occur, this may be a complex expression as well. Expressions may include addition, subtraction, multiplication, division, and unary minus (+-*/ and -).

There are also the two special unary byte operators "<" ">":

<$12EF
low-byte value ($EF).
>$12EF
high-byte value ($12).

Expressions are evaluated strictly from left to right, without precedence, but may be grouped using round or square brackets ((), []).
The use of square brackets is recommended, though, as round brackets can be ambiguous in the context of certain 6502 instructions and their syntax.

1+2
3
2*3
6
1+2*3
9   (1+2 => 3, 3*3 => 9)
1+[2*3]
7   ([2*3] => 6, 1+6 => 7)
1+(2*3)
same as above

Expressions may include defined symbols and instruction labels.
There must not be any white space in an expression!
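
To make the left-to-right rule concrete, here is a minimal Python sketch of such an evaluator. It is an illustration only and not the assembler's implementation: the name evaluate is hypothetical, only decimal numbers are handled, division is assumed to be integer division, and the byte operators "<" and ">" are assumed to apply to the operand that follows them.

```python
import re

TOKENS = re.compile(r"\d+|[-+*/<>()\[\]]")


def evaluate(expr):
    """Evaluate a decimal-only expression strictly left to right."""
    tokens = TOKENS.findall(expr)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "-":                    # unary minus
            return -operand()
        if tok == "<":                    # low byte
            return operand() & 0xFF
        if tok == ">":                    # high byte
            return (operand() >> 8) & 0xFF
        if tok in ("(", "["):             # bracketed group
            value = chain()
            pos += 1                      # skip the closing bracket
            return value
        return int(tok)

    def chain():
        nonlocal pos
        value = operand()
        # No precedence: each operator is applied as soon as it is read.
        while pos < len(tokens) and tokens[pos] in ("+", "-", "*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = operand()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            else:
                value //= rhs             # assumption: integer division
        return value

    return chain()


print(evaluate("1+2*3"))    # 9
print(evaluate("1+[2*3]"))  # 7
```

Because the running value is updated operator by operator, "1+2*3" yields 9, and only brackets change the grouping.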

The Program Counter

The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address, where the next instruction will be inserted. There are several ways to address the program counter:

* = $1234
the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
BEQ *+2
the asterisk may be used in expressions as well.
* = *+4 $EA
when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with NOP instructions ($EA).
P% = P%+2
the symbol "P%" may be used synonymously to the asterisk anywhere the former may occur.
.ORG $1234
the more modern-style directive ".ORG" may be used for setting PC, as well. (However, you can't use it in an expression.)
.ORG = $1234
you may use .ORG in assignment style, as well.
.ORG EQU $1234
generally, "EQU" may be used as the assignment operator, as well.
(Mind that there must be white-space around "EQU" in order for it to be recognized as a token, which is not a requirement with "=".)
.RORG $1234
synonym to ".ORG" (in many assemblers you are not allowed to alter the origin set by ".ORG" and this is meant to provide compatibility.)
BEQ .+2
in expressions, a dot (.) may be used synonymously for the asterisk. However, you cannot assign to it. (Strictly speaking, this is local context, but, as the assembler doesn't implement macros, it's the same anyway.)

Relative Offset Literals

As an extension to the standard syntax the assembler also allows relative offset literals for branch instructions (the relative offset to PC+2 as in machine code, instead of the usual target address) with the "#" prefix (same as immediate mode):

BCS #0
equivalent to "BCS *+2", results in "B0 00" ($B0: instruction code for BCS).
BCS #4
equivalent to "BCS *+6", results in "B0 04".
BCS #-4
equivalent to "BCS *-2", results in "B0 FC" ($FC: -4 in two's complement).
BCS #$FC
as above, results in "B0 FC".
BCS #6-(2*5)
expressions allowed, equivalent to "BCS #-4", results in "B0 FC".

Relative offset literals are automatically constrained to single-byte values in the range of $00…$FF:

BCS #$104
results in "B0 04".
BCS #$1FC
results in "B0 FC".
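
The arithmetic behind these examples can be sketched in Python as follows. The function names are hypothetical and only illustrate the relation between target addresses, offsets relative to PC+2, and the resulting operand byte.

```python
def offset_byte_for_target(pc, target):
    """Operand byte for a branch located at pc that jumps to an absolute target."""
    offset = target - (pc + 2)            # offsets are relative to PC+2
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF                  # two's complement, single byte


def offset_byte_for_literal(value):
    """Operand byte for a relative offset literal such as '#-4' or '#$104'."""
    return value & 0xFF                   # constrained to $00..$FF


def target_for_offset_byte(pc, byte):
    """Absolute address reached by a branch at pc with the given operand byte."""
    signed = byte - 0x100 if byte >= 0x80 else byte
    return pc + 2 + signed


print(hex(offset_byte_for_literal(-4)))            # 0xfc
print(hex(offset_byte_for_literal(0x104)))         # 0x4
print(hex(target_for_offset_byte(0x1000, 0x04)))   # 0x1006, i.e. *+6
```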

Labels and Symbols

Instruction labels and defined symbols start with a letter or an underscore and may contain letters, digits, or the underscore. Please mind that, for compatibility with older and historic sources, only the first 12 characters are significant. Use option "LONGNAMES" (see below) to disable this default.
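
As a rough illustration of what "significant" means here, the following Python sketch reduces names to a lookup key. The helper symbol_key is hypothetical; it assumes that names are compared case-insensitively and simply cut off after 12 characters unless long names are enabled.

```python
def symbol_key(name, longnames=False):
    """Key under which a label or symbol would be looked up (sketch)."""
    key = name.upper()                    # the assembler is case-insensitive
    return key if longnames else key[:12]


# Both names share the same first 12 characters and would therefore clash.
print(symbol_key("screen_buffer_a"))
print(symbol_key("SCREEN_BUFFER_B"))
```

With the default limit, these two names denote the same symbol; with long names enabled, they stay distinct.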

Instruction labels may precede an instruction or may be the only entity on a line. They may optionally end in a trailing colon. Labels may be used anywhere in an expression:

LOOP LDA A,X
declares the instruction label LOOP.
LOOP: LDA A,X
labels may end in a colon (optional).
BEQ LOOP
using a label as an address value.

Optional "@" prefix for further compatibility:

@LOOP LDA A,X
Labels may be declared with an optional "@" prefix.
@LOOP: LDA A,X
Same as above, but using a trailing colon.
BNE @LOOP
Labels may be referred to using an optional "@" prefix.

Symbols are declared by an assignment and may be used as values anywhere.

TEST = $2000
declares the symbol TEST.
TEST EQU $2000
EQU may be used synonymously.
C = *+[TEST*2]
assignments may be complex expressions.

Mind that — like with most assemblers — you may not redefine or reuse any symbols or labels per default. However, you may change this behavior by setting option "REDEF" (see below).

Note on hexadecimal values and automatic zero-page mode

Any numeric values provided by at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will effect absolute address modes, when used in ambiguous context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "C" by "C = 0x0002" and using this in "LDA C+2" will result in a word-sized, absolute instruction, while the effective value is well inside single-byte range. Defining C as "0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)
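
The tainting rule for hexadecimal literals can be pictured with a small Python sketch. Both helpers are hypothetical and only model the rule as described above: a literal counts as word-sized if its value exceeds $FF or if it is written with at least four digits and two leading zeros, and an expression is assumed to be word-sized as soon as one of its terms is.

```python
def hex_literal_is_word(digits):
    """True if a hexadecimal literal (digits only, no prefix) is word-sized."""
    if int(digits, 16) > 0xFF:
        return True
    return len(digits) >= 4 and digits.startswith("00")


def expression_is_word(hex_terms):
    """Word-size tainting: one word-sized term makes the whole expression word-sized."""
    return any(hex_literal_is_word(term) for term in hex_terms)


print(expression_is_word(["0002", "2"]))  # True:  C = 0x0002, LDA C+2 -> absolute
print(expression_is_word(["02", "2"]))    # False: C = 0x02,   LDA C+2 -> zero-page
```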

Anonymous (Temporary) Labels

The assembler also supports anonymous labels for temporary branch and jump targets:
Just mark an instruction by "!" or ":" (empty label) and refer to this mark by either "!+" (or ":+") for the next anonymous label as a target or by "!-" (or ":-") for the previous one. You may refer to a target further away by repeating "+" or "-".  E.g., "BNE !--" branches to the second anonymous label before the insertion point. Mind that this counts anonymous labels and not addresses.

Example:

! START  LDA #0        ;first anonymous label
                       ;anonymous labels may precede a normal label
         LDX #0
!                      ;just mark this address
:        STA $1000,X   ;third label (same address), we may use ":" as well
         INX
         BNE !-        ;select the closest previous anonymous label
         JMP :---      ;jump back 3 anonymous labels (same as START)
                       ;again, ":" and "!" are synonymous

This will assemble to (with anonymous labels listed in a column of their own):

LOC   CODE         LABEL     INSTRUCTION

0800  A9 00      ! START     LDA #$00
0802  A2 00                  LDX #$00
0804             !
0804  9D 00 10   !           STA $1000,X
0807  E8                     INX
0808  D0 FA                  BNE $0804
080A  4C 00 08               JMP $0800
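
The counting rule can be sketched in Python like this. The helper resolve_anonymous is hypothetical; it assumes the assembler keeps the addresses of all anonymous labels in source order and knows how many of them were declared before the referring instruction.

```python
def resolve_anonymous(ref, labels, seen):
    """Resolve a reference like '!-', ':---' or '!+' to an address.

    labels: addresses of all anonymous labels in source order.
    seen:   number of anonymous labels declared before the reference.
    """
    marks = ref.lstrip("!:")
    count = len(marks)
    if count > 0 and marks == "-" * count:
        index = seen - count              # count labels backwards
    elif count > 0 and marks == "+" * count:
        index = seen + count - 1          # count labels forwards
    else:
        raise ValueError("not an anonymous label reference: " + ref)
    if not 0 <= index < len(labels):
        raise ValueError("no such anonymous label: " + ref)
    return labels[index]


# Anonymous labels of the example above: one at $0800 and two at $0804.
labels = [0x0800, 0x0804, 0x0804]
print(hex(resolve_anonymous("!-", labels, seen=3)))    # 0x804
print(hex(resolve_anonymous(":---", labels, seen=3)))  # 0x800
```

Since labels are counted rather than addresses, the two marks sharing address $0804 each count as one step.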

There is also support for an alternative grammar for anonymous targets, marking forward and backward references separately (like it's used by the ACME cross-assembler.)
Here, instructions used for forward references are marked by "+" and those to be used for backward references are marked by "-", each contributing to a dedicated list of anonymous labels. These are then referred to as a target address as above, but without any leading "!" or ":". This is an important difference! Please mind that this is not just an alternative syntax, but comes with its own semantics.
(Hence, these targets are managed in separate lists. While not recommended, you could mix both grammars in a single source.)

* = $800
         BCS +         ;branch to exit
         LDY #3
         LDA $3000
-        CLC           ;outer loop
         ADC #5
         LDX #5
-        STA $1000,x   ;inner loop
         DEX
         BNE -
         DEY
         BNE --
+        RTS           ;forward target

LOC   CODE         LABEL         INSTRUCTION

0800                             * = $0800
0800  B0 13                      BCS $0815   ;branch to exit
0802  A0 03                      LDY #$03
0804  AD 00 30                   LDA $3000
0807  18         -               CLC         ;outer loop
0808  69 05                      ADC #$05
080A  A2 05                      LDX #$05
080C  9D 00 10   -               STA $1000,X ;inner loop
080F  CA                         DEX
0810  D0 FA                      BNE $080C
0812  88                         DEY
0813  D0 F2                      BNE $0807
0815  60         +               RTS         ;forward target

Restrictions:
This feature is only supported for branch instructions and absolute jump targets. An anonymous target must be the sole operand and cannot be used in an arithmetic expression.

Note: Anonymous labels are not listed in symbol tables.

Pragmas and Directives

Pragmas and directives start generally with a dot. For enhanced compatibility, an exclamation mark ("!") may be used as well, but will be normalized and show up as a dot in the listing of pass 2. The following examples use the dot for a general/canonical notation.

Directives for embedding data:

.BYTE 1, $02
embeds a single byte or a list of bytes at the current location. Lists are separated by white-space and/or commas. (An optional "#", preceding any values, is ignored.) Values may be complex expressions, as well.
.DBYTE $12EF
embeds a double byte given in HHLL order (human readable, big-endian). This inserts the bytes $12 and $EF at the current location. ".DBYTE" takes a list of values, as well.
.WORD $12EF
embeds a word in LLHH memory order (little-endian). This inserts the bytes $EF and $12 at the current location. (Also, use this when using previously defined labels and symbols in an expression.)
Again, values and expressions may also be provided as a list.
.TEXT "Abc"
embeds a text literal (case-sensitive) using the current encoding, here always PETSCII.
.PETSCII "Abc"
embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
.PETSCR "Abc"
embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
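
The difference between ".WORD" and ".DBYTE" is only the order in which the two bytes are emitted. A short Python sketch (with hypothetical helper names) reproduces the byte sequences given above:

```python
def word_bytes(value):
    """Bytes emitted for .WORD: low byte first, as the 6502 expects addresses."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


def dbyte_bytes(value):
    """Bytes emitted for .DBYTE: high byte first, as written in the source."""
    return bytes([(value >> 8) & 0xFF, value & 0xFF])


print(word_bytes(0x12EF).hex())   # ef12
print(dbyte_bytes(0x12EF).hex())  # 12ef
```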

Supported synonyms:

.WO $12EF
synonym for ".WORD".
.BYT $01
synonym for ".BYTE".
.BY $01
synonym for ".BYTE".
.DB $02
synonym for ".BYTE" (Define Byte).
.DCB $03
synonym for ".BYTE" (Define Constant Byte).
.DBYT $12EF
synonym for ".DBYTE".
.PET "Abc"
synonym for ".PETSCII".
.SCREEN "Abc"
synonym for ".PETSCR".
.SCR "Abc"
synonym for ".PETSCR".
.TX "Abc"
synonym for ".TEXT".
.ASCII "Abc"
here the same as ".PETSCII".

Directives for aligning code or filling space:

.ALIGN $100
advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled by zero. If no argument is provided ".ALIGN" aligns to the next even memory location.
.ALIGN $100 $EA
an optional second byte may specify a byte value to be used to fill any gaps (here $EA, "NOP", as used by most Commodore 8-bit machines).
.FILL $20 $EA
fill the next n bytes using the value provided by the second argument. If no second argument is provided, zero will be used as the fill-byte.
.REPEAT n
repeats the instruction or directive following this directive on the same line n times. An optional "STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default 1). The repeat-counter is accessible as "R%".
E.g.,
.REPEAT 26 .BYTE 'A+R%
will fill the next 26 memory locations with the letters of the alphabet.
ODD_NUMS ;generate list of odd numbers
.REPEAT 5 STEP 2 .BYTE 1+R%
will fill the next 5 memory locations with the odd number series 1,3,5,7,9.

And this will fill the next 6 bytes by the sequence 0x00, 0x00, 0x02, 0x02, 0x04, 0x04:
.REPEAT 3 STEP 2 *=*+2 R% ;PC += 2, fill-byte R%
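
The three ".REPEAT" examples above can be traced with a few lines of Python. This is a sketch under the assumption, suggested by the examples, that the repeat-counter R% starts at zero and grows by the step on each iteration; the helper name repeat_counter is hypothetical.

```python
def repeat_counter(n, step=1):
    """Values taken by R% over n iterations (assumed to start at 0)."""
    return [i * step for i in range(n)]


# .REPEAT 26 .BYTE 'A+R%
alphabet = bytes(ord("A") + r for r in repeat_counter(26))

# .REPEAT 5 STEP 2 .BYTE 1+R%
odd_nums = [1 + r for r in repeat_counter(5, 2)]

# .REPEAT 3 STEP 2 *=*+2 R%   (advance PC by 2, fill the gap with R%)
filled = [r for r in repeat_counter(3, 2) for _ in range(2)]

print(alphabet.decode("ascii"))
print(odd_nums)   # [1, 3, 5, 7, 9]
print(filled)     # [0, 0, 2, 2, 4, 4]
```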

Other directives:

.END
ends the source code, any remaining text is ignored. (optional)
.NOLIST
switches listing output off (e.g., for data sections. This is also available as an option.)
.LIST
switches listing output on (default, also available as an option).
.SKIP
inserts a blank line in the listing (pass #2). This is mostly for compatibility.
.PAGE
inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
.DATA
any such directive is ignored (this merely exists to ensure compatibility with symbol tables used by this stand-alone disassembler).

Special directives for Commodore BASIC:

.BASICSTART
Generates a short BASIC program, consisting of optional REM-lines and a line with a "SYS" command, jumping to the next available address immediately following this BASIC text (which starts at 0x0401, the BASIC start address of the Commodore PET). The program counter will be advanced to this start address automatically.
Without any arguments, just a line with the SYS command will be generated, using the current year as the line number:
.BASICSTART
> 2023 SYS 1038
If a first, numeric argument is provided, this will be used as a line number for the line holding the SYS statement:
.BASICSTART 10
> 10 SYS 1038
If a string argument is provided, the assembler will generate a heading line with line number "0" and a REM statement using this string. If a list of strings (separated by white-space and optionally commas) is provided, or a string contains a line-break ("\n"), multiple REM lines will be generated:
.BASICSTART 2001 "*** a program ***", "(c) example.com"
> 0 REM *** A PROGRAM ***
> 1 REM (c) EXAMPLE.COM
> 2001 SYS 1084
(Mind that lower case letters will appear as upper-case and upper-case letters as graphics characters in standard PETSCII upper-case/graphics mode.)
.PETSTART
Same as ".BASICSTART" (see above).
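
Where the SYS addresses in these examples come from can be reconstructed with a Python sketch. It assumes the usual Commodore BASIC line layout (2-byte link pointer, 2-byte line number, token, text, zero terminator, and two zero bytes ending the program), a space after each keyword, and plain ASCII in place of PETSCII, which does not affect the lengths. The function names are hypothetical; this is not the emulator's code.

```python
SYS_TOKEN = 0x9E      # Commodore BASIC token for SYS
REM_TOKEN = 0x8F      # Commodore BASIC token for REM
BASIC_START = 0x0401  # start of BASIC text on the PET


def basic_line(addr, line_no, token, text):
    """One tokenized line stored at addr: link, line number, token, text, 0."""
    body = bytes([token, 0x20]) + text.encode("ascii") + b"\x00"
    link = addr + 4 + len(body)           # address of the next line
    return link.to_bytes(2, "little") + line_no.to_bytes(2, "little") + body


def basic_start(sys_line, rems=()):
    """Return (program bytes, address of the first machine instruction)."""
    guess = BASIC_START
    while True:                           # repeat until the SYS operand is stable
        prog = b""
        for number, text in enumerate(rems):
            prog += basic_line(BASIC_START + len(prog), number, REM_TOKEN, text)
        prog += basic_line(BASIC_START + len(prog), sys_line, SYS_TOKEN, str(guess))
        prog += b"\x00\x00"               # end of BASIC program
        entry = BASIC_START + len(prog)
        if entry == guess:
            return prog, entry
        guess = entry


print(basic_start(2023)[1])
print(basic_start(2001, ["*** A PROGRAM ***", "(C) EXAMPLE.COM"])[1])
```

Under these assumptions the sketch arrives at 1038 for a lone SYS line and at 1084 for the variant with two REM lines, matching the examples above.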

Options

Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (.) or an exclamation mark (!).

.OPT WORDA
switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("*") or the byte extension (".b"). Use this for fine grain control and/or compatibility with old sources.
.OPT ZPGA
switches automatic zero-page detection to on (default).
.OPT ZPA
synonym to option "ZPGA".
.OPT ILLEGALS
enables support for “illegal” op-codes (see below).
.OPT LEGALS
disables support for “illegal” op-codes (default).
.OPT NOILLEGALS
synonym to option "LEGALS".
.OPT REDEF
allows symbols and labels to be redefined / reused.
.OPT NOREDEF
reuse of symbols is not allowed and will throw an error (default).
.OPT ASCII
set character encoding for .TEXT-directives and character literals
Here, this is only included for compatibility reasons and the encoding always defaults to PETSCII.
.OPT PETSCII
set the default character encoding to PETSCII.
.OPT PETSCR
set the default character encoding to Commodore 8-bit screen characters.
.OPT SCREEN
synonym to option "PETSCR".
.OPT SCR
same as above.
.OPT NOLIST
switches listing output off.
.OPT LIST
switches listing output on (default).
.OPT LONGNAMES
disables the default 12 character limit for the significance of labels and identifiers for unlimited length.

Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored: XREF, NOXREF, COUNT, NOCOUNT, CNT, NOCNT, MEMORY, NOMEMORY, GENERATE, NOGENERATE.

Compatibility

This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw just about any style of source code at it.

E.g., the following examples are semantically identical and produce the same object code:

;MOS/traditional

* = $4000
TARGET = $C0

       LDY *$20
LOOP   LDA $0080,Y
       ROL A
       STA (TARGET)Y
       DEY
       BNE LOOP
       RTS
.END

;modern style

.ORG 0x4000
TARGET EQU 0xC0

       LDY.b 0x20
LOOP:  LDA.w 0x80,Y
       ROL
       STA (TARGET),Y
       DEY
       BNE LOOP
       RTS
.END

Processing Example

Here is an example for a complete assembly of a short source:

Source code:

;fill a page with bytes,
;preserve program

*=$800

start
      ldx #offset
loop  txa
      sta start,x
      inx
      bne loop
      brk

;insert bytes here
offset=*-start
.end



Resulting object code:

0800: A2 0A 8A 9D 00 08 E8 D0
0808: F9 00

Listing:

pass 1

LINE  LOC          LABEL     PICT

   1               ;fill a page with bytes,
   2               ;preserve program

   4  0800                   * = $800
   6  0800         START
   7  0800                   LDX #OFFSET
   8  0802         LOOP      TXA
   9  0803                   STA START,X
  10  0806                   INX
  11  0807                   BNE LOOP
  12  0809                   BRK
  14                         ;insert bytes here
  15                         OFFSET = *-START
  16                         .END

symbols
 LOOP       $0802
 OFFSET       $0A
 START      $0800

pass 2

LOC   CODE         LABEL     INSTRUCTION

                   ;fill a page with bytes,
                   ;preserve program

0800                         * = $0800
0800               START
0800  A2 0A                  LDX #$0A
0802  8A           LOOP      TXA
0803  9D 00 08               STA $0800,X
0806  E8                     INX
0807  D0 F9                  BNE $0802
0809  00                     BRK
                             ;insert bytes here
                             OFFSET = $0A
                             .END

done (code: 0800..0809).
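
The division of work between the two passes can be illustrated with a deliberately tiny Python sketch that handles only the example above. The statements are entered pre-parsed, the opcode table covers just the six instructions used, and all names (OPCODES, SIZES, PROGRAM, assemble) are hypothetical; the real assembler is of course far more general.

```python
# Opcode bytes and instruction sizes for the addressing modes used here.
OPCODES = {("LDX", "imm"): 0xA2, ("TXA", "imp"): 0x8A, ("STA", "abx"): 0x9D,
           ("INX", "imp"): 0xE8, ("BNE", "rel"): 0xD0, ("BRK", "imp"): 0x00}
SIZES = {"imp": 1, "imm": 2, "rel": 2, "abx": 3}

# Hand-parsed statements: (label, mnemonic, mode, operand).
# "=" marks an assignment computed from the program counter and the symbols.
PROGRAM = [
    ("START", None, None, None),
    (None, "LDX", "imm", "OFFSET"),
    ("LOOP", "TXA", "imp", None),
    (None, "STA", "abx", "START"),
    (None, "INX", "imp", None),
    (None, "BNE", "rel", "LOOP"),
    (None, "BRK", "imp", None),
    ("OFFSET", "=", None, lambda pc, sym: pc - sym["START"]),  # OFFSET = *-START
]


def assemble(program, origin):
    symbols = {}
    pc = origin
    for label, op, mode, operand in program:      # pass 1: lengths and addresses
        if op == "=":
            symbols[label] = operand(pc, symbols)
            continue
        if label:
            symbols[label] = pc
        if op:
            pc += SIZES[mode]
    code = bytearray()
    pc = origin
    for label, op, mode, operand in program:      # pass 2: resolve values, emit code
        if op is None or op == "=":
            continue
        code.append(OPCODES[(op, mode)])
        if mode == "imm":
            code.append(symbols[operand] & 0xFF)
        elif mode == "abx":
            code += symbols[operand].to_bytes(2, "little")
        elif mode == "rel":
            code.append((symbols[operand] - (pc + 2)) & 0xFF)
        pc += SIZES[mode]
    return symbols, bytes(code)


symbols, code = assemble(PROGRAM, 0x0800)
print(code.hex(" ").upper())
```

Pass 1 is what allows "LDX #OFFSET" to refer to a symbol that is only defined at the very end of the source: by the time pass 2 emits bytes, every label and symbol already has its value.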

Illegal Opcodes

Support for "illegal" opcodes (undefined instructions) is enabled by the pragma ".OPT ILLEGALS".

The following mnemonics are implemented (supported synonyms given in parenthesis):

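opc (synonyms)  imp  imm  abs  abX  abY  zpg  zpX  zpY  inX  inY

ALR (ASR)       --   4B   --   --   --   --   --   --   --   --
ANC             --   0B   --   --   --   --   --   --   --   --
ANC2            --   2B   --   --   --   --   --   --   --   --
ANE (XAA)       --   8B   --   --   --   --   --   --   --   --
ARR             --   6B   --   --   --   --   --   --   --   --
DCP (DCM)       --   --   CF   DF   DB   C7   D7   --   C3   D3
ISC (ISB, INS)  --   --   EF   FF   FB   E7   F7   --   E3   F3
LAS (LAR, LAE)  --   --   --   --   BB   --   --   --   --   --
LAX (ATX)       --   AB   AF   --   BF   A7   --   B7   A3   B3
LXA (LAX imm)   --   AB   --   --   --   --   --   --   --   --
RLA             --   --   2F   3F   3B   27   37   --   23   33
RRA             --   --   6F   7F   7B   67   77   --   63   73
SAX (AXS, AAX)  --   --   8F   --   --   87   --   97   83   --
SBX             --   CB   --   --   --   --   --   --   --   --
SHA (AXA, AHX)  --   --   --   --   9F   --   --   --   --   93
SHX             --   --   --   --   9E   --   --   --   --   --
SHY (SAY, SYA)  --   --   --   9C   --   --   --   --   --   --
SLO (ASO)       --   --   0F   1F   1B   07   17   --   03   13
SRE (LSE)       --   --   4F   5F   5B   47   57   --   43   53
TAS (SHS, XAS)  --   --   --   --   9B   --   --   --   --   --
USBC            --   EB   --   --   --   --   --   --   --   --
NOP             EA   80   0C   1C   --   04   14   --   --   --
DOP (SKB)       --   80   --   --   --   04   14   --   --   --
TOP (SKW)       --   --   0C   1C   --   --   --   --   --   --
JAM (HLT, KIL)  02   --   --   --   --   --   --   --   --   --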

Notes:

Combining Assembler with BASIC

There are two general approaches to combining BASIC text with assembler code, one from the assembler side and another, probably more versatile one from the BASIC side of things. An important thing to note is that with both approaches you do not set the program counter, as its value will be determined by the assembler. Both methods produce a stand-alone program that can be saved or exported by other means as a single binary file.

BASIC Preambles in Assembler Sources

The first one is the pragma ".BASICSTART" already described above. You can add an optional line number for the final SYS statement and any number of strings, which will be prepended in REM statements starting at line number 0. The assembler will generate the required tokenized BASIC code with a SYS statement pointing to immediately after this short BASIC program and set the program counter for the assembly accordingly.

Example

Here’s another example (probably a bit more healthy and stable),

Assembler Code Appended to BASIC Sources

The other, probably more capable approach is appending an assembler source to a BASIC source file.

This is achieved by the special BASIC source tag "{ASM START}" (also "{ASM_START}" or "{ASMSTART}"). This will immediately terminate the process of tokenizing the BASIC program, with the rest of the line ignored, and will insert an ASCII sequence for the memory address following immediately after the BASIC program that is currently generated. Any source text following this will be assumed to be 6502 assembler code. (If no such code is found, a simple RTS instruction will be appended.)

The general idea is that you put this behind a "SYS" command to form a final statement that leads to the execution of the following machine language program.

Example

Here, the emulator behaves as it usually does, whenever we drop a BASIC source file onto it: the code will be transformed and loaded seamlessly, but you will be asked, whether you would want to review the assembler listing or not. In case there should be an error and the assembly fails, the listing will be presented anyways.
Mind that this can be used to push any configurations, etc, to dialogs written in BASIC, where this may be easier to handle than in assembler.

Notably, this can also be used without this assembler: just omit the assembler part, export the resulting program either as a binary or as a hex-dump, and use it in the assembler of your choice, replacing the final RTS (0x60) instruction with your code.

Another way of using this is by appending the assembler code to a final DATA statement, from where we can read the jump address to be used from anywhere in the BASIC program. Here, rather than using BASIC as a means to start our assembler program, we use the assembler to provide some fast routine(s) for BASIC.

Example
(Here, we deposit a screen code to be used for filling the screen in address 255 before calling our routine. This is either screen code 83 for a heart character or 211 for an inverted heart character. We use this to blink the screen three times and finally clear it by filling it using a space character. Mind how the call address is read into variable A from the final DATA statement.)

Using “Disassemble Program” from the emulator’s “Utils/Export” menu, we get the result of our combined BASIC and assembler efforts as in memory:

This mechanism can be used to integrate multiple machine language routines, but you will have to add any offset to the base address returned in the final DATA statement on your own.

Note: I’m not aware that this has been done before, so this could well be a genuine invention, since this requires some kind of engine capable of handling both BASIC source text and assembler code at once.

Fixed Start Addresses with BASIC Sources

In case you really want to use a fixed start address for your routine, you may either put the "{ASMSTART}" behind a dummy command or best in a "REM" statement. (Mind that hiding it in a string won’t work, as any such text is ignored by the parser.)

In the following example, the space between $0475, the end of the BASIC program, and $0480 (decimal 1152), the explicitly provided start of the 6502 code, will be filled by zero-bytes:

The various parts still form a homogeneous program as indicated by the system pointers TXTTAB (start of BASIC text) = $0401 and VARTAB (start of BASIC variables) = $0494.
(Use “Show BASIC System Pointers” from the “Utils/Export” menu to view these pointers.)

Rationale — General Considerations

I’ve always looked with respectful envy at those BASIC dialects featuring in-line assembly, like BBC BASIC. Could we have similar for Commodore BASIC? I’d argue that this isn’t the way to go about this on the Commodore 8-bits, since the BASIC runtime shuffles variables around in memory, as new variables are encountered. This is especially true for subscripted variables (arrays), which are often used for a scheme like this, and there is no such thing as a stable location in memory.

The Commodore way of doing this — at least for me — is appending any machine language code to the BASIC program, but including it in the program range as set by the two system pointers TXTTAB and VARTAB, the former holding the start address of the tokenized BASIC text in memory (usually 0x401 on the PET), the latter providing the start of the memory available for variables, just after the last byte of BASIC text. This way, the machine language part is still an integral part of the program and won’t be affected by the runtime.

However, mind that, should your machine language routine(s) make use of some tables, you’d better reserve the space required. If you were merely addressing some space beyond your program blindly, this would potentially clash with any variables managed by the BASIC runtime.

Finally, some useful addresses (new ROM / BASIC 2.0)

; PET 2001 system addresses (ROM 2.0)

USRPOK   = $00  ;$4C constant (JMP instruction)
USRADD   = $01  ;USR function addr. lo, hi ($02)
COUNT    = $05  ;BASIC input buffer pointer ("#" subscript)
VALTYP   = $07  ;variable flag, type: $FF=string, $00=numeric
INTFLG   = $08  ;integer flag: $80=integer, $00=floating point
GARBFL   = $09  ;flag for DATA, LIST quote, memory
SUBFLG   = $0A  ;flag for subscript, FNx
INPFLG   = $0B  ;input/read flag: $00=input, $40=get, $98=read
TANSGN   = $0C  ;flag ATN sign, comparison evaluation
LINNUM   = $11  ;BASIC integer address for SYS, GOTO, etc (lo, hi)
INDEX    = $1F  ;pointer for number transfer (lo, hi)
RESHO    = $23  ;product staging area for multiplication
TXTTAB   = $28  ;pointer: start of BASIC text in memory
VARTAB   = $2A  ;pointer: end of BASIC, start of variables
ARYTAB   = $2C  ;pointer: end of variables, start of arrays
STREND   = $2E  ;pointer: end of arrays
FRETOP   = $30  ;pointer: top of memory, bottom of strings
FRESPC   = $32  ;utility string pointer
MEMSIZ   = $34  ;pointer: limit of BASIC memory
CURLIN   = $36  ;current BASIC line number
OLDLIN   = $38  ;previous BASIC line number
OLDTXT   = $3A  ;pointer to BASIC statement for CONT
DATLIN   = $3C  ;line number, current DATA item
DATPTR   = $3E  ;pointer to current DATA item
INPPTR   = $40  ;input vector
VARNAM   = $42  ;current variable name
VARPNT   = $44  ;current variable address
FORPNT   = $46  ;variable pointer for FOR/NEXT
TEMPF1   = $54  ;misc numeric storage area
TEMPF2   = $59  ;misc numeric storage area
FACEXP   = $5E  ;floating point accumulator 1: exponent
FACHO    = $5F  ;floating point accumulator 1: mantissa (4 bytes)
FACSGN   = $63  ;floating point accumulator 1: sign
SGNFLG   = $64  ;series evaluation constant pointer
BITS     = $65  ;accumulator hi-order propagation word
ARGEXP   = $66  ;floating point accumulator 2: exponent
ARGHO    = $67  ;floating point accumulator 2: mantissa (4 bytes)
ARGSGN   = $6B  ;floating point accumulator 2: sign
ARISGN   = $6C  ;sign comparison (primary vs. secondary)
FACOV    = $6D  ;low-order rounding byte for FAC #1
FBUFPT   = $6E  ;cassette buffer length/series pointer
CHRGET   = $70  ;subroutine to get the next character
CHRGOT   = $76  ;character found by CHRGET
TXTPTR   = $77  ;pointer to source text for CHRGET
RNDX     = $88  ;round storage and work area
TIME     = $8D  ;jiffy clock in 1/60 sec for TI and TI$ (lo, hi)
CINV     = $90  ;IRQ vector (lo, hi), hardware interrupt
CBINV    = $92  ;BRK interrupt vector (lo, hi)
NMINV    = $94  ;NMI interrupt vector (lo, hi)
STATUS   = $96  ;status word ST
LSTX     = $97  ;which key? matrix coordinates of last key down: row/col, $FF=no key
SFDX     = $98  ;shift key: 1=pressed
STKEY    = $9B  ;last read from keyboard scan: STOP and RVS flags
SVXT     = $9C  ;timing constant buffer
VERCK    = $9D  ;flag: LOAD=0, VERIFY=1
NDX      = $9E  ;index into keyboard buffer
RVS      = $9F  ;screen reverse flag
C3PO     = $A0  ;IEEE output flag: $FF=character waiting
INDX     = $A1  ;pointer: end-of-line for input
LXSP     = $A3  ;cursor log (row, col)
BSOUR    = $A5  ;IEEE output character buffer
BLNSW    = $A7  ;flag: 0=flashing cursor, else no cursor
BLNCT    = $A8  ;countdown for cursor timing
GDBLN    = $A9  ;character under cursor
BLNON    = $AA  ;cursor blink flag
SYNO     = $AB  ;EOT bit received
NXTBIT   = $AB  ;-- " --
CRSW     = $AC  ;input from screen/input from keyboard
LDTND    = $AE  ;number of open files, pointer into file table
DFLTN    = $AF  ;input device (normally 0)
DFLTO    = $B0  ;output CMD device (normally 3)
PRTY     = $B1  ;tape character parity
DPSW     = $B2  ;byte received flag
BUFPNT   = $BB  ;tape buffer #1 count ($BC: tape buffer #2 count)
INBIT    = $BD  ;write leader count, read pass 1/pass 2
BITCI    = $BE  ;write new byte, read error flag
RINONE   = $BF  ;write start bit, read bit seq error
FNMIDX   = $C0  ;pass 1 error log pointer
PTR1     = $C0  ;-- " --
PTR2     = $C1  ;pass 2 error correction pointer
RIDATA   = $C2  ;current function: 0=scan, $01-$0F=count, $40=load, $80=end
RIPRTY   = $C3  ;read checksum, write leader length
PNT      = $C4  ;pointer to screen line (lo, hi)
PNTR     = $C6  ;column position of cursor on above line
SAL      = $C7  ;utility pointer for tape buffer, scrolling
EAL      = $C9  ;tape end address / end of current program
QTSW     = $CD  ;flag for quote mode: 0=direct mode, else programmed cursor
BITTS    = $CE  ;timer 1 enabled for tape read, 0=disabled
FNLEN    = $D1  ;number of characters in file name
LA       = $D2  ;current logical file number
SA       = $D3  ;current secondary address, or R/W command
FA       = $D4  ;current device number
LNMX     = $D5  ;line length (39 or 79) for screen
TAPE1    = $D6  ;start of tape buffer (address lo, hi)
TBLX     = $D8  ;current line with cursor
DATAX    = $D9  ;last key input, buffer checksum, bit buffer
FNADR    = $DA  ;pointer to current file name
INSRT    = $DC  ;number of keyboard INSERTs outstanding
ROPRTY   = $DD  ;write shift word / receive input character
FSBLK    = $DE  ;number of blocks remaining for read/write
MYCH     = $DF  ;serial buffer word
LDTB1    = $E0  ;screen line table, hi order addr. and line wrap
CAS1     = $F9  ;interrupt driver flag for cassette #1 status switch
CAS2     = $FA  ;interrupt driver flag for cassette #2 status switch
STAL     = $FB  ;tape start address (lo, hi)
MEMUSS   = $FD  ;pointer for monitor (MLM)

BAD      = $0100  ;start of processor stack, tape error log
BUF      = $0200  ;MLM area
TBUFFR   = $027A  ;tape (cassette) buffer
TIMOUT   = $03FC  ;

; kernal addresses
OPEN     = $FFC0
CLOSE    = $FFC3
CHKIN    = $FFC6  ;set input device
CHKOUT   = $FFC9  ;set output device
CLRCHN   = $FFCC  ;restore I/O
CHRIN    = $FFCF  ;read a byte from input
CHROUT   = $FFD2  ;write a byte to output
LOAD     = $FFD5
SAVE     = $FFD8
VERIFY   = $FFDB
SYS      = $FFDE
STOP     = $FFE1  ;check STOP key (affects A only, zero-flag set: STOP pressed)
GETIN    = $FFE4  ;get a character
CLALL    = $FFE7  ;abort all I/O
INCTIME  = $FFEA  ;update clock, scan and store key

; hardware addresses
VIDEO    = $8000

PIA1_PA  = $E810
PIA1_CRA = $E811
PIA1_PB  = $E812
PIA1_CRB = $E813
PIA2_PA  = $E820
PIA2_CRA = $E821
PIA2_PB  = $E822
PIA2_CRB = $E823

VIA_DRB  = $E840
VIA_DRA  = $E841
VIA_DDRB = $E842
VIA_DDRA = $E843
VIA_T1CL = $E844
VIA_T1CH = $E845
VIA_T1LL = $E846
VIA_T1LH = $E847
VIA_T2CL = $E848
VIA_T2CH = $E849
VIA_SR   = $E84A
VIA_ACR  = $E84B
VIA_PCR  = $E84C
VIA_IFR  = $E84D
VIA_IER  = $E84E
VIA_ANH  = $E84F
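
As a brief illustration of how such definitions may be used, the following sketch prints a single character via the CHROUT routine listed above. The label PRINTA is a hypothetical name, and the origin is again left out; only the definition actually used is repeated here.

```
; sketch: print the letter "A" at the current cursor position
; (PRINTA is a hypothetical label)

CHROUT  = $FFD2             ;write a byte to output

PRINTA  LDA #$41            ;PETSCII code for "A"
        JSR CHROUT          ;output the character in A
        RTS
```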