Guide to the 6502 Assembler of the PET 2001 Emulator
- General Description
- Theory of Operation
- Basic Syntax
- Value Expressions
- The Program Counter
- Relative Offset Literals
- Labels and Symbols
- Anonymous (Temporary) Labels
- Pragmas and Directives
- Options
- Compatibility
- Illegal Opcodes
- Combining Assembler with BASIC
- Rationale — General Considerations
- Some Useful Addresses
General Description
The PET 2001 Emulator features a built-in assembler that is modelled closely after the original MOS 6502 assembler from the 1970s, but has been extended to support a variety of common syntax flavors. Still, it should accept and successfully process any original MOS source. It is based on the assembler found here and adjusted for the needs and requirements of the PET. Moreover, it provides facilities to compile stand-alone programs that can be loaded, started, and distributed on their own (see below; there are also a few sample files to start playing around with).
Please mind that this is still a relatively simple assembler, which comes with a few restrictions and limitations:
- There is no support for macros or conditional assembly.
- As the principal lexer is based on symbols, there must be some separating white space between symbols, but there must be no white-space in any expressions. Otherwise, they will not be recognized as symbols or operand values.
- As the assembler uses its own encoding engine, there is (currently) no support for PETSCII markups as they are supported for BASIC sources.
Still, it should be good for some quick experiments, without having to worry about another syntax flavor. When it comes to complex projects, you will probably prefer an assembler and/or IDE of your own choice anyway.
The assembler is fully compatible with the 6502 online assembler found at www.masswerk.at/6502/assembler.html and any source code generated by the associated disassembler. However, there is no support for external symbol tables. Please provide any symbol definitions in your source. (Some useful definitions can be found at the very end of this page.)
A general description of the 6502 opcodes and what they do can be found here: 6502 Instruction Set.
If you are already familiar with this particular assembler, there are just a few changes:
- Encoding formats have been adjusted for the PET, so there are really just two formats, PETSCII and screen codes. While other encoding options are still present, they always refer to either PETSCII or PET screen codes.
- There are additional, synonymous pragmas and options for PET screen encoding, namely SCR and SCREEN (since any sources apply to the PET only).
- Similarly, there is .BASICSTART as a maybe more memorable and telling synonym for .PETSTART.
- Any special cases related to the BBC Micro have been stripped.
Operations
The assembler is invoked by either mounting an assembler source via the mount dialog or by dropping an assembler source on the emulated screen. An assembler source is a text file with a file name extension of either ".asm", ".a65", ".a", ".src", or just ".s".
(The assembler may also be invoked from within a BASIC source, which is covered below in the section on integration with BASIC.)
The assembler is a simple 2-pass assembler, where a first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
Once the assembler has processed the supplied source, it will present a listing in a special dialog. In this listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves, listed in a normalized format that is close to the original MOS notation.
The assembler will always fail on the first error, since — it’s the 1970s! :-)
If the assembly succeeds, you will be presented with some options below the listing, regarding how to proceed:
- “Load Code As Program” will load the resulting code just as if it were a binary PRG file. The associated checkbox gives a choice, whether or not to reset the virtual machine before loading the program. (Somewhat self-referentially, any other information or state of the emulated PET will be lost, if you choose to reset.)
- “Inject Code Into Memory” will load the resulting code directly into memory and leave the machine in the same state as it is now. The associated checkbox provides an option to additionally adjust the BASIC system pointers to the start and end of the code. Mind that any variables will be lost when doing so. (This option will be disabled if the emulator detects that the object code is outside the range of safe user RAM, from 0x0401 to the top of RAM.)
- “Export Code as PRG File” does exactly what it suggests, namely generating a link with the object code as a binary file that you can download to your local computer. For this, it will automatically prepend the start address as the first two bytes of this file, as required by the PRG file format common to all Commodore 8-bit machines.
- Finally, “Do Nothing” does exactly this, namely closing the dialog without further action. Maybe you just wanted to test something or review the assembly process?
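As a rough illustration of the PRG layout mentioned for the export option (this is not the emulator's actual export code), the following Python sketch prepends the two-byte start address, low byte first, to a block of object code. The function name `make_prg` is made up for this example, and the bytes are those of a short program assembled at $0800.

```python
def make_prg(start_address: int, code: bytes) -> bytes:
    """Return the object code prefixed by its start address (low byte first)."""
    if not 0 <= start_address <= 0xFFFF:
        raise ValueError("start address must fit into 16 bits")
    header = bytes([start_address & 0xFF, start_address >> 8])
    return header + code


# Object code of a short program assembled at $0800.
code = bytes.fromhex("A2 0A 8A 9D 00 08 E8 D0 F9 00")
prg = make_prg(0x0800, code)
print(prg.hex(" ").upper())
```

The resulting file is always two bytes longer than the object code itself.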
To further interact with your code, you may refer to the debugger of the emulator, available by its icon on the top right of the window. Unlock the lock icon to the right of the register display to edit registers and flags. (The debugger is only interactive when the emulated PET is halted.)
Mind that the PET also comes with a built-in machine monitor in ROM, which is normally hooked to the interrupt vector (IRQ). Therefore, executing any zero-byte (a BRK in 6502 machine code) will invoke the monitor. Such a zero-byte should always be present at address 0x400, just before the start of any BASIC text, and can be called by the memorable BASIC command "SYS 1024".
Basic Syntax
The assembler supports a variety of common 6502 assembler syntax styles. Mind that there must be separating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, labels, or complex expressions.
Compare the 6502 Instruction Set for instruction details and addressing modes.
Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for single-byte addresses, and "BB" for any other byte-sized operands. (In actuality, these may be any simple or complex expressions.)
- CLC
- implied, no operand.
- ROR A
- instruction with accumulator as the operand.
- ROR
- same as above. "A" is optional and may be omitted.
- LDA #BB
- immediate mode, loading the literal value.
- LDA HHLL
- absolute, loads the value from the provided memory address.
- LDA HHLL,X
- absolute, X-indexed.
- LDA HHLL,Y
- absolute, Y-indexed.
- LDA LL
- zero-page address mode (with automatic address mode detection).
- LDA *LL
- forced zero-page address mode in the style of the original MOS assembler.
- LDA.b LL
- forced zero-page address mode, modern byte-size notation.
- LDA.w LL
- forced absolute address mode, modern word-size notation.
- LDA LL,X
- zero-page, X-indexed.
- LDA LL,Y
- zero-page, Y-indexed.
- LDA (LL,X)
- X-indexed, indirect.
- LDA (LL),Y
- indirect, Y-indexed.
- LDA (LL)Y
- indirect, Y-indexed, old MOS format (no comma).
- JMP (HHLL)
- indirect address.
- BEQ HHLL
- relative addresses (-128 ≤ offset ≤ +127, counted from the address following the instruction) are computed from absolute target addresses.
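To make the last item a bit more tangible, here is a small Python sketch of how a branch operand can be derived from an absolute target address. The function name `branch_operand` is hypothetical, and raising an error for targets out of range is an assumption made for this sketch rather than documented behavior of the assembler.

```python
def branch_operand(pc: int, target: int) -> int:
    """Operand byte for a branch instruction located at address pc."""
    offset = target - (pc + 2)      # offsets count from the next instruction
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF            # two's complement, single byte


# A branch at $0808 back to $0804 and a branch at $0800 forward to $0815.
print(f"{branch_operand(0x0808, 0x0804):02X}")
print(f"{branch_operand(0x0800, 0x0815):02X}")
```

Backward branches end up as values of $80 and above, since the operand is a signed byte.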
Supported synonyms:
- LDA.byte LL
- forced zero-page address mode.
- LDA.by LL
- as above.
- LDA.word LL
- forced absolute address mode, word-size.
- LDA.wo LL
- as above.
- LDA+1 LL
- forced zero-page address mode (like ACME assembler).
- LDA+2 LL
- forced absolute address mode, word-size (like ACME assembler).
Comments:
- ;comment
- comments start with a semicolon and extend to the end of the line.
The assembler is generally case-insensitive, with the exception of strings and character literals.
Generally, there may be just a single instruction on each line.
Values and Numeric Representations
The assembler supports a variety of number formats:
- $12EF
- hexadecimal [0-9A-F].
- &12EF
- hexadecimal.
- 0x12EF
- hexadecimal.
- 1289
- decimal [0-9].
- 0d1289
- decimal.
- @1267
- octal [0-7].
- 0o1267
- octal.
- 01267
- octal.
- %1010101
- binary [01].
- 0b1010101
- binary.
- 'A
- character value of "A" ($41 in ASCII).
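The notations listed above can be summarized in a few lines of Python. This is only a sketch of the rules as described in this guide, not the assembler's own lexer: the function name `parse_literal` is made up, and the character literal simply uses Python's `ord`, which agrees with the documented value for "A" but is not a full PETSCII conversion.

```python
def parse_literal(text: str) -> int:
    """Convert a single numeric literal in any of the listed notations."""
    if text.startswith("'"):
        return ord(text[1])                 # character literal, e.g. 'A
    t = text.upper()                        # the assembler is case-insensitive
    prefixes = [
        ("0X", 16), ("0D", 10), ("0O", 8), ("0B", 2),
        ("$", 16), ("&", 16), ("@", 8), ("%", 2),
    ]
    for prefix, base in prefixes:
        if t.startswith(prefix):
            return int(t[len(prefix):], base)
    if len(t) > 1 and t.startswith("0"):
        return int(t, 8)                    # plain leading zero: octal
    return int(t, 10)


for literal in ("$12EF", "0x12EF", "1289", "@1267", "01267", "%1010101", "'A"):
    print(literal, parse_literal(literal))
```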
Value Expressions
Anywhere a value may occur, this may be a complex expression as well. Expressions may include addition, subtraction, multiplication, division, and unary minus ("+", "-", "*", "/", and a leading "-").
There are also the two special unary byte operators "<" and ">":
- <$12EF
- low-byte value ($EF).
- >$12EF
- high-byte value ($12).
Expressions are evaluated strictly from left to right, without precedence, but may be grouped using round or square brackets ("(…)", "[…]").
The use of square brackets is recommended, though, as round brackets can be ambiguous in the context of certain 6502 instructions and their syntax.
- 1+2
- 3
- 2*3
- 6
- 1+2*3
- 9 (1+2 => 3, 3*3 => 9)
- 1+[2*3]
- 7 ([2*3] => 6, 1+6 => 7)
- 1+(2*3)
- same as above
Expressions may include defined symbols and instruction labels.
There must not be any white space in an expression!
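The evaluation order described in this section differs from what most programming languages do, so a small model may help. The following Python sketch evaluates decimal-only expressions strictly from left to right, with brackets for grouping. All names are made up for this example, and the use of integer division is an assumption, as this guide does not specify how "/" rounds.

```python
import re

TOKEN = re.compile(r"\d+|[-+*/()\[\]]")


def evaluate(expr: str) -> int:
    """Evaluate strictly left to right; brackets group sub-expressions."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def operand() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "-":                      # unary minus
            return -operand()
        if tok in ("(", "["):
            value = sequence()
            pos += 1                        # skip the closing bracket
            return value
        return int(tok)

    def sequence() -> int:
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in ("+", "-", "*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = operand()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            else:
                value //= rhs               # assumption: integer division
        return value

    return sequence()


def low_byte(value: int) -> int:
    return value & 0xFF                     # the "<" operator


def high_byte(value: int) -> int:
    return (value >> 8) & 0xFF              # the ">" operator


print(evaluate("1+2*3"), evaluate("1+[2*3]"))
```

Note how the operator loop never looks ahead: each operator is applied to the running value as soon as its right-hand operand is known.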
The Program Counter
The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address, where the next instruction will be inserted. There are several ways to address the program counter:
- * = $1234
- the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
- BEQ *+2
- the asterisk may be used in expressions as well.
- * = *+4 $EA
- when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with NOP instructions ($EA).
- P% = P%+2
- the symbol "P%" may be used synonymously to the asterisk anywhere the asterisk may occur.
- .ORG $1234
- the more modern-style directive ".ORG" may be used for setting the PC as well. (However, you can't use it in an expression.)
- .ORG = $1234
- you may use .ORG in assignment style as well.
- .ORG EQU $1234
- generally, "EQU" may be used as the assignment operator as well. (Mind that there must be white space around "EQU" in order for it to be recognized as a token, which is not a requirement with "=".)
- .RORG $1234
- synonym to ".ORG" (in many assemblers you are not allowed to alter the origin set by ".ORG" and this is meant to provide compatibility).
- BEQ .+2
- in expressions, a dot (.) may be used synonymously for the asterisk. However, you cannot assign to it. (Strictly speaking, this is local context, but, as the assembler doesn't implement macros, it's the same anyway.)
Relative Offset Literals
As an extension to the standard syntax, the assembler also allows relative offset literals for branch instructions (the relative offset to PC+2 as in machine code, instead of the usual target address) with the "#" prefix (same as immediate mode):
- BCS #0
- equivalent to "BCS *+2", results in "B0 00" ($B0: instruction code for BCS).
- BCS #4
- equivalent to "BCS *+6", results in "B0 04".
- BCS #-4
- equivalent to "BCS *-2", results in "B0 FC" ($FC: -4 in two's complement).
- BCS #$FC
- as above, results in "B0 FC".
- BCS #6-(2*5)
- expressions allowed, equivalent to "BCS #-4", results in "B0 FC".
Relative offset literals are automatically constrained to single-byte values in the range of $00…$FF:
- BCS #$104
- results in "B0 04".
- BCS #$1FC
- results in "B0 FC".
Labels and Symbols
Instruction labels and defined symbols start with a letter or an underscore and may contain letters, digits, or the underscore. Please mind that, for compatibility with older and historic sources, only the first 12 characters are significant. Use option "LONGNAMES" (see below) to disable this default.
Instruction labels may precede an instruction or may be the only entity on a line. They may optionally end in a trailing colon. Labels may be used anywhere in an expression:
- LOOP LDA A,X
- declares the instruction label LOOP.
- LOOP: LDA A,X
- labels may end in a colon (optional).
- BEQ LOOP
- using a label as an address value.
Optional "@" prefix for further compatibility:
- @LOOP LDA A,X
- labels may be declared with an optional "@" prefix.
- @LOOP: LDA A,X
- same as above, but using a trailing colon.
- BNE @LOOP
- labels may be referred to using an optional "@" prefix.
Symbols are declared by an assignment and may be used as values anywhere.
- TEST = $2000
- declares the symbol "TEST".
- TEST EQU $2000
- "EQU" may be used synonymously.
- C = *+[TEST*2]
- assignments may be complex expressions.
Mind that, like with most assemblers, you may not redefine or reuse any symbols or labels by default. However, you may change this behavior by setting option "REDEF" (see below).
Note on hexadecimal values and automatic zero-page mode
Any numeric value given with at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will effect absolute address modes when used in an ambiguous context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "C" by "C = 0x0002" and using this in "LDA C+2" will result in a word-sized, absolute instruction, while the effective value is well inside single-byte range. Defining C as "0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)
Anonymous (Temporary) Labels
The assembler also supports anonymous labels for temporary branch and jump targets:
Just mark an instruction by "!" or ":" (empty label) and refer to this mark by either "!+" (or ":+") for the next anonymous label as a target or by "!-" (or ":-") for the previous one. You may refer to a target further away by repeating "+" or "-". E.g., "BNE !--" branches to the second anonymous label before the insertion point. Mind that this counts anonymous labels and not addresses.
Example:
    ! START LDA #0       ;first anonymous label
                         ;anonymous labels may precede a normal label
            LDX #0
    !                    ;just mark this address
    :       STA $1000,X  ;third label (same address), we may use ":" as well
            INX
            BNE !-       ;select the closest previous anonymous label
            JMP :---     ;jump back 3 anonymous labels (same as START)
                         ;again, ":" and "!" are synonymous
This will assemble to (with anonymous labels listed in a column of their own):
    LOC   CODE         LABEL     INSTRUCTION
    0800  A9 00      ! START     LDA #$00
    0802  A2 00                  LDX #$00
    0804             !
    0804  9D 00 10   !           STA $1000,X
    0807  E8                     INX
    0808  D0 FA                  BNE $0804
    080A  4C 00 08               JMP $0800
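The counting rule behind "!-", ":---" and friends can be modelled in a few lines of Python. This is a sketch under assumptions, not the assembler's implementation: the function name `resolve_anonymous` and the bookkeeping (a list of label addresses in source order, plus the number of anonymous labels already declared before the referring instruction) are invented for this example.

```python
def resolve_anonymous(labels, seen, ref):
    """Resolve a reference such as '!-' or ':+++' to an address.

    labels: addresses of all anonymous labels, in source order
    seen:   number of anonymous labels declared before the reference
    ref:    the reference as written; '!' and ':' are interchangeable
    """
    marks = ref.lstrip("!:")
    if marks == "" or set(marks) not in ({"+"}, {"-"}):
        raise ValueError("malformed anonymous reference")
    count = len(marks)
    index = seen - count if marks[0] == "-" else seen + count - 1
    if not 0 <= index < len(labels):
        raise ValueError("no such anonymous label")
    return labels[index]


# The three anonymous labels of the example above.
labels = [0x0800, 0x0804, 0x0804]
print(f"{resolve_anonymous(labels, 3, '!-'):04X}")
print(f"{resolve_anonymous(labels, 3, ':---'):04X}")
```

Since labels are counted rather than addresses, the two labels sharing address $0804 still occupy two separate slots in the list.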
There is also support for an alternative grammar for anonymous targets, marking forward and backward references separately (as used by the ACME cross-assembler).
Here, instructions used for forward references are marked by "+" and those to be used for backward references are marked by "-", each contributing to a dedicated list of anonymous labels. These are then referred to as a target address as above, but without any leading "!" or ":". This is an important difference! Please mind that this is not just an alternative syntax, but comes with its own semantics.
(Hence, these targets are managed in separate lists. While not recommended, you could mix both grammars in a single source.)
            * = $800
            BCS +          ;branch to exit
            LDY #3
            LDA $3000
    -       CLC            ;outer loop
            ADC #5
            LDX #5
    -       STA $1000,x    ;inner loop
            DEX
            BNE -
            DEY
            BNE --
    +       RTS            ;forward target

    LOC   CODE         LABEL     INSTRUCTION
    0800                         * = $0800
    0800  B0 13                  BCS $0815    ;branch to exit
    0802  A0 03                  LDY #$03
    0804  AD 00 30               LDA $3000
    0807  18           -         CLC          ;outer loop
    0808  69 05                  ADC #$05
    080A  A2 05                  LDX #$05
    080C  9D 00 10     -         STA $1000,X  ;inner loop
    080F  CA                     DEX
    0810  D0 FA                  BNE $080C
    0812  88                     DEY
    0813  D0 F2                  BNE $0807
    0815  60           +         RTS          ;forward target
Restrictions:
This feature is only supported for branch instructions and absolute jump targets. An anonymous target must be the sole operand and cannot be used in an arithmetic expression.
Note: Anonymous labels are not listed in symbol tables.
Pragmas and Directives
Pragmas and directives generally start with a dot. For enhanced compatibility, an exclamation mark ("!") may be used as well, but will be normalized and show up as a dot in the listing of pass 2. The following examples use the dot for a general/canonical notation.
Directives for embedding data:
- .BYTE 1, $02
- embeds a single byte or a list of bytes at the current location. Lists are separated by white space and/or commas. (An optional "#", preceding any values, is ignored.) Values may be complex expressions as well.
- .DBYTE $12EF
- embeds a double byte, high byte first (HHLL order, big-endian). This inserts the bytes $12 and $EF at the current location. ".DBYTE" takes a list of values as well.
- .WORD $12EF
- embeds a word in memory order, low byte first (LLHH order, little-endian). This inserts the bytes $EF and $12 at the current location. (Also, use this when using previously defined labels and symbols in an expression.) Again, values and expressions may also be provided as a list.
- .TEXT "Abc"
- embeds a text literal (case-sensitive) using the current encoding, here always PETSCII.
- .PETSCII "Abc"
- embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
- .PETSCR "Abc"
- embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
Supported synonyms:
- .WO $12EF
- synonym for ".WORD".
- .BYT $01
- synonym for ".BYTE".
- .BY $01
- synonym for ".BYTE".
- .DB $02
- synonym for ".BYTE" (Define Byte).
- .DCB $03
- synonym for ".BYTE" (Define Constant Byte).
- .DBYT $12EF
- synonym for ".DBYTE".
- .PET "Abc"
- synonym for ".PETSCII".
- .SCREEN "Abc"
- synonym for ".PETSCR".
- .SCR "Abc"
- synonym for ".PETSCR".
- .TX "Abc"
- synonym for ".TEXT".
- .ASCII "Abc"
- here the same as ".PETSCII".
Directives for aligning code or filling space:
- .ALIGN $100
- advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled by zero. If no argument is provided, ".ALIGN" aligns to the next even memory location.
- .ALIGN $100 $EA
- an optional second argument may specify a byte value to be used to fill any gaps (here $EA, "NOP", as used by most Commodore 8-bit machines).
- .FILL $20 $EA
- fills the next n bytes using the value provided by the second argument. If no second argument is provided, zero will be used as the fill-byte.
- .REPEAT n
- repeats the instruction or directive following this directive on the same line n times. An optional "STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default 1). The repeat-counter is accessible as "R%".
E.g.,
    .REPEAT 26 .BYTE 'A+R%
will fill the next 26 memory locations with the letters of the alphabet.
    ODD_NUMS                       ;generate list of odd numbers
    .REPEAT 5 STEP 2 .BYTE 1+R%
will fill the next 5 memory locations with the odd number series 1,3,5,7,9.
And this will fill the next 6 bytes by the sequence 0x00, 0x00, 0x02, 0x02, 0x04, 0x04:
    .REPEAT 3 STEP 2 *=*+2 R%      ;PC += 2, fill-byte R%
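The three ".REPEAT" examples are consistent with a repeat-counter that starts at zero and grows by the step value on each iteration. The following Python sketch models just that; the starting value of zero is inferred from the examples rather than stated explicitly, and the function name `repeat_values` is made up.

```python
def repeat_values(count: int, step: int = 1) -> list:
    """Values taken by the repeat-counter R% over `count` iterations."""
    return [i * step for i in range(count)]


# .REPEAT 26 .BYTE 'A+R%
letters = bytes(ord("A") + r for r in repeat_values(26))

# .REPEAT 5 STEP 2 .BYTE 1+R%
odd_numbers = [1 + r for r in repeat_values(5, step=2)]

# .REPEAT 3 STEP 2 *=*+2 R%   (two fill-bytes per iteration)
filled = [r for r in repeat_values(3, step=2) for _ in range(2)]

print(letters.decode("ascii"), odd_numbers, filled)
```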
Other directives:
- .END
- ends the source code, any remaining text is ignored. (optional)
- .NOLIST
- switches listing output off (e.g., for data sections; this is also available as an option).
- .LIST
- switches listing output on (default, also available as an option).
- .SKIP
- inserts a blank line in the listing (pass #2). This is mostly for compatibility.
- .PAGE
- inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
- .DATA
- any such directive is ignored (this merely exists to ensure compatibility with symbol tables used by this stand-alone disassembler).
Special directives for Commodore BASIC:
- .BASICSTART
- generates a short BASIC program, consisting of optional REM lines and a line with a "SYS" command, jumping to the next available address immediately following this BASIC text (which starts at 0x0401, the BASIC start address of the Commodore PET). The program counter will be advanced to this start address automatically.
Without any arguments, just a line with the SYS command will be generated, using the current year as the line number:
    .BASICSTART
    > 2023 SYS 1038
If a first, numeric argument is provided, this will be used as the line number for the line holding the SYS statement:
    .BASICSTART 10
    > 10 SYS 1038
If a string argument is provided, the assembler will generate a heading line with line number "0" and a REM statement using this string. If a list of strings (separated by white space and optionally commas) is provided, or a string contains a line break ("\n"), multiple REM lines will be generated:
    .BASICSTART 2001 "*** a program ***", "(c) example.com"
    > 0 REM *** A PROGRAM ***
    > 1 REM (c) EXAMPLE.COM
    > 2001 SYS 1084
(Mind that lower-case letters will appear as upper-case and upper-case letters as graphics characters in standard PETSCII upper-case/graphics mode.)
- .PETSTART
- same as ".BASICSTART" (see above).
Options
Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (.) or an exclamation mark (!).
- .OPT WORDA
- switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("*") or the byte extension (".b"). Use this for fine-grained control and/or compatibility with old sources.
- .OPT ZPGA
- switches automatic zero-page detection on (default).
- .OPT ZPA
- synonym to option "ZPGA".
- .OPT ILLEGALS
- enables support for “illegal” opcodes (see below).
- .OPT LEGALS
- disables support for “illegal” opcodes (default).
- .OPT NOILLEGALS
- synonym to option "LEGALS".
- .OPT REDEF
- allows symbols and labels to be redefined / reused.
- .OPT NOREDEF
- reuse of symbols is not allowed and will throw an error (default).
- .OPT ASCII
- sets the character encoding for “.TEXT” directives and character literals. Here, this is only included for compatibility reasons and the encoding always defaults to PETSCII.
- .OPT PETSCII
- sets the default character encoding to PETSCII.
- .OPT PETSCR
- sets the default character encoding to Commodore 8-bit screen codes.
- .OPT SCREEN
- synonym to option "PETSCR".
- .OPT SCR
- same as above.
- .OPT NOLIST
- switches listing output off.
- .OPT LIST
- switches listing output on (default).
- .OPT LONGNAMES
- disables the default 12-character limit for the significance of labels and identifiers, allowing names of unlimited length.
Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored: XREF, NOXREF, COUNT, NOCOUNT, CNT, NOCNT, MEMORY, NOMEMORY, GENERATE, NOGENERATE.
Compatibility
This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw about any style of source code at it.
E.g., the following examples are semantically identical and produce the same object code:
    ;MOS/traditional
            * = $4000
    TARGET  = $C0
            LDY *$20
    LOOP    LDA $0080,Y
            ROL A
            STA (TARGET)Y
            DEY
            BNE LOOP
            RTS
            .END

    ;modern style
            .ORG 0x4000
    TARGET  EQU 0xC0
            LDY.b 0x20
    LOOP:   LDA.w 0x80,Y
            ROL
            STA (TARGET),Y
            DEY
            BNE LOOP
            RTS
            .END
Processing Example
Here is an example for a complete assembly of a short source:
Source code:
    ;fill a page with bytes,
    ;preserve program

    *=$800

    start
           ldx #offset
    loop   txa
           sta start,x
           inx
           bne loop
           brk

    ;insert bytes here
    offset=*-start
    .end
Resulting object code:
    0800: A2 0A 8A 9D 00 08 E8 D0
    0808: F9 00
Listing:
    pass 1

    LINE  LOC    LABEL     PICT
       1                   ;fill a page with bytes,
       2                   ;preserve program
       4  0800             * = $800
       6  0800   START
       7  0800             LDX #OFFSET
       8  0802   LOOP      TXA
       9  0803             STA START,X
      10  0806             INX
      11  0807             BNE LOOP
      12  0809             BRK
      14                   ;insert bytes here
      15                   OFFSET = *-START
      16                   .END

    symbols
     LOOP      $0802
     OFFSET    $0A
     START     $0800

    pass 2

    LOC   CODE         LABEL     INSTRUCTION
                                 ;fill a page with bytes,
                                 ;preserve program
    0800                         * = $0800
    0800               START
    0800  A2 0A                  LDX #$0A
    0802  8A           LOOP      TXA
    0803  9D 00 08               STA $0800,X
    0806  E8                     INX
    0807  D0 F9                  BNE $0802
    0809  00                     BRK
                                 ;insert bytes here
                                 OFFSET = $0A
                                 .END

    done (code: 0800..0809).
Illegal Opcodes
Support for "illegal" opcodes (undefined instructions) is enabled by the pragma ".OPT ILLEGALS".
The following mnemonics are implemented (supported synonyms given in parenthesis):
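    opc (synonyms)     imp  imm  abs  abX  abY  zpg  zpX  zpY  inX  inY
    ALR (ASR)          --   4B   --   --   --   --   --   --   --   --
    ANC                --   0B   --   --   --   --   --   --   --   --
    ANC2               --   2B   --   --   --   --   --   --   --   --
    ANE (XAA)          --   8B   --   --   --   --   --   --   --   --
    ARR                --   6B   --   --   --   --   --   --   --   --
    DCP (DCM)          --   --   CF   DF   DB   C7   D7   --   C3   D3
    ISC (ISB, INS)     --   --   EF   FF   FB   E7   F7   --   E3   F3
    LAS (LAR, LAE)     --   --   --   --   BB   --   --   --   --   --
    LAX (ATX)          --   AB   AF   --   BF   A7   --   B7   A3   B3
    LXA (LAX imm)      --   AB   --   --   --   --   --   --   --   --
    RLA                --   --   2F   3F   3B   27   37   --   23   33
    RRA                --   --   6F   7F   7B   67   77   --   63   73
    SAX (AXS, AAX)     --   --   8F   --   --   87   --   97   83   --
    SBX                --   CB   --   --   --   --   --   --   --   --
    SHA (AXA, AHX)     --   --   --   --   9F   --   --   --   --   93
    SHX                --   --   --   --   9E   --   --   --   --   --
    SHY (SAY, SYA)     --   --   --   9C   --   --   --   --   --   --
    SLO (ASO)          --   --   0F   1F   1B   07   17   --   03   13
    SRE (LSE)          --   --   4F   5F   5B   47   57   --   43   53
    TAS (SHS, XAS)     --   --   --   --   9B   --   --   --   --   --
    USBC               --   EB   --   --   --   --   --   --   --   --
    NOP                EA   80   0C   1C   --   04   14   --   --   --
    DOP (SKB)          --   80   --   --   --   04   14   --   --   --
    TOP (SKW)          --   --   0C   1C   --   --   --   --   --   --
    JAM (HLT, KIL)     02   --   --   --   --   --   --   --   --   --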
Notes:
NOP: There are several NOP instructions, but these are the ones commonly used.
DOP: "double NOP" (single-byte operand/address, 2 bytes total).
TOP: "triple NOP" (word-address, 3 bytes total).
JAM: freezes the CPU (again, there are several equivalent instructions).
Combining Assembler with BASIC
There are two general approaches to combining BASIC text with assembler code, one from the assembler side and another, probably more versatile one from the BASIC side of things. An important thing to note is that with both approaches you do not set the program counter, as its value will be determined by the assembler. Both methods produce a stand-alone program that can be saved or exported by other means as a single binary file.
BASIC Preambles in Assembler Sources
The first one is the pragma ".BASICSTART" already described above. You can add an optional line number for the final SYS statement and any number of strings, which will be prepended in REM statements starting at line number 0. The assembler will generate the required tokenized BASIC code with a SYS statement pointing to the address immediately after this short BASIC program and set the program counter for the assembly accordingly.
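To illustrate where the SYS address comes from, here is a Python sketch that builds such a preamble and computes the start address of the machine code. It is a model under assumptions, not the assembler's code: the names `basic_line` and `basic_preamble` are invented, text is simply upper-cased instead of being properly converted to PETSCII, and each line is laid out as link, line number, token, a space, the text, and a terminating zero byte, with $8F and $9E as the tokens for REM and SYS.

```python
BASIC_START = 0x0401
TOKEN_REM, TOKEN_SYS = 0x8F, 0x9E


def basic_line(address, number, token, text):
    """One tokenized line: link, line number, token, space, text, EOL."""
    body = bytes([token, 0x20]) + text.upper().encode("ascii") + b"\x00"
    link = address + 4 + len(body)          # address of the following line
    return link.to_bytes(2, "little") + number.to_bytes(2, "little") + body


def basic_preamble(sys_line, rem_lines=()):
    """Return (preamble bytes, start address of the machine code)."""
    out = bytearray()
    for number, text in enumerate(rem_lines):
        out += basic_line(BASIC_START + len(out), number, TOKEN_REM, text)
    # The SYS line contains its own target address, so guess the number
    # of digits and retry until the guess is consistent.
    digits = 4
    while True:
        start = BASIC_START + len(out) + 4 + 2 + digits + 1 + 2
        if len(str(start)) == digits:
            break
        digits = len(str(start))
    out += basic_line(BASIC_START + len(out), sys_line, TOKEN_SYS, str(start))
    out += b"\x00\x00"                      # empty link ends the BASIC text
    return bytes(out), start


data, start = basic_preamble(
    100, ["*** my great program ***", "(c) by me and me alone"]
)
print(start)  # 1098
```

Since the address digits are part of the SYS line itself, the length of the preamble depends on the address it points to, which is why the sketch checks its digit count before emitting the line.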
Example
- Source (assembler text “proud-demo.asm”)
    .BASICSTART 100 "*** my great program ***","(c) by me and me alone"

    RTS ; that's it
- Assembler Listing
    pass 1

    LINE  LOC    LABEL     PICT
       1  0401             .BASICSTART 100 "*** my great program ***\n(c) by me and me alone"
       3  044A             RTS ; that's it

    pass 2

    LOC   CODE         LABEL     INSTRUCTION
    0401                         .BASICSTART 100 "*** my great program ***\n(c) by me and me alone"
    >>>>  COMPILING BASIC PREAMBLE...
    0401  20 04        $0420     ;LINE LINK
    0403  00 00        $0000     ;LINE NO. ("0")
    0405  8F 20                  ;"REM "
    0407  2A 2A 2A               ;TEXT "***"
    040A  20 4D 59               ;TEXT " MY"
    040D  20 47 52               ;TEXT " GR"
    0410  45 41 54               ;TEXT "EAT"
    0413  20 50 52               ;TEXT " PR"
    0416  4F 47 52               ;TEXT "OGR"
    0419  41 4D 20               ;TEXT "AM "
    041C  2A 2A 2A               ;TEXT "***"
    041F  00           $00       ;EOL
    0420  3D 04        $043D     ;LINE LINK
    0422  01 00        $0001     ;LINE NO. ("1")
    0424  8F 20                  ;TOKEN REM, " "
    0426  28 43 29               ;TEXT "(C)"
    0429  20 42 59               ;TEXT " BY"
    042C  20 4D 45               ;TEXT " ME"
    042F  20 41 4E               ;TEXT " AN"
    0432  44 20 4D               ;TEXT "D M"
    0435  45 20 41               ;TEXT "E A"
    0438  4C 4F 4E               ;TEXT "LON"
    043B  45                     ;TEXT "E"
    043C  00           $00       ;EOL
    043D  48 04        $0448     ;LINE LINK
    043F  64 00        $0064     ;LINE NO. ("100")
    0441  9E 20                  ;TOKEN SYS, " "
    0443  31 30 39               ;TEXT "109"
    0446  38                     ;TEXT "8"
    0447  00           $00       ;EOL
    0448  00 00        $0000     ;END OF BASIC TEXT (EMPTY LINK)
    >>>>  START OF ASSEMBLY AT $044A ("SYS 1098")
    044A  60                     RTS ; that's it

    done (code: 0401..044A).
- Listing in BASIC (0x044A = dec. 1098)
    LIST

     0 REM *** MY GREAT PROGRAM ***
     1 REM (C) BY ME AND ME ALONE
     100 SYS 1098
    READY.
Here’s another example (probably a bit more healthy and stable),
- hearts.asm
            .basicstart 100 "with love..."

            lda #211      ;screen code inverted heart
            ldx #0
    loop:   sta $8000,x   ;fill the screen
            sta $8100,x
            sta $8200,x
            sta $8300,x
            dex
            bne loop
            rts           ;done
Assembler Code Appended to BASIC Sources
The other, probably more capable approach is appending an assembler source to a BASIC source file.
This is achieved by the special BASIC source tag "{ASM START}" (also "{ASM_START}" or "{ASMSTART}"). This will immediately terminate the process of tokenizing the BASIC program, with the rest of the line ignored, and will insert an ASCII sequence for the memory address following immediately after the BASIC program that is currently generated. Any source text following this will be assumed to be 6502 assembler code. (If no such code is found, a simple RTS instruction will be appended.)
The general idea is that you put this behind a "SYS" command to form a final statement that leads to the execution of the following machine language program.
Example
- hearts-demo-1.txt
    100 REM HEARTS DEMO 1
    110 PRINT "READY FOR SOME HEARTS?"
    120 GET K$:IF K$="" GOTO 130
    130 SYS {ASMSTART}

    ;routine to fill the screen with hearts
            lda #83       ;screen code for heart character
            ldx #0
    loop:   sta $8000,x
            sta $8100,x
            sta $8200,x
            sta $8300,x
            dex
            bne loop
            rts           ;return to BASIC
- BASIC Listing
    LIST

     100 REM HEARTS DEMO 1
     110 PRINT "READY FOR SOME HEARTS?"
     120 GET K$:IF K$="" GOTO 130
     130 SYS 1112
    READY.
Here, the emulator behaves as it usually does, whenever we drop a BASIC source file onto it: the code will be transformed and loaded seamlessly, but you will be asked, whether you would want to review the assembler listing or not. In case there should be an error and the assembly fails, the listing will be presented anyways.
Mind that this can be used to push any configurations, etc, to dialogs written in BASIC, where this may be easier to handle than in assembler.
Notably, this can also be used without this assembler: just omit the assembler part, export the resulting program either as binary or as a hex-dump, and use it in the assembler of your choice, replacing the final RTS (0x60) instruction by your code.
Another way of using this is by appending the assembler code to a final DATA statement, from where we can read the jump address to be used from anywhere in the BASIC program. Here, rather than using BASIC as a means to start our assembler program, we use the assembler to provide some fast routine(s) for BASIC.
Example
(Here, we deposit a screen code to be used for filling the screen in address 255 before calling our routine. This is either screen code 83 for a heart character or 211 for an inverted heart character. We use this to blink the screen five times and finally clear it by filling it using a space character. Mind how the call address is read into variable A from the final DATA statement.)
- hearts-demo-2.txt
    100 REM HEARTS DEMO 2
    110 READ A: REM READ CALL ADDRESS
    120 PRINT "READY FOR SOME HEARTS?"
    130 GET K$:IF K$="" GOTO 130
    140 FOR I=0 TO 4
    150 POKE 255, 83:SYS A
    160 FOR D=0 TO 300:NEXT D
    170 POKE 255,211:SYS A
    180 FOR D=0 TO 300:NEXT D
    190 NEXT I
    200 POKE 255,32:SYS A
    210 PRINT "THIS WAS FUN!"
    220 DATA {ASMSTART}

    ;routine to fill the screen with char in $ff
            lda $ff       ;unused zeropage addr
            ldx #0
    loop:   sta $8000,x
            sta $8100,x
            sta $8200,x
            sta $8300,x
            dex
            bne loop
            rts
Using “Disassemble Program” from the emulator’s “Utils/Export” menu, we get the result of our combined BASIC and assembler efforts as in memory:
- Program Disassembly ($0401-$0511)
.[tokenized BASIC text]
0401  15 04              link: $0415
0403  64 00              line# 100
0405  8F                 token REM
0406  20 48 45 41 52 54  ascii « HEART»
040C  53 20 44 45 4D 4F  ascii «S DEMO»
0412  20 32              ascii « 2»
0414  00                 -EOL-
0415  32 04              link: $0432
0417  6E 00              line# 110
0419  87                 token READ
041A  20 41 3A 20        ascii « A: »
041E  8F                 token REM
041F  20 52 45 41 44 20  ascii « READ »
0425  43 41 4C 4C 20 41  ascii «CALL A»
042B  44 44 52 45 53 53  ascii «DDRESS»
0431  00                 -EOL-
0432  51 04              link: $0451
0434  78 00              line# 120
0436  99                 token PRINT
0437  20 22 52 45 41 44  ascii « "READ»
043D  59 20 46 4F 52 20  ascii «Y FOR »
0443  53 4F 4D 45 20 48  ascii «SOME H»
0449  45 41 52 54 53 3F  ascii «EARTS?»
044F  22                 ascii «"»
0450  00                 -EOL-
0451  68 04              link: $0468
0453  82 00              line# 130
0455  A1                 token GET
0456  20 4B 24 3A        ascii « K$:»
045A  8B                 token IF
045B  20 4B 24           ascii « K$»
045E  B2                 token =
045F  22 22 20           ascii «"" »
0462  89                 token GOTO
0463  20 31 33 30        ascii « 130»
0467  00                 -EOL-
0468  76 04              link: $0476
046A  8C 00              line# 140
046C  81                 token FOR
046D  20 49              ascii « I»
046F  B2                 token =
0470  30 20              ascii «0 »
0472  A4                 token TO
0473  20 34              ascii « 4»
0475  00                 -EOL-
0476  88 04              link: $0488
0478  96 00              line# 150
047A  97                 token POKE
047B  20 32 35 35 2C 20  ascii « 255, »
0481  38 33 3A           ascii «83:»
0484  9E                 token SYS
0485  20 41              ascii « A»
0487  00                 -EOL-
0488  9C 04              link: $049C
048A  A0 00              line# 160
048C  81                 token FOR
048D  20 44              ascii « D»
048F  B2                 token =
0490  30 20              ascii «0 »
0492  A4                 token TO
0493  20 33 30 30 3A     ascii « 300:»
0498  82                 token NEXT
0499  20 44              ascii « D»
049B  00                 -EOL-
049C  AE 04              link: $04AE
049E  AA 00              line# 170
04A0  97                 token POKE
04A1  20 32 35 35 2C 32  ascii « 255,2»
04A7  31 31 3A           ascii «11:»
04AA  9E                 token SYS
04AB  20 41              ascii « A»
04AD  00                 -EOL-
04AE  C2 04              link: $04C2
04B0  B4 00              line# 180
04B2  81                 token FOR
04B3  20 44              ascii « D»
04B5  B2                 token =
04B6  30 20              ascii «0 »
04B8  A4                 token TO
04B9  20 33 30 30 3A     ascii « 300:»
04BE  82                 token NEXT
04BF  20 44              ascii « D»
04C1  00                 -EOL-
04C2  CA 04              link: $04CA
04C4  BE 00              line# 190
04C6  82                 token NEXT
04C7  20 49              ascii « I»
04C9  00                 -EOL-
04CA  DB 04              link: $04DB
04CC  C8 00              line# 200
04CE  97                 token POKE
04CF  20 32 35 35 2C 33  ascii « 255,3»
04D5  32 3A              ascii «2:»
04D7  9E                 token SYS
04D8  20 41              ascii « A»
04DA  00                 -EOL-
04DB  F1 04              link: $04F1
04DD  D2 00              line# 210
04DF  99                 token PRINT
04E0  20 22 54 48 49 53  ascii « "THIS»
04E6  20 57 41 53 20 46  ascii « WAS F»
04EC  55 4E 21 22        ascii «UN!"»
04F0  00                 -EOL-
04F1  FC 04              link: $04FC
04F3  DC 00              line# 220
04F5  83                 token DATA
04F6  20 31 32 37 38     ascii « 1278»
04FB  00                 -EOL-
04FC  00 00              -EOP- (link = null)
.[end of BASIC text]

                         * = $04FE
04FE  A5 FF                     LDA $FF
0500  A2 00                     LDX #$00
0502  9D 00 80           L0502  STA $8000,X
0505  9D 00 81                  STA $8100,X
0508  9D 00 82                  STA $8200,X
050B  9D 00 83                  STA $8300,X
050E  CA                        DEX
050F  D0 F1                     BNE L0502
0511  60                        RTS
.end
This mechanism can be used to integrate multiple machine language routines, but you will have to add the offset of each routine to the base address returned in the final DATA statement on your own.
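The offset arithmetic can be sketched in a few lines of Python. This is a hedged illustration only: the helper `entry_points` and the second routine with its length of 12 bytes are hypothetical, while the base address 1278 ($04FE) and the extent of the fill routine ($04FE-$0511) are taken from the disassembly above.

```python
def entry_points(base: int, lengths: list) -> list:
    """Return the start address of each routine, laid out back to back."""
    addresses = []
    address = base
    for length in lengths:
        addresses.append(address)
        address += length
    return addresses


BASE = 1278                           # value in the final DATA statement ($04FE)
FILL_LENGTH = 0x0511 - 0x04FE + 1     # fill routine occupies $04FE-$0511
SECOND_LENGTH = 12                    # hypothetical second routine

points = entry_points(BASE, [FILL_LENGTH, SECOND_LENGTH])
offsets = [p - BASE for p in points]
print(points, offsets)
# prints: [1278, 1298] [0, 20]
```

With these numbers, the BASIC program would call the first routine by SYS A and the hypothetical second one by SYS A+20.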
Note: I’m not aware that this has been done before, so this could well be a genuine invention, since it requires some kind of engine capable of handling both BASIC source text and assembler code at once.
Fixed Start Addresses with BASIC Sources
In case you really want to use a fixed start address for your routine, you may either put the "{ASMSTART}" behind a dummy command or, best, in a "REM" statement. (Mind that hiding it in a string won’t work, as any such text is ignored by the parser.)
In the following example, the space between $0475, the end of the BASIC program, and $0480 (decimal 1152), the explicitly provided start of the 6502 code, will be filled by zero-bytes:
- hearts-demo-3.txt (source file)
100 REM HEARTS DEMO 3
110 PRINT "READY FOR SOME HEARTS?"
120 GET K$:IF K$="" GOTO 130
130 SYS 1152
140 REM ML RANGE STARTS AT {ASMSTART}

* = $0480
;routine to fill the screen with hearts
      lda #83       ;screen code for heart character
      ldx #0
loop: sta $8000,x
      sta $8100,x
      sta $8200,x
      sta $8300,x
      dex
      bne loop
      rts           ;return to BASIC
This will result in the following hex-dump (“Utils/Export” → “Hex-Dump Program”):
0400: .. 15 04 64 00 8F 20 48   ..... H
0408: 45 41 52 54 53 20 44 45  EARTS DE
0410: 4D 4F 20 33 00 34 04 6E  MO 3.4..
0418: 00 99 20 22 52 45 41 44  .. "READ
0420: 59 20 46 4F 52 20 53 4F  Y FOR SO
0428: 4D 45 20 48 45 41 52 54  ME HEART
0430: 53 3F 22 00 4B 04 78 00  S?".K...
0438: A1 20 4B 24 3A 8B 20 4B  . K$:. K
0440: 24 B2 22 22 20 89 20 31  $."" . 1
0448: 33 30 00 56 04 82 00 9E  30.V....
0450: 20 31 31 35 32 00 74 04   1152...
0458: 8C 00 8F 20 4D 4C 20 52  ... ML R
0460: 41 4E 47 45 20 53 54 41  ANGE STA
0468: 52 54 53 20 41 54 20 31  RTS AT 1
0470: 31 34 32 00 00 00 00 00  142.....
0478: 00 00 00 00 00 00 00 00  ........
0480: A9 53 A2 00 9D 00 80 9D  .S......
0488: 00 81 9D 00 82 9D 00 83  ........
0490: CA D0 F1 60              ...`
(Orange: end of BASIC text, blue: filled by assembler until start of 6502 code at 0x480.)
The various parts still form a homogeneous program as indicated by the system pointers TXTTAB (start of BASIC text) = $0401 and VARTAB (start of BASIC variables) = $0494.
(Use “Utils/Export” → “Show BASIC System Pointers” to view these pointers.)
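The layout can be checked against the hex-dump above with a short Python sketch that follows the line links of the tokenized BASIC text, starting at TXTTAB. The function name `walk_lines` and the leading padding byte for address $0400 are assumptions for illustration; the bytes are copied from the hex-dump. The walk ends at the null link at $0474, so the first address behind the BASIC text is $0476 (decimal 1142), which appears to be the value inserted for {ASMSTART} in the REM line, and the end of the data coincides with VARTAB.

```python
DUMP = """
   15 04 64 00 8F 20 48
45 41 52 54 53 20 44 45
4D 4F 20 33 00 34 04 6E
00 99 20 22 52 45 41 44
59 20 46 4F 52 20 53 4F
4D 45 20 48 45 41 52 54
53 3F 22 00 4B 04 78 00
A1 20 4B 24 3A 8B 20 4B
24 B2 22 22 20 89 20 31
33 30 00 56 04 82 00 9E
20 31 31 35 32 00 74 04
8C 00 8F 20 4D 4C 20 52
41 4E 47 45 20 53 54 41
52 54 53 20 41 54 20 31
31 34 32 00 00 00 00 00
00 00 00 00 00 00 00 00
A9 53 A2 00 9D 00 80 9D
00 81 9D 00 82 9D 00 83
CA D0 F1 60
"""

BASE = 0x0400
# One padding byte for $0400, so that index = address - BASE.
MEM = bytes([0x00]) + bytes.fromhex("".join(DUMP.split()))


def walk_lines(mem: bytes, base: int = BASE, txttab: int = 0x0401):
    """Follow the line links; return (line numbers, first address after BASIC)."""
    numbers = []
    addr = txttab
    while True:
        link = mem[addr - base] | (mem[addr - base + 1] << 8)
        if link == 0:                       # null link: end of program
            return numbers, addr + 2
        numbers.append(mem[addr - base + 2] | (mem[addr - base + 3] << 8))
        addr = link


lines, end_of_basic = walk_lines(MEM)
print(lines, end_of_basic, hex(BASE + len(MEM)))
# prints: [100, 110, 120, 130, 140] 1142 0x494
```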
Rationale — General Considerations
I’ve always looked with respectful envy at those BASIC dialects featuring in-line assembly, like BBC BASIC. Could we have something similar for Commodore BASIC? I’d argue that in-line assembly isn’t the way to go on the Commodore 8-bits, since the BASIC runtime shuffles variables around in memory as new variables are encountered. This is especially true for subscripted variables (arrays), which are often used for a scheme like this, and there is no such thing as a stable location in memory.
The Commodore way of doing this — at least for me — is appending any machine language code to the BASIC program, but including it in the program range as set by the two system pointers TXTTAB and VARTAB, the former holding the start address of the tokenized BASIC text in memory (usually 0x401 on the PET), the latter providing the start of the memory available for variables, just after the last byte of BASIC text. This way, the machine language part is still an integral part of the program and won’t be affected by the runtime.
However, mind that, should your machine language routine(s) make use of some tables, you’d better reserve the space required inside your program. If you were merely addressing some space beyond your program blindly, this would potentially clash with the variables managed by the BASIC runtime.
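As a rough illustration of this rule, the following Python sketch tests whether a table lies inside the program range. The helper `inside_program` is hypothetical; the pointer values are those of the hearts-demo-3 example above.

```python
TXTTAB = 0x0401   # start of BASIC text (hearts-demo-3 example)
VARTAB = 0x0494   # start of BASIC variables (hearts-demo-3 example)


def inside_program(start: int, size: int,
                   txttab: int = TXTTAB, vartab: int = VARTAB) -> bool:
    """True if the bytes start .. start+size-1 lie inside the program range."""
    return txttab <= start and start + size <= vartab


print(inside_program(0x0476, 10))   # the zero-filled gap $0476-$047F
print(inside_program(0x0494, 16))   # right behind the program, variable space
# prints: True
#         False
```

A table placed in the first range travels with the program, while the second range is where BASIC will put its variables.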
— ❦ —
Finally, some useful addresses (new ROM / BASIC 2.0)
; PET 2001 system addresses (ROM 2.0)
USRPOK   = $00   ;$4C constant (JMP instruction)
USRADD   = $01   ;USR function addr. lo, hi ($02)
COUNT    = $05   ;BASIC input buffer pointer ("#" subscript)
VAUYP    = $07   ;variable flag, type: $FF=string, $00=numeric
INTFLG   = $08   ;integer flag: $80=integer, $00=floating point
GARBFL   = $09   ;flag for DATA, LIST quote, memory
SUBFLG   = $0A   ;flag for subscript, FNx
INPFLG   = $0B   ;input/read flag: $00=input, $40=get, $98=read
TANSGN   = $0C   ;flag ATN sign, comparison evaluation
LINNUM   = $11   ;BASIC integer address for SYS, GOTO, etc (lo, hi)
INDEX    = $1F   ;pointer for number transfer (lo, hi)
RESHO    = $23   ;product staging area for multiplication
TXTTAB   = $28   ;pointer: start of BASIC text in memory
VARTAB   = $2A   ;pointer: end of BASIC, start of variables
ARYTAB   = $2C   ;pointer: end of variables, start of arrays
STREND   = $2E   ;pointer: end of arrays
FRETOP   = $30   ;pointer: top of memory, bottom of strings
FRESPC   = $32   ;utility string pointer
MEMSIZ   = $34   ;pointer: limit of BASIC memory
CURLIN   = $36   ;current BASIC line number
OLDLIN   = $38   ;previous BASIC line number
OLDTXT   = $3A   ;pointer to BASIC statement for CONT
DATLIN   = $3C   ;line number, current DATA item
DATPTR   = $3E   ;pointer to current DATA item
INPPTR   = $40   ;input vector
VARNAM   = $42   ;current variable name
VARPNT   = $44   ;current variable address
FORPNT   = $46   ;variable pointer for FOR/NEXT
TEMPF1   = $54   ;misc numeric storage area
TEMPF2   = $59   ;misc numeric storage area
FACEXP   = $5E   ;floating point accumulator 1: exponent
FACHO    = $5F   ;floating point accumulator 1: mantissa (4 bytes)
FACSGN   = $63   ;floating point accumulator 1: sign
SGNFLG   = $64   ;series evaluation constant pointer
BITS     = $65   ;accumulator hi-order propagation word
ARGEXP   = $66   ;floating point accumulator 2: exponent
ARGHO    = $67   ;floating point accumulator 2: mantissa (4 bytes)
ARGSGN   = $6B   ;floating point accumulator 2: sign
ARISGN   = $6C   ;sign comparison (primary vs. secondary)
FACOV    = $6D   ;low-order rounding byte for FAC #1
FBUFPT   = $6E   ;cassette buffer length/series pointer
CHRGET   = $70   ;subroutine to get the next character
CHRGOT   = $76   ;character found by CHRGET
TXTPTR   = $77   ;pointer to source text for CHRGET
RNDX     = $88   ;round storage and work area
TIME     = $8D   ;jiffy clock in 1/60 sec for TI and TI$ (lo, hi)
CINV     = $90   ;IRQ vector (lo, hi), hardware interrupt
CBINV    = $92   ;BRK interrupt vector (lo, hi)
NMINV    = $94   ;NMI interrupt vector (lo, hi)
STATUS   = $96   ;status word ST
LSTX     = $97   ;which key? matrix coordinates of last key down: row/col, $FF=no key
SFDX     = $98   ;shift key: 1=pressed
STKEY    = $9B   ;last read from keyboard scan: STOP and RVS flags
SVXT     = $9C   ;timing constant buffer
VERCK    = $9D   ;flag: LOAD=0, VERIFY=1
NDX      = $9E   ;index into keyboard buffer
RVS      = $9F   ;screen reverse flag
C3PO     = $A0   ;IEEE output flag: $FF=character waiting
INDX     = $A1   ;pointer: end-of-line for input
LXSP     = $A3   ;cursor log (row, col)
BSOUR    = $A5   ;IEEE output character buffer
BLNSW    = $A7   ;flag: 0=flashing cursor, else no cursor
BLNCT    = $A8   ;countdown for cursor timing
GDBLN    = $A9   ;character under cursor
BLNON    = $AA   ;cursor blink flag
SYNO     = $AB   ;EOT bit received
NXTBIT   = $AB   ;-- " --
CRSW     = $AC   ;input from screen/input from keyboard
LDTND    = $AE   ;number of open files, pointer into file table
DFLTN    = $AF   ;input device (normally 0)
DFLTO    = $B0   ;output CMD device (normally 3)
PRTY     = $B1   ;tape character parity
DPSW     = $B2   ;byte received flag
BUFPNT   = $BB   ;tape buffer #1 count ($BC: tape buffer #2 count)
INBIT    = $BD   ;write leader count, read pass 1/pass 2
BITCI    = $BE   ;write new byte, read error flag
RINONE   = $BF   ;write start bit, read bit seq error
FNMIDX   = $C0   ;pass 1 error log pointer
PTR1     = $C0   ;-- " --
PTR2     = $C1   ;pass 2 error correction pointer
RIDATA   = $C2   ;current function: 0=scan, $01-$0F=count, $40=load, $80=end
RIPRTY   = $C3   ;read checksum, write leader length
PNT      = $C4   ;pointer to screen line (lo, hi)
PNTR     = $C6   ;column position of cursor on above line
SAL      = $C7   ;utility pointer for tape buffer, scrolling
EAL      = $C9   ;tape end address / end of current program
QTSW     = $CD   ;flag for quote mode: 0=direct mode, else programmed cursor
BITTS    = $CE   ;timer 1 enabled for tape read, 0=disabled
FNLEN    = $D1   ;number of characters in file name
LA       = $D2   ;current logical file number
SA       = $D3   ;current secondary address, or R/W command
FA       = $D4   ;current device number
LNMX     = $D5   ;line length (39 or 79) for screen
TAPE1    = $D6   ;start of tape buffer (address lo, hi)
TBLX     = $D8   ;current line with cursor
DATAX    = $D9   ;last key input, buffer checksum, bit buffer
FNADR    = $DA   ;pointer to current file name
INSRT    = $DC   ;number of keyboard INSERTs outstanding
ROPRTY   = $DD   ;write shift word / receive input character
FSBLK    = $DE   ;number of blocks remaining for read/write
MYCH     = $DF   ;serial buffer word
LDTB1    = $E0   ;screen line table, hi order addr. and line wrap
CAS1     = $F9   ;interrupt driver flag for cassette #1 status switch
CAS2     = $FA   ;interrupt driver flag for cassette #2 status switch
STAL     = $FB   ;tape start address (lo, hi)
MEMUSS   = $FD   ;pointer for monitor (MLM)
BAD      = $0100 ;start of processor stack, tape error log
BUF      = $0200 ;MLM area
TBUFFR   = $027A ;tape (cassette) buffer
TIMOUT   = $03FC
;
; kernal addresses
OPEN     = $FFC0
CLOSE    = $FFC3
CHKIN    = $FFC6 ;set input device
CHKOUT   = $FFC9 ;set output device
CLRCHN   = $FFCC ;restore I/O
CHRIN    = $FFCF ;read a byte from input
CHROUT   = $FFD2 ;write a byte to output
LOAD     = $FFD5
SAVE     = $FFD8
VERIFY   = $FFDB
SYS      = $FFDE
STOP     = $FFE1 ;check STOP key (affects A only, zero-flag set: STOP pressed)
GETIN    = $FFE4 ;get a character
CLALL    = $FFE7 ;abort all I/O
INCTIME  = $FFEA ;update clock, scan and store key
; hardware addresses
VIDEO    = $8000
PIA1_PA  = $E810
PIA1_CRA = $E811
PIA1_PB  = $E812
PIA1_CRB = $E813
PIA2_PA  = $E820
PIA2_CRA = $E821
PIA2_PB  = $E822
PIA2_CRB = $E823
VIA_DRB  = $E840
VIA_DRA  = $E841
VIA_DDRB = $E842
VIA_DDRA = $E843
VIA_T1CL = $E844
VIA_T1CH = $E845
VIA_T1LL = $E846
VIA_T1LH = $E847
VIA_T2CL = $E848
VIA_T2CH = $E849
VIA_SR   = $E84A
VIA_ACR  = $E84B
VIA_PCR  = $E84C
VIA_IFR  = $E84D
VIA_IER  = $E84E
VIA_ANH  = $E84F
— ❦ —
Norbert Landsteiner
Feb. 2023
www.masswerk.at