“PET-Globe” Demo
An avidly spinning globe for the Commodore PET — and some bit-vectors.
Well, I made another thing — in #6502 code and #PETSCII.
I’ve a new PET game in the works, aptly named “PET Invaders”, since it’s for the Commodore PET and, yes, another Space Invaders-style game. (I told you that there may be some foreshadowing involved in the New Year’s post.) The game is all about rendering “fat pixels” in PETSCII quarter-block characters (trading static resolution for higher dynamic resolution) and currently features a splash screen that looks like this:
The versed retrogamer may have spotted it already: this is a tribute, heavily inspired by the intro sequence to the “Space Invader” game for the Sharp MZ-80 (which is restricted to character graphics, as well, similar to PETSCII). And this original inspiration isn’t just static, it features an animation of (I think) twelve static slides. So, can we do something similar for the PET, but even improve on this, by rendering a real animated globe?
And here’s our little attempt at that global challenge:
I will probably keep the static splash screen for an 8K version of the game and use this for a 16K version.
The rendering area of the animation is a bit smaller than the depiction in the static version, but it runs surprisingly fast and fluidly. (This reduced size is the price we have to pay in order to fit everything into data structures suitable for 8 bits and to keep the program size moderate.)
Here’s a short video of the program running, which is now available as a stand-alone demo, and the globe spinning:
Or just click here, for a direct link to run the “PET-Globe” in in-browser emulation:
The “PET-Globe” demo, as a stand-alone program (PRG), is available for download here:
Requirements: any 40-column PET, any ROM version, 8K of RAM or better.
“PET-Globe” — Making-Of
Obviously, we have to resort to machine language (assembler), since there is no way we could do this in BASIC. The 6502 assembler I used for this is my own, the one embedded into the PET 2001 emulator, which allows for rapid development by simply dragging & dropping assembler source files.
The crucial question, though, is the one of data structures and their organization. How do we fit the entire world into about 4.8K, and how can we access it suitably fast?
Here are the ingredients:
- a world map in cylindrical projection (preprocessed and then hand edited in ASCII format)
- a displacement map for spherical projection
- a mask defining the rendering area, off-pixels and some static pixels indicating the border of the globe. (Further down the road, this may also allow us to leave some areas of the image untouched, like a superimposed Space Invader.)
As we’re going to render this in “fat-pixels”, AKA quarter-block PETSCII characters (◲, ◱, ◳, ◰, and any combinations thereof), we also need to transition from a full resolution of 32 × 32 pixels to a half-size video resolution of 16 × 16 characters. For this we also need:
- a 16 × 16 bytes (256 bytes) scratch area, where we store set pixels as 4-bit (a bit for every fat pixel) bit-vectors and then render them to PETSCII characters using a lookup table.
(As we’ll see later, the 16 × 16 characters size limitation doesn’t come from the total of 256 bytes, as may be presumed at first sight, but rather from the size of the displacement maps and handy 8-bit displacement offsets, which really limits our pixel resolution to 32 × 32.)
Preparing the Data
The preparation of this data wouldn’t have been possible without modern equipment. Of course, we could have processed our displacement in BASIC and encoded the world map by hand, but assembling this into something runnable, so that we can test it, would have been unspeakably tedious with all the intermediate steps required on each iteration.
So I rather built a tiny web page, which first sampled and rendered a world map to a suitable size and then exported this as ASCII strings, an easily editable format (and also appropriately old-school), which was used for any further steps. This page was also used to generate and export a static spherical projection map.
As the individual positions hold just 1-bit data, as in on/off, this can be obviously compressed into 8-bit bit-vectors, resulting in a width of the data stream of just 4 bytes. (4 × 8 = 32)
For this, we’ll rotate the map data by 90°, so that it runs column by column down in rows. This way, we can easily iterate over them per column and (spoilers) it will facilitate animation.
 bit-vector            map col #0                              map col #1

               ┌─  .  0                                ┌─  .  0
               │   x  1                                │   x  1
               │   .  2                                │   x  2
[-1--1-1-]  <──┤   x  3                 [11--111-]  <──┤   x  3
 76543210      │   .  4                  76543210      │   .  4
               │   .  5                                │   .  5
cv0 = 0x4A     │   x  6                     = 0xCE     │   x  6
               └─  .  7                                └─  x  7

               ┌─  x  0
               │   x  1                  (...)
               │   x  2
[11--1111]  <──┤   x  3
 76543210      │   .  4
               │   .  5
cv1 = 0xCF     │   x  6
               └─  x  7

               ┌─  x  0
               │   x  1                  (...)
               │   .  2
[11-11-11]  <──┤   x  3
 76543210      │   x  4
               │   .  5
cv2 = 0xDB     │   x  6
               └─  x  7

               ┌─  x  0
               │   x  1                  (...)
               │   .  2
[----1-11]  <──┤   x  3
 76543210      │   .  4
               │   .  5
cv3 = 0x0B     │   .  6
               └─  .  7

worldData
      ;  cv0  cv1  cv2  cv3     --> columns
.byte $4A, $CF, $DB, $0B      ; r0    │
.byte $CE, ...                ; r1    v
                              ; r2   rows
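The packing step illustrated above can be sketched in a few lines of Python. This is only a model of the idea, not the web page that was actually used: the function name `pack_column` is made up for illustration, and the `x`/`.` notation is borrowed from the diagram.

```python
def pack_column(cells):
    """Pack 32 map cells ('x' = set, '.' = clear) into 4 bytes.

    Cell n of each group of eight lands in bit n, as in the diagram.
    """
    assert len(cells) == 32
    packed = []
    for start in range(0, 32, 8):
        value = 0
        for bit, cell in enumerate(cells[start:start + 8]):
            if cell == "x":
                value |= 1 << bit
        packed.append(value)
    return packed


# Map column #0 from the diagram, read top to bottom in four groups of eight.
column0 = ".x.x..x." + "xxxx..xx" + "xx.xx.xx" + "xx.x...."
row0 = pack_column(column0)

# Emit the result as an assembler data line.
print(".byte " + ", ".join(f"${b:02X}" for b in row0))
```

Each map column thus becomes one 4-byte row of `worldData`, which is what makes stepping through the map in units of 4 bytes possible later on.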
The displacement map for the spherical projection is a static lookup table, giving a lookup-address relative to the top-left origin of the map data (at the current view state). The beauty of this is that this remains always the same for each animation step, as long as we do not aspire to tilt the rotational axis or our viewing angle.
As it happens, the lookup vectors into the map data are, for a 32 × 32 viewport, up to 11 bits wide. How could we encode this into 8 bits? Well, we do something similar as before, splitting this into an 8-bit byte offset (0…255) and a 3-bit mask (0…7), which should allow us to easily retrieve the respective byte from the map data and then simply AND this with the mask, which represents a bit-vector of the same type. If this gives a non-zero result, we’ll set that pixel, otherwise, we’ll leave it unset. Notably, we won’t store the low part of the displacement vector in binary, rather, we’ll set the nth bit high (1 << n), so that there is a single bit set, every time. In other words, this mask selects from the bit-vector retrieved from the map data. Yes, this is a pre-computed transformer. But certainly not an LLM, more like a (very) small dot displacement model. ;-)
The downside of this is that we actually need two tables, one for the address vector and one for the mask. At 32 × 32 = 1024 bytes = 1KB, each of these maps is quite substantial for our humble target machine. Also, we’ll need another one, for the view mask, we’ve already mentioned before. That’s 3KB in auxiliary tables for just 384 bytes (4 × 96) of actual display data!
(Notably, this 8-bit byte offset is ideal for indexed addressing. I would have loved to make this 40 × 40, but this would have resulted in some 12-bit offset, totally ruining our 8 + 3 bits lookup scheme.)
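To make the 8 + 3 bits scheme a bit more tangible, here is a minimal Python sketch of the split and of the AND test. It assumes that a map position is addressed as a plain bit index (4 bytes per map column, bit n for row n within a byte); the function names and the example position are hypothetical.

```python
def split_lookup(bit_index):
    """Split an 11-bit bit index into (byte offset, single-bit mask)."""
    assert 0 <= bit_index < 2048
    byte_offset = bit_index >> 3       # upper 8 bits: 0..255
    bit_mask = 1 << (bit_index & 7)    # lower 3 bits, stored as 1 << n
    return byte_offset, bit_mask


def pixel_is_set(world_data, byte_offset, bit_mask):
    """Model of the AND test: a non-zero result means 'set the pixel'."""
    return (world_data[byte_offset] & bit_mask) != 0


# Hypothetical example: map column 2, map row 11 (4 bytes = 32 bits per column).
bit_index = 2 * 32 + 11
offset, mask = split_lookup(bit_index)

world_data = bytearray(256)
world_data[offset] = mask              # switch on exactly that map position
print(offset, mask, pixel_is_set(world_data, offset, mask))
```

Storing the mask as `1 << n` rather than as the number n costs nothing in table size, but it saves the shift loop that would otherwise be needed for every sampled pixel.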
Sampling
Anyways, this is how we do it:
Reducing the final rendering dimension to half is as easy as a logical shift right (LSR). However, what we write into this scratch area is yet another bit-vector, and we have to come up with a convention for this:
; bitmap codes for quarter-blocks
;
;   x.  … 1
;   ..
;
;   .x  … 2
;   ..
;
;   ..  … 4
;   x.
;
;   ..  … 8
;   .x

petscii ;bitmap codes to screen codes
.byte $20 ; 0
.byte $7E ; 1
.byte $7C ; 2
.byte $E2 ; 3
.byte $7B ; 4
.byte $61 ; 5
.byte $FF ; 6
.byte $EC ; 7
.byte $6C ; 8
.byte $7F ; 9
.byte $E1 ; 10
.byte $FB ; 11
.byte $62 ; 12
.byte $FC ; 13
.byte $FE ; 14
.byte $A0 ; 15
We could determine the appropriate code by the value in the carry flag, which holds the least significant bit as it has been shifted out, but we opt for an XOR operation instead (which is actually a bit slower, but conceptually pleasing).
              column
row        even   odd       XOR
even         1     2   ...    3
odd          4     8   ...   12
The final operation is then to transform these bitmap codes into PETSCII characters by the use of the above lookup table for PETSCII characters and render this to the screen.
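Before turning to the actual assembler code, here is a small Python model of this convention. It is a sketch only: the function name `to_screen_codes` and the nested-list pixel format are made up for illustration, while the lookup table holds the same values as the table above.

```python
# Bitmap code (0..15) to screen code, same values as the table above.
PETSCII = [0x20, 0x7E, 0x7C, 0xE2, 0x7B, 0x61, 0xFF, 0xEC,
           0x6C, 0x7F, 0xE1, 0xFB, 0x62, 0xFC, 0xFE, 0xA0]


def to_screen_codes(pixels):
    """Collapse a 32x32 grid of 0/1 pixels into 16x16 screen codes."""
    scratch = [[0] * 16 for _ in range(16)]
    for y in range(32):
        # Even rows alternate between codes 1 and 2, odd rows between 4 and 8.
        bit = 1 if y % 2 == 0 else 4
        xor_mask = 3 if y % 2 == 0 else 12
        for x in range(32):
            if pixels[y][x]:
                scratch[y >> 1][x >> 1] |= bit   # halve both coordinates
            bit ^= xor_mask                       # flip to the other fat pixel
    return [[PETSCII[code] for code in row] for row in scratch]


# Two diagonal fat pixels in the top-left character cell.
pixels = [[0] * 32 for _ in range(32)]
pixels[0][0] = 1   # upper left  -> code 1
pixels[1][1] = 1   # lower right -> code 8
screen = to_screen_codes(pixels)
print(hex(screen[0][0]))
```

Since XOR-ing with 3 turns 1 into 2 and back again (and XOR-ing with 12 does the same for 4 and 8), a single instruction is enough to step from one fat pixel to its horizontal neighbour.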
This is what this looks like in 6502 assembler code:
sample
lda #0 ;initialize row count
sta TY ;zeropage address
;(...) ;set up some further values…
lda #1 ;initialize which bit to render
sta SCRATCHBIT ;zeropage address
lda #3 ;XOR-mask to alternate between states
sta SCRATCHMASK ;zeropage address
sampleRow
ldx #0 ;X-register holds current column to scan
samplePoint
readMask lda $ffff,x ;test MASK
bmi skip ;$80: ignore / skip
asl
bmi setPixel ;$40: set unconditionally
readProjA ldy $ffff,x ;PROJECTION ADDRESS
readWData lda worldData,y ;use it as offset into worldData
sta TEMP
readProjB lda $ffff,x ;PROJECTION BIT-MASK
and TEMP ;select
beq skip ;skip, unless active
setPixel txa
lsr ;halve the resolution
tay
lda SCRATCHBIT ;load and OR pattern for scratch area
ora (SCRATCHPTR),y ;indirect indexed reference using a pointer
sta (SCRATCHPTR),y ; in the zero-page
skip inx ;iterate for next data point
cpx #32 ;row done?
beq nextRow
lda SCRATCHBIT ;next data-point, same row
eor SCRATCHMASK ;alternate value in SCRATCHBIT
sta SCRATCHBIT
jmp samplePoint ;loop for next column
nextRow ldy TY ;increment row count
iny
cpy #32 ;32 rows done?
beq sampleDone
sty TY ;prepare for next row
{set SCRATCHBIT & SCRATCHMASK according to even and odd rows}
{increment SCRATCHPTR by 16 on even rows}
{increment MASK by 32}
{increment PROJECTION ADDRESS by 32}
{increment PROJECTION BIT-MASK by 32}
jmp sampleRow ;loop for next row
sampleDone rts
Note: “{…}” indicates trivial code and tasks skipped for brevity.
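For those who prefer to read the control flow without the 6502 details, the decision made for each sampling position can be modelled like this in Python. This is a sketch under assumptions: the three tables are plain lists of 1024 entries, the names `sample`, `SKIP` and `FORCE` are made up, and the scratch-area packing is left out.

```python
SKIP = 0x80    # bit 7 set: ignore this position
FORCE = 0x40   # bit 6 set: set the pixel unconditionally


def sample(mask, proj_addr, proj_bit, world_data, origin=0):
    """Return a 32x32 grid of booleans, one per sampling position."""
    pixels = [[False] * 32 for _ in range(32)]
    for i in range(32 * 32):
        m = mask[i]
        if m & SKIP:
            continue                    # "bmi skip"
        if m & FORCE:
            lit = True                  # "asl" / "bmi setPixel"
        else:
            # Fetch the map byte and select one bit from it.
            lit = (world_data[origin + proj_addr[i]] & proj_bit[i]) != 0
        pixels[i // 32][i % 32] = lit
    return pixels


# Tiny hand-made example tables (not real projection data).
mask = [0] * 1024
proj_addr = [0] * 1024
proj_bit = [1] * 1024
world = bytearray(256)
mask[0] = SKIP
mask[1] = FORCE
proj_addr[3], proj_bit[3] = 5, 0x10
world[5] = 0x10
grid = sample(mask, proj_addr, proj_bit, world)
print(grid[0][:4])
```

The `origin` parameter stands in for the base address at `readWData`, which is the only thing that changes from one animation frame to the next.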
Thanks to our data organisation, we can iterate over consecutive pixels by simple indexed addressing. It would have been nice to use the BIT instruction for testing the mask state, but there is no indexed addressing mode available for BIT on the MOS 6502. It would have been so nice: we have to encode two crucial states of the mask array (“skip” and “set unconditionally”) and the BIT instruction transfers bit 7 (the sign-bit) and bit 6 into the negative and the overflow flags, respectively, for easy testing by BPL/BMI and BVC/BVS! Ideal for our task. But, alas, we may either rip out the 6502 from our beloved PET and replace it by the WDC 65C02(S), which features such an addressing mode as an extension to the standard instruction set, or we have to resort to LDA and an extra ASL instruction.
Moreover, indexed addressing will take us only so far: 255 bytes (or $FF), exactly. Since we have to iterate over a total of 32 × 32 = 1024 sampling positions, we’ll have to update our various base addresses at some point. We could have done this using pointers in the zero-page and indirect indexed address mode (as in, “LDA (zpg),Y”, etc.), but, in order to save a few CPU cycles, we’ll choose to go down the road of self-modification. It doesn’t really matter where we write our updated addresses to, and this way we can save a few cycles on each of the 1024 iterations (“LDA absolute,X” takes 4 cycles, whereas “LDA (zpg),Y” takes 5). Wherever there is the quite impossible address $FFFF (which is actually the high-byte of the hard-wired IRQ vector), we’ll replace this by the respective base address, as annotated by “MASK”, “PROJECTION ADDRESS” and “PROJECTION BIT-MASK”.
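What this self-modification amounts to can be modelled in Python on a plain byte array standing in for the PET’s memory. The addresses used here (`read_mask`, `mask_table`) are hypothetical, as are the helper names; only the opcode value and the little-endian operand layout are 6502 facts.

```python
LDA_ABS_X = 0xBD   # opcode of "LDA absolute,X" on the 6502


def read_operand(memory, instr):
    """Return the 16-bit address stored after the opcode (low byte first)."""
    return memory[instr + 1] | (memory[instr + 2] << 8)


def write_operand(memory, instr, address):
    """Patch the two operand bytes of the instruction at `instr`."""
    memory[instr + 1] = address & 0xFF
    memory[instr + 2] = (address >> 8) & 0xFF


def advance_operand(memory, instr, step=32):
    """Move the patched base address on by one row of 32 table entries."""
    write_operand(memory, instr, (read_operand(memory, instr) + step) & 0xFFFF)


memory = bytearray(0x10000)
read_mask = 0x0400      # hypothetical address of the "readMask" instruction
mask_table = 0x1000     # hypothetical address of the MASK table

memory[read_mask] = LDA_ABS_X
write_operand(memory, read_mask, 0xFFFF)      # placeholder, as in the listing
write_operand(memory, read_mask, mask_table)  # set up before the first row
for _ in range(31):                           # after each of rows 0..30
    advance_operand(memory, read_mask)
print(hex(read_operand(memory, read_mask)))
```

The X register then only ever has to count from 0 to 31, while the instruction itself carries the row’s base address.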
Rendering the result in the scratch area to screen codes for display purposes is as easy as looking up the respective PETSCII characters in our lookup table (see above) and replacing our made-up bit-vectors by actual Commodore screen codes that we can write to the video memory.
render ldx #0
renderLoop ldy SCRATCH,x
lda petscii,y
sta SCRATCH,x
inx
bne renderLoop
rts
The final step is left to a separate routine, which copies this as rapidly as possible to screen memory. The reason for this is that we’re doing this for the Commodore PET and especially for its original incarnation, the PET 2001. This features slow SRAM, whose access cycle runs at 1MHz, the same as the CPU clock, leaving no gap for an exclusive access by the video circuitry. Hence, every time a bus conflict occurs because the CPU and the video circuitry access the video RAM at the same time, we get “snow” on the screen. To avoid this, we’ll have to restrict our access to video RAM to the vertical retrace interval of the screen (V-BLANK), when video is off. This can be done by taking over the system interrupt, which triggers exactly at the start of V-BLANK.
Our efforts won’t be lost to more recent editions of the PET, featuring faster dynamic RAM, where this V-sync will prevent “screen tearing”, the nemesis of all computer animation.
There are some obvious opportunities to improve on our sampling routine in terms of run time and cycle counts, but we’re kind of relaxed towards this, since (a) we’ll have to wait for V-BLANK anyways (and we’re hardly going to win an entire 60Hz frame), and (b) our rendering speed is already quite fine as-is.
Animation
So, how are we going to animate this into a spinning globe?
As may have become obvious from the above code example, we’re not iterating over the map data. Rather, we’re iterating over our lookup tables, retrieving offsets. The only thing that changes for another animation frame is the point of origin for lookup into the map data. The rules of projection and the related transformation, effecting the displacement, stay the same. So we just add 4 to our base address (where there’s the hard-coded address label “worldData” in our above code example) to shift this by one column, and we’re ready for our next animation frame.
sampleDone ldy ANIMCOUNT ;animate
dey ;decrement
beq sampleReset ;have we reached the border?
sty ANIMCOUNT ;no, add 4 to lookup origin
lda readWData+1 ;low byte of the lookup origin
clc
adc #4 ;add 4
sta readWData+1
lda #0 ;propagate the carry into the high byte
adc readWData+2
sta readWData+2
rts
sampleReset lda #<worldData ;reset to first frame (base address)
sta readWData+1
lda #>worldData
sta readWData+2
lda #ANIMFRAMES
sta ANIMCOUNT
rts
There’s still an issue left: namely, what happens when we reach the far Eastern portions of the world map, where we’re meant to wrap around?
In a higher level language, we’d probably resort to modular operations. Something like,
mapAddr = displacementMap[row * 32 + column];
mapByte = worldData[(mapAddr + animCount * 4) % worldData.length];
// ...
However, we can’t do this in time-critical machine code, since this would put an end to our nice serial indexing scheme. Instead, we would have to compute addresses by a series of additions and comparisons, checks and subsequent subtractions, for each of our 1024 sampling points.
The solution is an easy one, though: just copy the required overlap to the end of the map data. — Done.
(And, yes, this is also relevant for time-critical higher language code.)
worldData
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $02,$00,$00,$00
...
.byte $02,$00,$00,$04
.byte $02,$00,$00,$06
.byte $02,$00,$10,$00
worldWrap ;repeat…
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $00,$00,$00,$00
.byte $02,$00,$00,$00
...
And, in order to rotate in the right direction, towards the East, we will actually start at the far end of the world map (just before the wrap) and subtract 4 on each animation frame, rather than adding 4 (as exemplified above).
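That the padded copy really is a drop-in replacement for the modulo can be checked with a quick Python sketch. The amount of overlap is an assumption here: 256 bytes is simply the farthest an 8-bit offset can reach, the real program may well get away with less. The map contents are filler values and the function names are made up.

```python
COLUMN_BYTES = 4   # one map column per animation step


def pad_for_wrap(world_data, overlap):
    """Append a copy of the first `overlap` bytes to the end of the map."""
    return bytes(world_data) + bytes(world_data[:overlap])


def lookup_modulo(world_data, frame, offset):
    """The 'higher level language' version with a modulo on every access."""
    return world_data[(frame * COLUMN_BYTES + offset) % len(world_data)]


def lookup_padded(padded, frame, offset):
    """The cheap version: plain indexing into the padded copy."""
    return padded[frame * COLUMN_BYTES + offset]


# 96 columns of 4 bytes = 384 bytes of filler standing in for the world map.
world = bytes((i * 37 + 11) % 256 for i in range(96 * COLUMN_BYTES))
padded = pad_for_wrap(world, 256)   # assumed overlap: full 8-bit offset range

same = all(
    lookup_modulo(world, frame, offset) == lookup_padded(padded, frame, offset)
    for frame in range(96)
    for offset in range(256)
)
print(len(padded), same)
```

The price is a few hundred bytes of duplicated data, which is a much better deal on a 1 MHz machine than an extra comparison for each of the 1024 sampling points.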
And this is how we spend a total of 4.8 KB on rendering just 384 bytes of map data. :-)
PS: For how to generate a displacement map for a spherical projection, see for example the related article “Sphere Mapping” by Frédéric Goset.
Norbert Landsteiner,
Vienna, 2025-01-08