“PET-Globe” Demo

An avidly spinning globe for the Commodore PET — and some bit-vectors.

Well, I made another thing — in #6502 code and #PETSCII.

I’ve a new PET game in the works, aptly named “PET Invaders”, since it’s for the Commodore PET and, yes, another Space Invaders-style game. (I told you that there may be some foreshadowing involved in the New Year’s post.) The game is all about rendering “fat pixels” in PETSCII quarter-block characters (trading static resolution for higher dynamic resolution) and currently features a splash screen that looks like this:

The splash screen for "PET Invaders" (WIP)
PET Invaders (WIP, unreleased as of writing), static splash screen.

The versed retrogamer may have spotted it already: this is a tribute, heavily inspired by the intro sequence to the “Space Invader” game for the Sharp MZ-80 (which is restricted to character graphics, as well, similar to PETSCII). And this original inspiration isn’t just static, it features an animation of (I think) twelve static slides. So, can we do something similar for the PET, but even improve on this, by rendering a real animated globe?

Twelve phase animation cycle of Space Invader game for the Sharp MZ-80
Animation frames of the intro to the game “Space Invader” for the Sharp MZ-80 (1982).
Earth is surrounded by (attacking) alien saucers, which are part of the MZ-80 character set.
Source: YouTube, @Sharp MZ-80A.

And here’s our little attempt at that global challenge:

"PET-Globe" demo for Commodore PET, screenshot: a green screen showing an animated globe rendered in PETSCII block characters.
Click to see the program run in online emulation.

I will probably keep the static splash screen for an 8K version of the game and use this for a 16K version.
The rendering area of the animation is a bit smaller than the depiction in the static version, but it runs surprisingly fast and fluidly. (This reduced size is the price we have to pay in order to fit everything into 8-bit-friendly data structures and to keep the program size from growing too large.)
Here’s a short video of the program (now available as a stand-alone demo) running, with the globe spinning:

Or just click here, for a direct link to run the “PET-Globe” in in-browser emulation:

The “PET-Globe” demo, as a stand-alone program (PRG), is available for download here:

Requirements: any 40-column PET, any ROM version, 8K of RAM or better.

“PET-Globe” — Making-Of

Obviously, we have to resort to machine language (assembler), since there is no way we could do this in BASIC. The 6502 assembler I used for this is my own, the one embedded into the PET 2001 emulator, which allows for rapid development by simply dragging & dropping assembler source files.

The crucial question, though, is the one of data structures and their organization. How do we fit the entire world into about 4.8K, and how can we access it suitably fast?

Here are the ingredients:

As we’re going to render this in “fat-pixels”, AKA quarter-block PETSCII characters (▘, ▝, ▖, ▗, and any combinations thereof), we also need to transition from a full resolution of 32 × 32 pixels to a half-size video resolution of 16 × 16 characters.
For this we also need:

(As we’ll see later, the 16 × 16 characters size limitation doesn’t come from the total of 256 bytes, as may be presumed at first sight, but rather from the size of the displacement maps and handy 8-bit displacement offsets, which really limit our pixel resolution to 32 × 32.)

Preparing the Data

The preparation of this data wouldn’t have been possible without modern equipment. Of course, we could have processed our displacement in BASIC and encoded the world map by hand, but assembling this into something runnable, so that we can test it, would have been unspeakably tedious with all the intermediate steps required on each iteration.

So I rather built a tiny web page, which first sampled and rendered a world map to a suitable size and then exported this as ASCII strings, an easily editable format (and also appropriately old-school), which was used for any further steps. This page was also used to generate and export a static spherical projection map.

Preparing the data for the “PET-Globe” demo.
Processing and exporting the data.

As the individual positions hold just 1-bit data, as in on/off, this can obviously be packed into 8-bit bit-vectors, resulting in a data stream that is just 4 bytes wide. (4 × 8 = 32)

For this, we’ll rotate the map data by 90°, so that it runs column by column down in rows. This way, we can easily iterate over them per column and (spoilers) it will facilitate animation.

bit-vector   map col #0               map col #1

              ┌─ .  0                  ┌─ .  0
                x  1                    x  1
                .  2                    x  2
[-1--1-1-] <──┤  x  3    [11--111-] <──┤  x  3
 76543210       .  4     76543210       .  4
                .  5                    .  5
cv0 = 0x4A      x  6      = 0xCE        x  6
              └─ .  7                  └─ x  7
              ┌─ x  0
                x  1                   (...)
                x  2
[11--1111] <──┤  x  3
 76543210       .  4
                .  5
cv1 = 0xCF      x  6
              └─ x  7
              ┌─ x  0
                x  1                   (...)
                .  2
[11-11-11] <──┤  x  3
 76543210       x  4
                .  5
cv2 = 0xDB      x  6
              └─ x  7
              ┌─ x  0
                x  1                   (...)
                .  2
[----1-11] <──┤  x  3
 76543210       .  4
                .  5
cv3 = 0x0B      .  6
              └─ .  7


worldData
;               cv0  cv1  cv2  cv3  --> columns

          .byte $4A, $CF, $DB, $0B  ; r0
          .byte $CE, ...            ; r1   v
                                    ; r2  rows
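
The packing can be sketched in a few lines of Python. This is only an illustration of the scheme in the diagram above, assuming that row n of each 8-row group goes to bit n of its byte; the name `pack_column` and the string input format are made up for this example and are not taken from the actual export page.

```python
def pack_column(column):
    """Pack a map column ('x' = set, '.' = unset) into one byte per 8 rows."""
    if len(column) % 8 != 0:
        raise ValueError("column length must be a multiple of 8")
    packed = []
    for start in range(0, len(column), 8):
        value = 0
        for row in range(8):
            if column[start + row] == "x":
                value |= 1 << row      # row n within the group sets bit n
        packed.append(value)
    return bytes(packed)

# The first 8 rows of map columns #0 and #1 from the diagram
print(hex(pack_column(".x.x..x.")[0]))   # 0x4a
print(hex(pack_column(".xxx..xx")[0]))   # 0xce
```

A full 32-row column yields 4 bytes this way, which is one line of `.byte` data in the table above.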

The displacement map for the spherical projection is a static lookup table, giving a lookup-address relative to the top-left origin of the map data (at the current view state). The beauty of this is that it always remains the same for each animation step, as long as we do not aspire to tilt the rotational axis or our viewing angle.

From cylindrical to spherical projection by a stored displacement map.

As it happens, the lookup vectors into the map data are, for a 32 × 32 viewport, up to 11 bits wide. How could we encode this in 8 bits? Well, we do something similar as before, splitting this into an 8-bit byte offset (0…255) and a 3-bit bit number (0…7). This allows us to easily retrieve the respective byte from the map data and then simply AND it with a mask, which is a bit-vector of the same type. If this gives a non-zero result, we’ll set that pixel; otherwise, we’ll leave it unset. Notably, we won’t store the low part of the displacement vector in binary; rather, we’ll set the nth bit high (1 << n), so that there is always exactly one bit set. In other words, this mask selects from the bit-vector retrieved from the map data. Yes, this is a pre-computed transformer. But certainly not an LLM, more like a (very) small dot displacement model. ;-)

The downside of this is that we actually need two tables, one for the address vector and one for the mask. At 32 × 32 = 1024 bytes = 1KB, each of these maps is quite substantial for our humble target machine. Also, we’ll need another one for the view mask we’ve already mentioned before. That’s 3KB in auxiliary tables for just 384 bytes (4 × 96) of actual display data!

(Notably, this 8-bit byte offset is ideal for indexed addressing. I would have loved to make this 40 × 40, but this would have resulted in some 12-bit offset, totally ruining our 8 + 3 bits lookup scheme.)
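
To make the 8 + 3 bits split concrete, here is a small Python sketch. It assumes that a map position is numbered as column × 32 + row (matching the 4-bytes-per-column layout of the map data); the function names `encode_displacement` and `is_set` are hypothetical and only serve this illustration.

```python
def encode_displacement(column, row):
    """Split a map position into (byte offset, one-hot bit mask)."""
    bit_index = column * 32 + row     # assumed layout: 32 rows (4 bytes) per column
    offset = bit_index >> 3           # upper bits: which byte of the map data
    if offset > 255:
        raise ValueError("offset does not fit into 8 bits")
    mask = 1 << (bit_index & 7)       # lower 3 bits, stored as 1 << n
    return offset, mask

def is_set(world_data, origin, offset, mask):
    """Fetch the map byte relative to the current origin and AND it with the mask."""
    return (world_data[origin + offset] & mask) != 0

world_data = bytes([0x4A, 0x00, 0x00, 0x00,   # column 0: rows 1, 3 and 6 set
                    0xCE, 0x00, 0x00, 0x00])  # column 1: rows 1, 2, 3, 6 and 7 set
offset, mask = encode_displacement(1, 2)
print(offset, mask, is_set(world_data, 0, offset, mask))   # 4 4 True
```

Storing the mask as a single set bit means that the lookup at run time needs no shifting at all, just one indexed load and an AND.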

Sampling

Anyways, this is how we do it:

"PET-Globe" rendering procedures.
Rendering procedures.

Reducing the final rendering dimension to half is as easy as a logical shift right (LSR). However, what we write into this scratch area is yet another bit-vector, and we have to come up with a convention for this:

;	bitmap codes for quarter-blocks
;
;	x.   … 1
;	..
;
;	.x   … 2
;	..
;
;	..   … 4
;	x.
;
;	..   … 8
;	.x

petscii             ;bitmap codes to screen codes
	.byte $20   ;  0
	.byte $7E   ;  1
	.byte $7C   ;  2
	.byte $E2   ;  3
	.byte $7B   ;  4
	.byte $61   ;  5
	.byte $FF   ;  6
	.byte $EC   ;  7
	.byte $6C   ;  8
	.byte $7F   ;  9
	.byte $E1   ; 10
	.byte $FB   ; 11
	.byte $62   ; 12
	.byte $FC   ; 13
	.byte $FE   ; 14
	.byte $A0   ; 15

We could determine the appropriate code by the value in the carry flag, which holds the least significant bit as it has been shifted out, but we opt for an XOR operation instead (which is actually a bit slower, but conceptually pleasing).

           column           XOR
row
        even   odd

even     1      2    ...     3

odd      4      8    ...    12

The final operation is then to transform these bitmap codes into PETSCII characters by the use of the above lookup table for PETSCII characters and render this to the screen.
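
As a rough model of these two steps, the following Python sketch ORs the quarter-block bits into a half-resolution scratch array, toggling the bit value by XOR as in the table above, and then maps the resulting codes to screen codes. The function name `to_screen_codes` and the list-of-rows input are assumptions made for this example only.

```python
# Screen codes indexed by quarter-block bitmap code (same values as the table above)
PETSCII = [0x20, 0x7E, 0x7C, 0xE2, 0x7B, 0x61, 0xFF, 0xEC,
           0x6C, 0x7F, 0xE1, 0xFB, 0x62, 0xFC, 0xFE, 0xA0]

def to_screen_codes(pixels):
    """Reduce a bitmap (list of rows of 0/1) to one screen code per 2x2 block."""
    height, width = len(pixels), len(pixels[0])
    scratch = [[0] * (width // 2) for _ in range(height // 2)]
    for y in range(height):
        # even rows alternate between 1 and 2, odd rows between 4 and 8
        bit, toggle = (1, 3) if y % 2 == 0 else (4, 12)
        for x in range(width):
            if pixels[y][x]:
                scratch[y >> 1][x >> 1] |= bit   # halve both coordinates
            bit ^= toggle                        # switch to the other column bit
    return [[PETSCII[code] for code in row] for row in scratch]

pixels = [[1, 0, 1, 1],
          [0, 1, 1, 1]]
print([[hex(c) for c in row] for row in to_screen_codes(pixels)])
# [['0x7f', '0xa0']]
```

The first block has its top-left and bottom-right quarters set (code 9), the second one is fully set (code 15).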

This is what this looks like in 6502 assembler code:

sample
	lda #0                  ;initialize row count
	sta TY                  ;zeropage address
	;(...)                  ;set up some further values… 
	lda #1                  ;initialize which bit to render
	sta SCRATCHBIT          ;zeropage address
	lda #3                  ;XOR-mask to alternate between states
	sta SCRATCHMASK         ;zeropage address
	
sampleRow
	ldx #0                  ;X-register holds current column to scan
	
samplePoint
readMask	lda $ffff,x             ;test MASK
	bmi skip                ;$80: ignore / skip
	asl
	bmi setPixel            ;$40: set unconditionally
readProjA	ldy $ffff,x             ;PROJECTION ADDRESS
readWData	lda worldData,y         ;use it as offset into worldData
	sta TEMP
readProjB	lda $ffff,x             ;PROJECTION BIT-MASK
	and TEMP                ;select
	beq skip                ;skip, unless active
setPixel	txa
	lsr                     ;halve the resolution
	tay
	lda SCRATCHBIT          ;load and OR pattern for scratch area
	ora (SCRATCHPTR),y      ;indirect indexed reference using a pointer
	sta (SCRATCHPTR),y      ;                          in the zero-page
skip	inx                     ;iterate for next data point
	cpx #32                 ;row done?
	beq nextRow
	lda SCRATCHBIT          ;next data-point, same row
	eor SCRATCHMASK         ;alternate value in SCRATCHBIT
	sta SCRATCHBIT
	jmp samplePoint         ;loop for next column

nextRow	ldy TY                  ;increment row count
	iny
	cpy #32                 ;32 rows done?
	beq sampleDone
	sty TY                  ;prepare for next row

	{set SCRATCHBIT & SCRATCHMASK according to even and odd rows}
	{increment SCRATCHPTR by 16 on even rows}
	{increment MASK by 32}
	{increment PROJECTION ADDRESS by 32}
	{increment PROJECTION BIT-MASK by 32}

	jmp sampleRow           ;loop for next row

sampleDone	rts


Note: “{…}” indicates trivial code and tasks skipped for brevity.

Thanks to our data organisation, we can iterate over consecutive pixels by simple indexed addressing. It would have been nice to use the BIT instruction for testing the mask state, but there is no indexed addressing mode available for BIT on the MOS 6502. It would have been so nice: we have to encode two crucial states of the mask array (“skip” and “set unconditionally”) and the BIT instruction transfers bit 7 (the sign-bit) and bit 6 into the negative and the overflow flags, respectively, for easy testing by BMI/BPL and BVC/BVS! Ideal for our task. But, alas, we may either rip out the 6502 from our beloved PET and replace it by the WDC 65C02(S), which features such an addressing mode as an extension to the standard instruction set, or we have to resort to LDA and an extra ASL instruction.

Moreover, indexed addressing will take us only so far: an offset of 255 bytes (or $FF), exactly. Since we have to iterate over a total of 32 × 32 = 1024 sampling positions, we’ll have to update our various base addresses at some point. We could have done this using pointers in the zero-page and indirect indexed addressing mode (as in, “LDA (zpg),Y”, etc.), but, in order to save a few CPU cycles, we’ll choose to go down the road of self-modification. It doesn’t really matter where we write our updated addresses to, and this way we can save a few cycles on each of the 1024 iterations (LDA absolute,X takes 4 cycles, whereas LDA (zpg),Y takes 5.) Wherever there is the quite impossible address $FFFF (which is actually the high-byte of the hard-wired IRQ vector), we’ll replace this by the respective base address, as annotated by “MASK”, “PROJECTION ADDRESS” and “PROJECTION BIT-MASK”.

Rendering the result in the scratch area to screen codes for display purposes is as easy as looking up the respective PETSCII characters in our lookup table (see above) and replacing our made-up bit-vectors by actual Commodore screen codes that we can write to the video memory.

render
	ldx #0
renderLoop	
	ldy SCRATCH,x
	lda petscii,y
	sta SCRATCH,x
	inx
	bne renderLoop
	rts

The final step is left to a separate routine, which copies this as rapidly as possible to screen memory. The reason for this is that we’re doing this for the Commodore PET and especially for its original incarnation, the PET 2001. This features slow SRAM, which can be accessed only at about 1MHz, the same rate as the CPU clock, leaving no gap for an exclusive access by the video circuitry. Hence, every time a bus conflict occurs, when the CPU and the video circuitry access the video RAM at the same time, we get “snow” on the screen. To avoid this, we’ll have to restrict our access to video RAM to the vertical retrace interval of the screen (V-BLANK), when video is off. This can be done by taking over the system interrupt, which triggers exactly at the start of V-BLANK.
Our efforts won’t be lost on more recent editions of the PET, featuring faster dynamic RAM, where this V-sync will prevent “screen tearing”, the nemesis of all computer animation.

There are some obvious opportunities to improve on our sampling routine in terms of run time and cycle counts, but we’re kind of relaxed towards this, since (a) we’ll have to wait for V-BLANK anyways (and we’re hardly going to gain an entire 60Hz frame), and (b) our rendering speed is already quite fine as-is.

Animation

So, how are we going to animate this into a spinning globe?

As may have become obvious from the above code example, we’re not iterating over the map data. Rather, we’re iterating over our lookup tables, retrieving offsets. The only thing that changes for another animation frame is the point of origin for lookup into the map data. The rules of projection and the related transformation, effecting the displacement, stay the same. So we just add 4 to our base address (where there’s the hard-coded address label “worldData” in our above code example) to shift this by one column, and we’re ready for our next animation frame.

sampleDone	ldy ANIMCOUNT        ;animate
	dey                  ;decrement
	beq sampleReset      ;have we reached the border?
	sty ANIMCOUNT        ;no, add 4 to lookup origin
	lda readWData+1      ;low byte of the lookup address
	clc
	adc #4
	sta readWData+1
	lda #0
	adc readWData+2
	sta readWData+2
	rts
sampleReset	lda #<worldData   ;reset to first frame (base address)
	sta readWData+1
	lda #>worldData
	sta readWData+2
	lda #ANIMFRAMES
	sta ANIMCOUNT
	rts

There’s still an issue left: namely, what happens when we reach the far Eastern portions of the world map, where we’re meant to wrap around?

In a higher-level language, we’d probably resort to modulo operations. Something like,

mapAddr = displacementMap[row * 32 + column];
mapByte = worldData[(mapAddr + animCount * 4) % worldData.length];
// ...

However, we can’t do this in time-critical machine code, since this would put an end to our nice serial indexing scheme. Instead, we would have to compute addresses by a series of additions and comparisons, checks and subsequent subtractions, for each of our 1024 sampling points.

The solution is an easy one, though: just copy the required overlap to the end of the map data. — Done.
(And, yes, this is also relevant for time-critical code in higher-level languages.)

worldData
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $02,$00,$00,$00
	         ...
	.byte $02,$00,$00,$04
	.byte $02,$00,$00,$06
	.byte $02,$00,$10,$00

worldWrap                             ;repeat…
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $00,$00,$00,$00
	.byte $02,$00,$00,$00
	         ...
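
That the duplicated overlap gives the same result as the modulo version is easy to check with a toy model. The Python sketch below uses made-up sizes (a map of 6 columns with a visible window of 3 columns) and hypothetical function names; it only assumes that displacement offsets never reach beyond the visible window.

```python
BYTES_PER_COLUMN = 4

def with_overlap(world_data, visible_columns):
    """Append a copy of the first visible_columns columns to the map data."""
    return world_data + world_data[:visible_columns * BYTES_PER_COLUMN]

def lookup_modulo(world_data, frame, offset):
    """Wrap around at the end of the map by a modulo operation."""
    return world_data[(frame * BYTES_PER_COLUMN + offset) % len(world_data)]

def lookup_overlap(extended, frame, offset):
    """Plain indexing into the extended data, no wrap handling needed."""
    return extended[frame * BYTES_PER_COLUMN + offset]

world = bytes(range(6 * BYTES_PER_COLUMN))      # toy map: 6 columns of 4 bytes
extended = with_overlap(world, visible_columns=3)

same = all(
    lookup_modulo(world, frame, offset) == lookup_overlap(extended, frame, offset)
    for frame in range(6)                       # one animation frame per column
    for offset in range(3 * BYTES_PER_COLUMN)   # offsets within the visible window
)
print(same)   # True
```

The price is the extra memory for the repeated columns, which is traded for not having to do any range checks in the inner loop.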

And, in order to rotate in the right direction, towards the East, we will actually start at the far end of the world map (just before the wrap) and subtract 4 on each animation frame, rather than adding 4 (as exemplified above).

And this is how we spend a total of 4.8 KB on rendering just 384 bytes of map data. :-)

PS: For how to generate a displacement map for a spherical projection, see for example the related article “Sphere Mapping” by Frédéric Goset.