Gerardo Enrique Arriaga Rendon
Posted on September 30, 2021
Two weeks ago, I started learning assembly for my Software Portability and Optimization class. Our professor explained to us that, to get the most out of our processors, we need to go to the lowest level and manage the most processing-heavy parts of our program ourselves. He also mentioned that portable software is quite difficult to make because of the higher-level assumptions that programmers make when writing code for a specific architecture. To get rid of those assumptions, you need to go to the lowest level and handle them as specific cases.
Our motivation for learning assembly is clear: we want to access the low-level parts of our program so we can fine-tune them. That way, we can keep our high-level code and modify only the bits of the program that need to change to be faster or more portable (or both).
Of course, learning assembly for the first time on current architectures is a hard task, since modern processors have more mechanisms to make them efficient and are therefore more complex overall. While it is true that learning assembly is not necessary for the average programmer, someone who is interested in portability, or even just in low-level concepts, would benefit from learning assembly and the concepts around it. Thus, our professor decided that, instead of jumping straight into ARM or x86 assembly, we would learn the assembly language for the 6502 processor.
We started by learning what opcodes are and which ones the processor supports. We also learned how to do very basic math on the processor, even though it has no instructions for multiplication or division. Overall, we were taking our first steps with the 6502.
One of our exercises with the 6502 asked us to draw four lines on the screen, one along each side.
For my attempt, I wrote the following code:
define GREEN $5
define BLUE $6
define YELLOW $7
define PURPLE $4
LDA #$00 ; start green line
STA $0
LDA #$02
STA $1
LDA #GREEN
greenLoop: STA ($00), Y
INY
CPY #$20
BNE greenLoop
; end green line
LDA #$E0 ; start blue line
STA $0
LDA #$05
STA $1
LDA #BLUE
LDY #$0
blueLoop: STA ($00), Y
INY
CPY #$20
BNE blueLoop
; end blue line
LDA #$20 ; start yellow line
STA $0
LDA #$02
STA $1
LDY #$0
yellowLoop: LDA #YELLOW
INX
STA ($00), Y
TYA
CLC
ADC #$20
TAY
BCC yellowLoopCheck
INC $01
yellowLoopCheck: CPX #$1E
BNE yellowLoop
; end yellow line
LDA #$3F ; start purple line
STA $0
LDA #$02
STA $1
LDY #$0
LDX #$0
purpleLoop: LDA #PURPLE
INX
STA ($00), Y
TYA
CLC
ADC #$20
TAY
BCC purpleLoopCheck
INC $01
purpleLoopCheck: CPX #$1E
BNE purpleLoop
; end purple line
Since it looks very lengthy, I will break the code down into several chunks so I can talk about it properly.
The first four lines are called assembler directives, which are instructions for the assembler to carry out during translation from assembly language to machine language. Directives are not part of the instruction set of the processor, but a part of the assembler that one uses. There could be an assembler that supports certain directives that no other assembler does, and there could be an assembler that has no directives at all (and that very few people would use).
The directive used in those four lines is the define directive, which essentially associates a "find-and-replace" symbol with a replacement. In the first line, for example, we associate the string GREEN with $5. This means that when the assembler finds GREEN in our code, it will replace it with $5 before the translation is completed.
These directives are for associating color values with their names so that it is easier to understand what those numbers represent.
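To make the "find-and-replace" idea concrete, here is a minimal sketch in JavaScript of how such a substitution pass could work. The function name expandDefines is hypothetical, and this is not how any particular assembler is actually implemented; it only assumes that every directive has the form "define NAME VALUE" on its own line.

```javascript
// Hypothetical helper (not part of any real assembler): expands
// `define NAME VALUE` lines by plain textual substitution.
function expandDefines(source) {
  const symbols = new Map();
  const output = [];
  for (const line of source.split("\n")) {
    // A directive line only records the symbol; it emits no code.
    const match = line.match(/^\s*define\s+(\w+)\s+(\S+)\s*$/);
    if (match) {
      symbols.set(match[1], match[2]);
      continue;
    }
    // Every other line has each known symbol replaced by its value.
    output.push(
      line.replace(/\b[A-Za-z_]\w*\b/g, (word) =>
        symbols.has(word) ? symbols.get(word) : word
      )
    );
  }
  return output.join("\n");
}

console.log(expandDefines("define GREEN $5\nLDA #GREEN")); // prints "LDA #$5"
```

Note that the match is on whole words, so a label such as greenLoop is left alone.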
Now, I will start explaining the first block of assembly code:
LDA #$00 ; start green line
STA $0
LDA #$02
STA $1
LDA #GREEN
greenLoop: STA ($00), Y
INY
CPY #$20
BNE greenLoop
; end green line
This block of code draws a green line along the top of a 32 by 32 pixel screen.
I will not be explaining every single line of code, especially since individual lines do not offer much context on their own; they make sense when viewed as a whole. However, it would be a good idea to follow along with this neat opcode reference of the 6502 while reading the code snippets.
The first five lines are the setup: we prepare the memory location that we are going to access later in the code. We create a pointer at location $0000 that points at the first byte of our screen, $0200. Since the 6502 is little-endian, we store the low byte ($00) at $0000 and the high byte ($02) at $0001. We then load the accumulator register with the number that represents the color green. We use the accumulator as our brush.
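As a rough illustration of the little-endian pointer, the JavaScript sketch below models memory as a flat byte array and stores a 16-bit address in two consecutive cells. The names storePointerLittleEndian and loadPointerLittleEndian are made up for this example; the 6502 itself has no such routines.

```javascript
// Illustrative model only: a flat 64 KiB byte array stands in for memory.
function storePointerLittleEndian(mem, at, address) {
  mem[at] = address & 0xff;            // low byte first  (LDA #$00 / STA $0)
  mem[at + 1] = (address >> 8) & 0xff; // high byte second (LDA #$02 / STA $1)
}

function loadPointerLittleEndian(mem, at) {
  // Reassemble the 16-bit address from its two bytes.
  return mem[at] | (mem[at + 1] << 8);
}

const pointerDemoMemory = new Uint8Array(0x10000);
storePointerLittleEndian(pointerDemoMemory, 0x00, 0x0200);
console.log(loadPointerLittleEndian(pointerDemoMemory, 0x00).toString(16)); // prints "200"
```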
The next line has something called a label. A label in assembly programming is used to name a specific address; the assembler replaces the label with the actual address. We label the line STA ($00), Y with greenLoop. This way, we can jump back to this instruction using the label.
The code after the label does the following: it draws one pixel in the cell with address ($00) + Y. The parentheses stand for dereferencing a cell's contents, so we treat the two bytes starting at address $00 as a pointer. Since $00 and $01 together contain $0200, the resulting address of the cell is $0200 + Y. At this point Y is 0 (the code never sets it, so this relies on the register starting out cleared), so we store whatever the accumulator register contains in the cell with address $0200, thus drawing our first pixel in green.
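The address calculation performed by STA ($00), Y can be sketched in JavaScript as follows. This is a simplified model under stated assumptions: memory is a plain byte array, and staIndirectIndexed is a hypothetical name for illustration, not a real API.

```javascript
// Simplified model of `STA (zp), Y`: read a 16-bit little-endian pointer
// from zero page, add Y, and store the accumulator at the resulting address.
function staIndirectIndexed(mem, zeroPageAddr, y, a) {
  const base = mem[zeroPageAddr] | (mem[(zeroPageAddr + 1) & 0xff] << 8);
  const target = (base + y) & 0xffff;
  mem[target] = a;
  return target; // returned only so the example can show where it wrote
}

const screenDemoMemory = new Uint8Array(0x10000);
screenDemoMemory[0x00] = 0x00; // low byte of $0200
screenDemoMemory[0x01] = 0x02; // high byte of $0200
const written = staIndirectIndexed(screenDemoMemory, 0x00, 0, 0x05);
console.log(written.toString(16)); // prints "200"
```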
After that, we increase the Y register. We use Y as an offset from $0200, since this lets us be more flexible about which cell we access.
We then compare the register Y with the value $20, which is hexadecimal for 32. The reason is that the screen is 32 pixels wide, so we only want to draw 32 pixels, one for each cell in the line. Thus, we check whether the register Y is equal to 32.
If Y is not equal to 32, we jump to another location, namely the one labelled greenLoop. This is one of the uses of labels: they let us specify locations without having to use an absolute or relative offset that may change when we add more lines to our code. If Y is indeed equal to 32, the instruction will not jump to greenLoop; instead, execution simply continues with the next instruction.
You may notice that this creates a sort of loop, just like in high level languages. In JavaScript, for example, we may have something like this:
let a = 5; // number for the color green (GREEN is defined as $5)
for (let y = 0; y < 32; ++y) {
  screen[y] = a;
}
Thus, we draw a green line across the top of the screen. If you look at the next part, you will notice some similarities:
LDA #$E0 ; start blue line
STA $0
LDA #$05
STA $1
LDA #BLUE
LDY #$0
blueLoop: STA ($00), Y
INY
CPY #$20
BNE blueLoop
; end blue line
This code follows the same principle as the block I explained before. There is a small difference, however. In the setup, we save the values $E0 and $05 in the locations $0000 and $0001. The reason for this change is that we want to start pointing at a different cell on the screen.
In the previous block, we were pointing at the cell at the top left. However, if we want to draw a line along the bottom of the screen, we cannot reach it from that address, since we can only reach 256 cells starting from the cell at address $0200, with our register Y working as the offset.
We need to set a different address from which we can reach the last line of the screen, namely $05E0. The reason has to do with the way the screen is mapped in memory: each row takes 32 bytes, so the last of the 32 rows starts at $0200 + 31 × 32 = $05E0. I will not go into a deep explanation of how the screen is laid out in memory, so if you are interested in knowing more about this, I suggest you check out this website.
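For readers who want to check the starting addresses used in this post, here is a small JavaScript sketch of the mapping. It assumes what the post describes: a 32 by 32 screen stored row by row, one byte per pixel, starting at $0200. The function name pixelAddress is only illustrative.

```javascript
const SCREEN_BASE = 0x0200; // first byte of the screen
const SCREEN_WIDTH = 32;    // pixels (and bytes) per row

// Address of the pixel at (row, col), with both counted from 0.
function pixelAddress(row, col) {
  return SCREEN_BASE + row * SCREEN_WIDTH + col;
}

console.log(pixelAddress(0, 0).toString(16));  // prints "200" (green line start)
console.log(pixelAddress(31, 0).toString(16)); // prints "5e0" (blue line start)
console.log(pixelAddress(1, 0).toString(16));  // prints "220" (yellow line start)
console.log(pixelAddress(1, 31).toString(16)); // prints "23f" (purple line start)
```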
In any case, we essentially set a new address to start drawing our line from, but the logic is the same. We need one more line, LDY #$0, so that we can reset our offset to 0 (the previous code block left it at 32).
With these two code blocks, we have two lines drawn on the screen: a green one along the top and a blue one along the bottom.
The next block is this one:
LDA #$20 ; start yellow line
STA $0
LDA #$02
STA $1
LDY #$0
yellowLoop: LDA #YELLOW
INX
STA ($00), Y
TYA
CLC
ADC #$20
TAY
BCC yellowLoopCheck
INC $01
yellowLoopCheck: CPX #$1E
BNE yellowLoop
; end yellow line
This block might seem more complicated than the previous two, but it is a matter of breaking it down into smaller sections. The first four lines might seem familiar, since they correspond to the setup of our code. We set the starting point (saved at $0000) to be $0220, which is the first pixel of the second row of our screen. Our main objective is to draw a line down the left side of the screen, so this requires slightly different logic to advance from pixel to pixel.
After the yellowLoop label, we load the color yellow into our accumulator. The reason we do this after the label instead of before it will become clear later.
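After that, we increase our register X by one. We will use X as our counter, since Y can no longer serve as one. The reason will be explained later. Note that this block never sets X itself, so it relies on the register still being 0 when the loop starts.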
After that, we draw our pixel at the address $0220 + Y. Since we are drawing the first pixel, Y is 0.
Then we are met with the following four lines:
TYA
CLC
ADC #$20
TAY
I grouped these lines since together they perform an action that some people might think of as a single statement.
What we are doing here is increasing our Y register by $20, that is, by 32. So why so many lines? Well, the 6502 has a very limited set of operations, and addition and subtraction are only available on the accumulator register (the A register). We cannot directly add a number to Y, but we can transfer the value of Y to A (the TYA instruction), increase A by 32 (ADC #$20), and then transfer it back to the register Y (TAY). The CLC instruction clears the carry flag so that we don't accidentally add an extra 1. Because this sequence overwrites A, we have to reload the color at the top of every iteration, which is why LDA #YELLOW comes after the label.
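The effect of CLC followed by ADC on an 8-bit register can be sketched in JavaScript like this. It is a simplified model that ignores the 6502's decimal mode and its other status flags, and addWithCarry is a hypothetical name used only for this example.

```javascript
// Simplified 8-bit add-with-carry: returns the wrapped result and the new
// carry flag. Decimal mode and the other status flags are ignored here.
function addWithCarry(a, operand, carryIn) {
  const sum = a + operand + carryIn;
  return { result: sum & 0xff, carry: sum > 0xff ? 1 : 0 };
}

// With the carry cleared (CLC), $20 + $20 gives exactly $40.
console.log(addWithCarry(0x20, 0x20, 0)); // { result: 64, carry: 0 }
// With a stale carry left set, the same addition would be off by one.
console.log(addWithCarry(0x20, 0x20, 1)); // { result: 65, carry: 0 }
// $E0 + $20 does not fit in 8 bits: the result wraps to 0 and carry is set.
console.log(addWithCarry(0xe0, 0x20, 0)); // { result: 0, carry: 1 }
```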
So, what is the purpose of the Y register, then? Well, as we have said before, Y is the offset from our point of reference (the one stored at $0000). If we want to draw pixels horizontally, that is, next to each other, we increase Y by one. However, to draw pixels vertically, that is, one below the other, we need to increase Y by 32. This is because of how the screen is laid out in memory. This is the reason we use the register X to count how many pixels we have drawn. Before, we used Y as both an offset and a counter, since it worked just fine for our purposes, but now that Y increases by a different amount, the number of pixels drawn has to be tracked somewhere else, namely in the X register.
The next step is to check whether the addition has 'carried', since we might need to change our original point of reference. The screen is divided into 4 pages, that is, 4 groups of 256 bytes, and with our point of reference and our 8-bit Y offset we can only reach 256 cells at a time. So how can we draw a line from top to bottom if we only have access to 256 cells? After all, each row is 32 cells wide, and if every page is 256 cells in total, then a page is only 8 rows tall, so we can only paint 8 pixels of our vertical line before the offset runs out.
Thus, the only solution is to shift our point of reference so that it lies in the next page. This is what the carry check is for. If the addition carried, the offset has wrapped around to 0: we have covered all the cells we could reach from this point of reference, and we need to move on to the next one. To do that, we need to change our location from $0220 to $0320. We can just increase the high byte of $0220 by one, and we end up with $0320. The high byte of our address is stored in location $1, which is why we have the instruction INC $01. Remember that we only increase it when we have covered all the cells reachable from the current point of reference.
Whether we increase the page or not, we still need to do one more check: whether our counter is equal to $1E, that is, 30. The reason is that we do not want to paint more than 30 pixels on the side, since the first and last pixels of the vertical line were already painted by the green and blue lines. If we have not reached 30 yet, we jump back to yellowLoop.
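To see the whole vertical loop at work, the following JavaScript sketch mirrors the assembly step by step and collects the addresses it would paint. It is only a model of the logic described above, with registers as plain variables, and verticalLineAddresses is a hypothetical name.

```javascript
// Mirrors the yellow/purple loop: X counts pixels, Y is the 8-bit offset,
// and the pointer's high byte is bumped whenever the offset wraps around.
function verticalLineAddresses(start, count) {
  const low = start & 0xff; // value stored at $00
  let high = start >> 8;    // value stored at $01
  let x = 0;
  let y = 0;
  const addresses = [];
  do {
    x = (x + 1) & 0xff;                      // INX
    addresses.push(((high << 8) | low) + y); // STA ($00), Y
    const sum = y + 0x20;                    // TYA / CLC / ADC #$20
    y = sum & 0xff;                          // TAY
    if (sum > 0xff) {
      high = (high + 1) & 0xff;              // carry set: INC $01
    }
  } while (x !== count);                     // CPX #$1E / BNE
  return addresses;
}

const yellow = verticalLineAddresses(0x0220, 0x1e);
console.log(yellow.length);                // prints 30
console.log(yellow[7].toString(16));       // prints "300" (last pixel before the wrap)
console.log(yellow[8].toString(16));       // prints "320" (first pixel after INC $01)
console.log(yellow[29].toString(16));      // prints "5c0" (row just above the blue line)
```

Calling the same function with 0x023f gives the addresses for the right-hand line, ending at $05DF.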
The next code block deals with the exact same logic, but this time it will draw a purple line on the right side of the screen.
LDA #$3F ; start purple line
STA $0
LDA #$02
STA $1
LDY #$0
LDX #$0
purpleLoop: LDA #PURPLE
INX
STA ($00), Y
TYA
CLC
ADC #$20
TAY
BCC purpleLoopCheck
INC $01
purpleLoopCheck: CPX #$1E
BNE purpleLoop
; end purple line
The only important differences are the color and the point of reference (we point at $023F this time); otherwise, it is the exact same logic. The block also resets X with LDX #$0, since the yellow loop left the counter at 30.
Thus, when we run the whole program on a 6502 emulator, we will end up with an image like this.
An interesting study would be to analyse the expected runtime of this program. We can quantify how long each instruction takes by consulting the opcode reference, which lists the number of clock cycles every instruction needs. If our processor ran at 1 MHz, the code would take around 2,474 microseconds, which is just over 0.002 seconds. This might sound pretty fast, and we could still apply some optimizations to reduce the runtime, although not by much, since the program is already quite simple!
If any of my readers is very familiar with the 6502 and knows some optimization tricks to reduce the runtime or the code size, or even both, please let me know. I would love to hear any suggestions.