Summary of the video https://www.youtube.com/watch?v=ttJZjP0p_uE
I have heard that assembly is the closest humans can get to writing code in a form that machines understand, but I never understood how. Here's a bit of how the translation from an assembly instruction to machine code works.
In 32-bit ARM assembly, a data-processing instruction is laid out in binary as shown below. After looking at each field in detail, we will translate ARM instructions into the binary that the machine (an ARM chip in this case) understands.
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:0 |
|---|---|---|---|---|---|---|---|
| cond | op | I | cmd | S | Rn | Rd | Src2 |
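The layout above is just eight bit fields placed side by side. Here is a minimal Python sketch of that packing; the function name `pack_data_processing` is made up for illustration and is not from the video.

```python
def pack_data_processing(cond, op, i, cmd, s, rn, rd, src2):
    """Pack the eight fields into one 32-bit instruction word."""
    word = 0
    # (value, width) pairs, listed from bit 31 down to bit 0
    layout = [(cond, 4), (op, 2), (i, 1), (cmd, 4),
              (s, 1), (rn, 4), (rd, 4), (src2, 12)]
    for value, width in layout:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # make room for the next field, then drop it into the low bits
        word = (word << width) | value
    return word


# cond=1110, op=00, I=0, cmd=0100, S=0, Rn=6, Rd=5, Src2=0000 0000 0111
print(hex(pack_data_processing(0b1110, 0b00, 0, 0b0100, 0, 6, 5, 0b0111)))
# prints 0xe0865007
```

The widths add up to 32 (4 + 2 + 1 + 4 + 1 + 4 + 4 + 12), which is why every instruction of this kind is exactly one 32-bit word.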
Cond
| Opcode [31:28] | Mnemonic extension | Interpretation | Status flag state for execution |
|---|---|---|---|
| 0000 | EQ | Equal / equals zero | Z set |
| 0001 | NE | Not equal | Z clear |
| 0010 | CS/HS | Carry set / unsigned higher or same | C set |
| 0011 | CC/LO | Carry clear / unsigned lower | C clear |
| 0100 | MI | Minus / negative | N set |
| 0101 | PL | Plus / positive or zero | N clear |
| 0110 | VS | Overflow | V set |
| 0111 | VC | No overflow | V clear |
| 1000 | HI | Unsigned higher | C set and Z clear |
| 1001 | LS | Unsigned lower or same | C clear or Z set |
| 1010 | GE | Signed greater than or equal | N equals V |
| 1011 | LT | Signed less than | N is not equal to V |
| 1100 | GT | Signed greater than | Z clear and N equals V |
| 1101 | LE | Signed less than or equal | Z set or N is not equal to V |
| 1110 | AL | Always | any |
| 1111 | NV | Never (do not use!) | none |
SOME NOTES
- An unconditional ADD uses cond 1110 (AL, for always).
- The status flags are usually set by a previous instruction. ADDEQ executes only if an earlier instruction left the Z flag set (1).
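To make the last column of the cond table concrete, here is a small Python sketch that decides whether an instruction would execute for a given cond value and flag state. The function name `condition_passes` is hypothetical; the flag logic is copied straight from the table.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if an instruction with this cond field would execute.

    n, z, c, v are the current status flags as booleans.
    """
    checks = {
        0b0000: z,                   # EQ
        0b0001: not z,               # NE
        0b0010: c,                   # CS/HS
        0b0011: not c,               # CC/LO
        0b0100: n,                   # MI
        0b0101: not n,               # PL
        0b0110: v,                   # VS
        0b0111: not v,               # VC
        0b1000: c and not z,         # HI
        0b1001: (not c) or z,        # LS
        0b1010: n == v,              # GE
        0b1011: n != v,              # LT
        0b1100: (not z) and n == v,  # GT
        0b1101: z or n != v,         # LE
        0b1110: True,                # AL
        0b1111: False,               # NV
    }
    return bool(checks[cond])


# ADDEQ runs only when Z is set
print(condition_passes(0b0000, n=False, z=True, c=False, v=False))   # True
print(condition_passes(0b0000, n=False, z=False, c=False, v=False))  # False
```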
op
- It is 00 for the data-processing instructions covered here
I
- It stands for immediate
- Example: ADD R1, R2, #0x28
- If the second source is a constant / label, then the I field is set to 1
cmd
- This is the operation that we are well aware of
- It is ARM data processing instructions
| Opcode [24:21] | Mnemonic | Meaning | Effect |
|---|---|---|---|
| 0000 | AND | Logical bit-wise AND | Rd := Rn AND Op2 |
| 0001 | EOR | Logical bit-wise exclusive OR | Rd := Rn EOR Op2 |
| 0010 | SUB | Subtract | Rd := Rn - Op2 |
| 0011 | RSB | Reverse subtract | Rd := Op2 - Rn |
| 0100 | ADD | Add | Rd := Rn + Op2 |
| 0101 | ADC | Add with carry | Rd := Rn + Op2 + C |
| 0110 | SBC | Subtract with carry | Rd := Rn - Op2 + C - 1 |
| 0111 | RSC | Reverse subtract with carry | Rd := Op2 - Rn + C - 1 |
| 1000 | TST | Test | Scc on Rn AND Op2 |
| 1001 | TEQ | Test equivalence | Scc on Rn EOR Op2 |
| 1010 | CMP | Compare | Scc on Rn - Op2 |
| 1011 | CMN | Compare negated | Scc on Rn + Op2 |
| 1100 | ORR | Logical bit-wise OR | Rd := Rn OR Op2 |
| 1101 | MOV | Move | Rd := Op2 |
| 1110 | BIC | Bit clear | Rd := Rn AND NOT Op2 |
| 1111 | MVN | Move negated | Rd := NOT Op2 |
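The Effect column reads almost like code already. As a rough sketch (the name `data_processing_result` is made up, and values are assumed to be unsigned 32-bit integers), it can be written out in Python like this. The four test/compare rows only update flags, so the sketch returns None for them.

```python
MASK = 0xFFFFFFFF  # keep every result inside 32 bits


def data_processing_result(cmd, rn, op2, carry=0):
    """Return the value written to Rd, or None for TST/TEQ/CMP/CMN."""
    effects = {
        0b0000: lambda: rn & op2,              # AND
        0b0001: lambda: rn ^ op2,              # EOR
        0b0010: lambda: rn - op2,              # SUB
        0b0011: lambda: op2 - rn,              # RSB
        0b0100: lambda: rn + op2,              # ADD
        0b0101: lambda: rn + op2 + carry,      # ADC
        0b0110: lambda: rn - op2 + carry - 1,  # SBC
        0b0111: lambda: op2 - rn + carry - 1,  # RSC
        0b1100: lambda: rn | op2,              # ORR
        0b1101: lambda: op2,                   # MOV
        0b1110: lambda: rn & ~op2,             # BIC
        0b1111: lambda: ~op2,                  # MVN
    }
    if cmd not in effects:
        return None  # 1000 to 1011 only set the status flags
    return effects[cmd]() & MASK


print(data_processing_result(0b0100, 7, 5))        # ADD: 12
print(hex(data_processing_result(0b0010, 5, 7)))   # SUB wraps: 0xfffffffe
print(hex(data_processing_result(0b1110, 0xFF, 0x0F)))  # BIC: 0xf0
```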
S
- Setting S means we want the instruction to update the status flags
- For example, ADDS R1, R2, R3 records the status of the result, such as whether it is Zero or Negative or produced a Carry (this shows up in the CPSR register, which we will cover, but not in this article)
- ADDS sets the S bit to 1, so the flags start tracking the result of that operation.
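As an illustration of what "tracking the status" means, here is a sketch of the four flags an ADDS would produce. The helper name `adds_flags` is hypothetical, and the inputs are assumed to be unsigned 32-bit values; the flag names N, Z, C, V are the ones used in the cond table above.

```python
def adds_flags(a, b):
    """Add two unsigned 32-bit values and report the N, Z, C, V flags."""
    full = a + b
    result = full & 0xFFFFFFFF
    n = result >> 31               # top bit of the result
    z = int(result == 0)           # result is exactly zero
    c = int(full > 0xFFFFFFFF)     # a carry came out of bit 31
    # signed overflow: both inputs share a sign and the result's sign differs
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}


print(adds_flags(0xFFFFFFFF, 1))  # (0, {'N': 0, 'Z': 1, 'C': 1, 'V': 0})
print(adds_flags(0x7FFFFFFF, 1))  # (2147483648, {'N': 1, 'Z': 0, 'C': 0, 'V': 1})
```

With a plain ADD (S = 0) the same sum is computed, but none of these flags change.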
Rn (19:16)
- It is called the first source register
- In ADD R1, R2, R3 Rn is R2
- Hence it gets the binary value 0010 in 19:16
Rd (15:12)
- It is also called destination register
- In ADD R1, R2, R3 Rd is R1
- Hence it gets binary value of 0001 in 15:12
Src2 (11:0)
- Second Source: Can be a variety of things
a) Immediate
b) Register
c) Register-shifted Register
Immediate
- rot is for rotation. The mnemonic for it is ROR, for rotate right.
- 7:0 bits hold an 8-bit immediate value (imm8)
- 11:8 bits (rot) represent the amount by which imm8 is rotated to the right
- NOTE: The actual rotation is twice the value in the rot field, so the operand is imm8 ROR (2 x rot)
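The "rotate right by twice rot" rule can be sketched in a few lines of Python. The helper names `ror32` and `decode_immediate` are made up for this example.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_immediate(src2):
    """Turn the 12-bit immediate form of Src2 back into the 32-bit constant."""
    rot = (src2 >> 8) & 0xF   # bits 11:8
    imm8 = src2 & 0xFF        # bits 7:0
    return ror32(imm8, 2 * rot)


print(decode_immediate(0b0000_00101010))       # rot=0, imm8=42 gives 42
print(hex(decode_immediate(0b0100_11111111)))  # rot=4, so ROR 8: 0xff000000
print(hex(decode_immediate(0b1110_11111111)))  # rot=14, so ROR 28: 0xff0
```

Because rot is 4 bits and is doubled, only even rotations from 0 to 30 are available.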
Register
| 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|
| shamt5 | sh | 0 | Rm |
- shamt5 is the shift amount, whether left or right
- sh is the shift operator (see the sh table at the bottom)
- Rm is the register whose value is being shifted
Register-shifted Register
| 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|
| Rs | 0 | sh | 1 | Rm |
- Rs is the register that holds the amount of shift
- Rm is the target register whose value is being shifted
sh table
| Instruction | sh | Operation |
|---|---|---|
| LSL | 00 | Logical shift left |
| LSR | 01 | Logical shift right |
| ASR | 10 | Arithmetic shift right |
| ROR | 11 | Rotate right |
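Here is a rough Python sketch of what the four sh codes do to a 32-bit value. The name `apply_shift` is hypothetical, and the sketch assumes a shift amount between 0 and 31 taken literally; real ARM gives some zero-amount encodings special meanings, which are ignored here.

```python
MASK = 0xFFFFFFFF


def apply_shift(sh, value, amount):
    """Apply the shift selected by the 2-bit sh code to a 32-bit value."""
    value &= MASK
    if sh == 0b00:  # LSL: zeros come in from the right
        return (value << amount) & MASK
    if sh == 0b01:  # LSR: zeros come in from the left
        return value >> amount
    if sh == 0b10:  # ASR: copies of the sign bit come in from the left
        if value & 0x80000000:
            value -= 1 << 32  # reinterpret as a negative Python int
        return (value >> amount) & MASK
    if sh == 0b11:  # ROR: bits falling off the right re-enter on the left
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError("sh must be a 2-bit code")


print(hex(apply_shift(0b01, 0x80000000, 4)))  # LSR: 0x8000000
print(hex(apply_shift(0b10, 0x80000000, 4)))  # ASR: 0xf8000000
print(hex(apply_shift(0b11, 0xFF, 4)))        # ROR: 0xf000000f
```

The LSR and ASR lines show the only difference between the two: ASR keeps the sign bit, LSR does not.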
STARTING WITH THE EASY ONE ADD R5, R6, R7
Let's unpack one section at a time
- cond is ALWAYS hence 1110, since there's no condition to prevent ADD being done.
- op is 00
- I is 0 (there's no immediate values here)
- cmd is ADD which translates to 0100
- There is no status suffix (it is ADD, not ADDS), hence S is 0
- Rn is the first source, hence R6, 0110
- Rd is the destination, hence R5, 0101
- shamt5 is 00000, since there's no shift
- sh is 00 as there's no shift
- Rm is the second source, hence R7, 0111
Combining them leads to
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 0100 | 0 | 0110 | 0101 | 00000 | 00 | 0 | 0111 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | N/A | Rm |
Nicely formatted binary here:
1110 0000 1000 0110 0101 0000 0000 0111
Care to convert to hex?
0xE0865007
SLIGHTLY HARDER: ADD R5, R6, R7, LSR #4
- We pick the Register variety of Src2, with the shift amount held in shamt5, because #4 is a literal value
- Most fields are the same, except 11:0
- LSR has sh code 01
- The LSR amount is 4, so shamt5 is 00100
- Rm is 7, hence 0111
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 0100 | 0 | 0110 | 0101 | 00100 | 01 | 0 | 0111 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |
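The two ADD examples differ only in bits 11:0, which a short Python sketch makes easy to see. The function names are made up for illustration; the field positions come from the tables above.

```python
def src2_register(rm, sh=0b00, shamt5=0):
    """Build the 12-bit Register form of Src2: shamt5 | sh | 0 | Rm."""
    return (shamt5 << 7) | (sh << 5) | (0 << 4) | rm


def encode_add_register(rd, rn, rm, sh=0b00, shamt5=0):
    """Encode an unconditional ADD Rd, Rn, Rm{, shift #shamt5}."""
    cond, op, i, cmd, s = 0b1110, 0b00, 0, 0b0100, 0
    return (cond << 28 | op << 26 | i << 25 | cmd << 21 | s << 20
            | rn << 16 | rd << 12 | src2_register(rm, sh, shamt5))


print(hex(encode_add_register(5, 6, 7)))                      # ADD R5, R6, R7
print(hex(encode_add_register(5, 6, 7, sh=0b01, shamt5=4)))   # ADD R5, R6, R7, LSR #4
# prints 0xe0865007 then 0xe0865227
```

Only the last three hex digits change (007 versus 227), since cond through Rd are identical in both instructions.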
LET'S DO MORE: ADD R0, R1, #42
- The third one is immediate, hence
- I is set to 1
- Src2 becomes immediate format (rot for 11:8 and immediate value 7:0)
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7:0 |
|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 1 | 0100 | 0 | 0001 | 0000 | 0000 | 00101010 |
| cond | op | I | cmd | S | Rn | Rd | Rot | imm8 |
BRING SOME MORE! SUB R2, R3, #0xFF0
- Rd is 2, Rn is 3, imm is 0xFF0
- SUB has code 0010.
- OH NO, BUT #0xFF0 does not fit in 8 bits.
- That's ok. That's what rot is for.
- 0xFF0 is 0000 0000 0000 0000 0000 1111 1111 0000
- 0xFF is 0000 0000 0000 0000 0000 0000 1111 1111
- How many rotations to the right will turn 0xFF into 0xFF0?
- 1 rotate right is 1000 0000 0000 0000 0000 0000 0111 1111
- Following? Let's rotate right a little more.
- 4 rotate right is 1111 0000 0000 0000 0000 0000 0000 1111
- 8 rotate right is 1111 1111 0000 0000 0000 0000 0000 0000
- 24 rotate right is 0000 0000 0000 0000 1111 1111 0000 0000 (that is 0xFF00, not quite there)
- guess what? it takes 28 rotations to the right to get 0xFF0!
- So, rot should be 14, since by our rule the actual rotation is twice the value in rot.
- Hence, the 11:8 bits will be 1110 and 7:0 will be 1111 1111
- which is just 0xFF0 represented as an 8-bit number combined with a rotation.

| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7:0 |
|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 1 | 0010 | 0 | 0011 | 0010 | 1110 | 11111111 |
| cond | op | I | cmd | S | Rn | Rd | Rot | imm8 |
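Counting rotations by hand is easy to get wrong, so here is a small Python sketch that searches all 16 rot values for a constant. The names `ror32` and `encode_immediate` are hypothetical helpers, not anything from the video or the ARM tools.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Find (rot, imm8) with imm8 ROR (2 * rot) == value, or None."""
    for rot in range(16):
        # undo the rotate-right: rotating right by 32 - k is rotating left by k
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return rot, imm8
    return None  # the constant cannot be expressed in this format


rot, imm8 = encode_immediate(0xFF0)
print(rot, hex(imm8))  # 14 0xff

# SUB R2, R3, #0xFF0: cond=1110, I=1, cmd=0010, Rn=3, Rd=2
word = (0b1110 << 28) | (1 << 25) | (0b0010 << 21) | (3 << 16) | (2 << 12) | (rot << 8) | imm8
print(hex(word))  # 0xe2432eff
```

A constant such as 0x101 makes `encode_immediate` return None, because its two set bits are too far apart to fit inside any 8-bit window.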
WHAT ABOUT THIS? LSL R0, R9, #7
- WAIT WAIT... LSL is not in the command table. What am I supposed to put in bits 24:21?
- Thanks Rakesh, the creator of the video: Basically LSL is equivalent to this: MOV R0, R9, LSL #7
- Wait again... Rakesh says R9 is not Rn... Hm.. I thought it would be the same as how SUB was done above.
- In the MOV operation that's not the case. Per the armasm User Guide, page 333, MOV R0, R9, LSL #7 follows the syntax MOV{S}{cond} Rd, Operand2, where Operand2 (according to page 244 of the same guide) can be a register with optional shift. On page 246 the guide gives the register with optional shift as Rm{, shift}.
- Still following?
- Hence, R9 here is Rm, the register holding the data for the second operand.
- Hence, Rn here is 0 and Rm is 9
- (BY THE WAY, THE REFERENCE I'M TALKING ABOUT IS armasm User Guide Version 6.6)
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0000 | 00111 | 00 | 0 | 1001 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |
OK TAKE A BREAK AND COME BACK! ROR R3, R5, #21
- GUESS WHAT.. SHIFT AGAIN! Which means Rm is 5
- This is equivalent to MOV R3, R5, ROR #21
- Same translation steps as for LSL above..
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0011 | 10101 | 11 | 0 | 0101 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |
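Since LSL and ROR with a literal amount both turn into a MOV with a shifted Rm, one small function can cover the two examples. This is only a sketch under that assumption; `SH_CODES` and `encode_shift_immediate` are names invented here.

```python
SH_CODES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_shift_immediate(mnemonic, rd, rm, shamt5):
    """Encode `<shift> Rd, Rm, #shamt5` as MOV Rd, Rm, <shift> #shamt5."""
    cond, cmd, rn = 0b1110, 0b1101, 0  # always, MOV, Rn unused so 0000
    return ((cond << 28) | (cmd << 21) | (rn << 16) | (rd << 12)
            | (shamt5 << 7) | (SH_CODES[mnemonic] << 5) | rm)


print(hex(encode_shift_immediate("LSL", rd=0, rm=9, shamt5=7)))   # LSL R0, R9, #7
print(hex(encode_shift_immediate("ROR", rd=3, rm=5, shamt5=21)))  # ROR R3, R5, #21
# prints 0xe1a00389 then 0xe1a03ae5
```

Both words start with 0xE1A0, which is the cond, op, I, cmd, S and Rn fields of an unconditional MOV with a register operand.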
KEY TAKEAWAY:
- There's no one rule for all translations. Sometimes you look up the command table; sometimes you face an operation that is not in the table and need to break the command down first.
- But everything has to end up as binary, otherwise the machine won't understand! So, let's stick to the basics and see if we can translate!!!!!
- Good news and bad news: You and I have learned how to translate, not entirely, but we have seen a bit of it... The translation steps will be different in the A64 architecture, but... we learned how to apply our knowledge in some way... Some methods must be similar... must be..
OK AFTER YOUR DINNER... LSR R4, R8, R6
- This time the shift amount is in R6.
- Does that make R8, the source register, the Rn?
- NOPE!!
- This is equivalent to MOV R4, R8, LSR R6
- R6 is Rs haha! Found it!
- Rm is 8 hohoho
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0100 | 0110 | 0 | 01 | 1 | 1000 |
| cond | op | I | cmd | S | Rn | Rd | Rs | N/A | sh | N/A | Rm |
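When the shift amount lives in a register, bits 11:0 switch to the Rs | 0 | sh | 1 | Rm layout. A minimal Python sketch of that case follows; the function name `encode_shift_register` is made up for illustration.

```python
SH_CODES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_shift_register(mnemonic, rd, rm, rs):
    """Encode `<shift> Rd, Rm, Rs` as MOV Rd, Rm, <shift> Rs."""
    cond, cmd, rn = 0b1110, 0b1101, 0  # always, MOV, Rn unused so 0000
    src2 = (rs << 8) | (0 << 7) | (SH_CODES[mnemonic] << 5) | (1 << 4) | rm
    return (cond << 28) | (cmd << 21) | (rn << 16) | (rd << 12) | src2


print(hex(encode_shift_register("LSR", rd=4, rm=8, rs=6)))  # LSR R4, R8, R6
# prints 0xe1a04638
```

Bit 4 is the only thing that tells the hardware which register layout it is looking at: 0 means a 5-bit literal shift amount, 1 means the amount comes from Rs.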
PHEW LET'S SLEEP AFTER THIS... ASR R5, R1, R12
- What is Rd, Rn, Rm, Rs?? Is some of them 0? Which one?
- Answer below:
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0101 | 1100 | 0 | 10 | 1 | 0001 |
| cond | op | I | cmd | S | Rn | Rd | Rs | N/A | sh | N/A | Rm |
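One way to check answers like this is to go in the other direction and split a 32-bit word back into its fields. The sketch below does that; `decode_fields` is a hypothetical helper, and it assumes the word really is a data-processing instruction in one of the three Src2 forms covered in this article.

```python
def decode_fields(word):
    """Split a data-processing instruction word into named fields."""
    fields = {
        "cond": (word >> 28) & 0xF,
        "op": (word >> 26) & 0x3,
        "I": (word >> 25) & 0x1,
        "cmd": (word >> 21) & 0xF,
        "S": (word >> 20) & 0x1,
        "Rn": (word >> 16) & 0xF,
        "Rd": (word >> 12) & 0xF,
    }
    if fields["I"] == 1:
        # Immediate form: rot | imm8
        fields["rot"] = (word >> 8) & 0xF
        fields["imm8"] = word & 0xFF
    elif (word >> 4) & 0x1 == 0:
        # Register form: shamt5 | sh | 0 | Rm
        fields["shamt5"] = (word >> 7) & 0x1F
        fields["sh"] = (word >> 5) & 0x3
        fields["Rm"] = word & 0xF
    else:
        # Register-shifted Register form: Rs | 0 | sh | 1 | Rm
        fields["Rs"] = (word >> 8) & 0xF
        fields["sh"] = (word >> 5) & 0x3
        fields["Rm"] = word & 0xF
    return fields


print(decode_fields(0xE1A05C51))
# {'cond': 14, 'op': 0, 'I': 0, 'cmd': 13, 'S': 0, 'Rn': 0, 'Rd': 5, 'Rs': 12, 'sh': 2, 'Rm': 1}
```

Reading the output against the tables: cmd 13 is MOV, sh 2 is ASR, Rd is R5, Rm is R1 and Rs is R12.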