Assembly snippets

Post by DonQuichotte » Sat May 21, 2016 11:27 pm

Hello

Assembly snippets are rather scattered across the forum, IMO.
This thread is a place to gather them.
Contributions are voluntary and freely shared; they could later be collected in a PDF or FAQ.
Epiphany assembly is welcome, and a macro that wraps it for use from C is a bonus.

***

Here is my first snippet, which I modestly :roll: call "PERFECT_ALIGN8".
It aligns the code on a 64-bit (8-byte) boundary with as few filler instructions as possible, whatever the program counter modulo 8 is:
Code:
.balignw 4, 0x01a2      // pad to a 4-byte boundary with a 16-bit filler
.balignl 8, 0xfc02fcef  // then pad to an 8-byte boundary with a 32-bit filler

The whole story is here https://parallella.org/forums/viewtopic.php?f=13&t=3679&p=17627#p17627
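
To see why two directives are enough, here is a small C model of the padding arithmetic. It is only a sketch: it assumes the location counter is always even (instructions being 2-byte aligned), and the names `pad_plan` and `plan_align8` are made up for this illustration. `.balignw 4` inserts at most one 16-bit filler, after which `.balignl 8` inserts at most one 32-bit filler.

```c
#include <stdint.h>

/* Hypothetical helper type: how many fillers each directive emits. */
struct pad_plan {
    unsigned fill16; /* 16-bit fillers from ".balignw 4, 0x01a2"     */
    unsigned fill32; /* 32-bit fillers from ".balignl 8, 0xfc02fcef" */
};

/* Model the two directives for a given (even) location counter. */
struct pad_plan plan_align8(uint32_t pc)
{
    struct pad_plan p;

    /* Step 1: pad to the next multiple of 4, counting 2-byte units. */
    p.fill16 = ((4u - pc % 4u) % 4u) / 2u;
    pc += 2u * p.fill16;

    /* Step 2: pad to the next multiple of 8, counting 4-byte units. */
    p.fill32 = ((8u - pc % 8u) % 8u) / 4u;
    return p;
}
```

For a program counter equal to 0, 2, 4 and 6 modulo 8, the model gives 0, 2, 1 and 1 filler instructions, whereas padding with 16-bit fillers alone would take 0, 3, 2 and 1.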

***

It may be useful for:
- the PAL guys (if they accept assembly?! I have not read everything); I recently wrote a better 32-bit popcount routine
- the "integer" guys; most of the time I work with integers and bits
- the compiler guys
- ourselves (the "assembly class")

Re: Assembly snippets

Post by DonQuichotte » Mon May 23, 2016 12:02 pm

Snippet #2: forcing a 4-byte B<cond> instead of the traditional 2-byte B<cond>

***
Edited after Andreas's answer:
you can simply write "beq.l label" instead of all my messy code :shock: That's good news :)
Sorry Andreas, I missed your earlier answer.

I am leaving my code here anyway, since it shows some of what macros can do.
There are other macros in the official examples; just grep for "hwloop" or "bitrev", for instance.

Code:
beq.l label // force 32-bit branch instruction

***

Example with BEQ.

/* From this e-objdump sample and the architecture manual (table 66), we notice or deduce:
- 1st (lowest) nibble (4 bits) = encoding of the instruction
- 2nd nibble = <COND> code; BEQ (as on ARM) <=> x86 JZ, JE
- highest byte = displacement in 2-byte units, counted from the beginning of the instruction
- example: we jump from 7e8 to 7f8 and the encoded displacement is '08'; from 7f4 to 7f8 => '02'
7e8: 0800 beq 7f8 <_frame_dummy+0x24>
7ea: 200b 0002 mov r1,0x0
7ee: 200b 1002 movt r1,0x0
7f2: 4433 sub r2,r1,0
7f4: 0200 beq 7f8 <_frame_dummy+0x24>
7f6: 0552 jalr r1
7f8: 058b 0072 mov r0,0x72c
*/
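
The displacement rule read off the dump can be checked with a few lines of C. This is a sketch that only reproduces the bit layout deduced in the comment above (instruction encoding in the low nibble, condition code in the next nibble, signed displacement in 2-byte units in the high byte); `encode_bcond16` is a made-up name, not part of any Epiphany tool.

```c
#include <stdint.h>

/* Build the 16-bit B<cond> halfword as deduced from the e-objdump listing:
 *   bits  3..0  = 0000 (16-bit branch)
 *   bits  7..4  = condition code (0 for BEQ)
 *   bits 15..8  = signed displacement in 2-byte units, relative to the
 *                 address of the branch itself
 * The caller must keep the displacement within -128..127 halfwords. */
uint16_t encode_bcond16(unsigned cond, uint32_t from, uint32_t to)
{
    int32_t halfwords = ((int32_t)to - (int32_t)from) / 2;

    return (uint16_t)((((uint32_t)halfwords & 0xFFu) << 8) |
                      ((cond & 0xFu) << 4));
}
```

With the addresses from the listing, `encode_bcond16(0, 0x7e8, 0x7f8)` gives 0x0800 and `encode_bcond16(0, 0x7f4, 0x7f8)` gives 0x0200, matching the two `beq` lines of the dump.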

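Code:
// force the 4-byte BEQ instead of the standard 2-byte one; 'forward' version (label after the branch)
.macro JZf label
1:
    .byte 0b00001000 // instead of 0b00000000 (the 2-byte form)
    .byte ( \label - 1b ) / 2 // displacement in 2-byte units, low byte
    .byte 0, 0 // upper displacement bytes: zero for a short forward jump
.endm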


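Code:
// force the 4-byte BEQ instead of the standard 2-byte one; 'backward' version (label before the branch)
.macro JZb label
1:
    .byte 0b00001000 // instead of 0b00000000 (the 2-byte form)
    .byte ( \label - 1b ) / 2 // negative displacement in 2-byte units, low byte
    .byte 0xff, 0xff // upper displacement bytes: sign extension for a short backward jump
.endm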
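
The two macros differ only in the upper displacement bytes, which is just sign extension of a 24-bit field. The C sketch below makes that explicit. It assumes the 32-bit layout suggested by this thread (low nibble 1000, then the condition code, then a 24-bit signed displacement in 2-byte units, stored little-endian), and `emit_bcond32` is a hypothetical helper name.

```c
#include <stdint.h>

/* Write the four bytes of a forced 32-bit B<cond> into out[0..3].
 * byte_offset is (target - address of the branch) and must be even. */
void emit_bcond32(unsigned cond, int32_t byte_offset, uint8_t out[4])
{
    /* Displacement in 2-byte units, truncated to 24 bits (two's complement). */
    uint32_t simm24 = (uint32_t)(byte_offset / 2) & 0xFFFFFFu;

    out[0] = (uint8_t)(((cond & 0xFu) << 4) | 0x8u); /* 0b00001000 for BEQ */
    out[1] = (uint8_t)(simm24 & 0xFFu);              /* ( label - 1b ) / 2 */
    out[2] = (uint8_t)((simm24 >> 8) & 0xFFu);       /* sign extension ... */
    out[3] = (uint8_t)((simm24 >> 16) & 0xFFu);      /* ... of the offset  */
}
```

An offset of +16 bytes yields 08 08 00 00 (the JZf pattern) and an offset of -4 bytes yields 08 fe ff ff (the JZb pattern), so one helper covers both directions once the sign extension is computed instead of hard-coded.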


Tested with e-gcc and e-objdump on an x86_64 host.
I could not put both parts in a single macro; the syntax below is not accepted:
Code:
.if \label >= 1b


This forced 4-byte instruction is another way to reduce wasted NOPs.
It is also a tool for my funny little experiment: using only 32-bit instructions.
Last edited by DonQuichotte on Mon May 23, 2016 4:02 pm, edited 1 time in total.

Re: Assembly snippets

Post by aolofsson » Mon May 23, 2016 2:36 pm

Not sure I understand what you are trying to accomplish here? Sorry if I misunderstood. You can control the length of the instruction with the ".l" suffix as I previously posted.

Example:
Code:
        beq 0xf8        ; default 16-bit encoding
        beq.l 0xf8      ; 32-bit encoding forced by the ".l" suffix


Output:
Code:
Disassembly of section .text:

00000000 <.text>:
   0:   7c00         beq 0xf8
   2:   7c08 0000    beq 0xfa
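
The listing can be read back with the same displacement arithmetic. The sketch below is a C model, not a disassembler: it assumes only the two branch layouts seen in this thread (low nibble 0000 for the 16-bit form, 1000 for the 32-bit form), and `branch_target` is a made-up name.

```c
#include <stdint.h>

/* Return the branch target for a branch instruction located at addr.
 * insn holds the 16-bit form in its low halfword, or the full 32-bit form
 * (first halfword in bits 15..0, second halfword in bits 31..16). */
uint32_t branch_target(uint32_t addr, uint32_t insn)
{
    int32_t disp;

    if ((insn & 0xFu) == 0x8u) {
        /* 32-bit form: signed 24-bit displacement in bits 31..8. */
        disp = (int32_t)(insn >> 8);
        if (disp & 0x800000)
            disp -= 0x1000000;
    } else {
        /* 16-bit form: signed 8-bit displacement in bits 15..8. */
        disp = (int32_t)((insn >> 8) & 0xFFu);
        if (disp & 0x80)
            disp -= 0x100;
    }
    return addr + (uint32_t)(disp * 2); /* displacement counts 2-byte units */
}
```

Both encodings in the output carry the same displacement, 0x7c halfwords (0xf8 bytes). The second branch sits at address 2, which is why the disassembly shows 0xfa for it: `branch_target(0, 0x7c00)` is 0xf8 and `branch_target(2, 0x00007c08)` is 0xfa.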

Re: Assembly snippets

Post by DonQuichotte » Fri May 27, 2016 1:58 pm

Snippet #3: how to increase the dual-issue rate when not crunching any floats

(Epiphany can execute one IALU and one IALU2 instruction in the same cycle,
under the conditions explained in the architecture manual.)

- Instead of pushing/popping easily computable values like pointers,
keep frequently used immediates in registers.
The IALU2 pipeline accepts IMUL, IADD/ISUB, IMADD/IMSUB, but no immediates - only registers.

- Replace mov ..., #0 with an imul...
- Replace mov ..., #1 (or another easy value) with an iadd...
- Replace lsl ..., #2 with an imul...

Examples below.

Code:
Init:
#define I_M1 r63
#define I_0 r62
#define I_1 r61 // yes, you can do "reg--;" and "reg++;" with "iadd/isub reg, reg, I_M1", but I like I_1 for bit operations, for example
#define I_4 r60

mov I_M1, #-1
mov I_0, #0
mov I_1, #1
mov I_4, #4

...

CriticalLoopsOrHeavyComputing:

iadd R_PTR, R_PTR, I_1 // or I_4 or I_10... fitting your needs
recursive(...)
isub R_PTR, R_PTR, I_1

...
imul tile, tileN, I_4 // instead of lsl tile, tileN, #2
...
// replace a MOV register, #0 with an IMUL :)
imul tile, I_0, I_0 // tile = 0 * 0 = 0
// replace a MOV register, #1 (or another easy value) with an IADD :)
iadd tile, I_0, I_1 // tile = 0 + 1 = 1
iadd tile, I_1, I_1 // tile = 1 + 1 = 2
iadd tile, I_1, I_4 // tile = 1 + 4 = 5, etc.
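
The substitutions above rely on simple identities, which can be checked in C before rewriting a hot loop. This sketch only models the values, assuming 32-bit wraparound arithmetic; it says nothing about issue slots or timing. The `I_*` constants mirror the register names used in the snippet, and the function names are made up for the illustration.

```c
#include <stdint.h>

/* Values the snippet keeps permanently in r63..r60. */
#define I_M1 0xFFFFFFFFu /* -1 in two's complement */
#define I_0  0u
#define I_1  1u
#define I_4  4u

/* Register-only forms of the three operations (modulo 2^32). */
uint32_t iadd(uint32_t a, uint32_t b) { return a + b; }
uint32_t isub(uint32_t a, uint32_t b) { return a - b; }
uint32_t imul(uint32_t a, uint32_t b) { return a * b; }

/* "lsl x, #2" rewritten as "imul x, I_4". */
uint32_t shift_left_2(uint32_t x) { return imul(x, I_4); }

/* "reg--" rewritten as "iadd reg, reg, I_M1". */
uint32_t decrement(uint32_t x) { return iadd(x, I_M1); }
```

For instance, `shift_left_2(5)` equals `5u << 2`, `imul(I_0, I_0)` is 0, and `iadd(I_1, I_4)` is 5, which are the values the immediate forms would have produced.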

Re: Assembly snippets

Post by DonQuichotte » Tue May 30, 2017 12:33 am

When simulation is not enough:
I added a simple canvas for live low-level benchmarks that report event timer information.
It includes a demo with some 64-bit increment routines: unzip the *c.zip

See the "SPMD-canvas" project on my GitHub page.

Re: Assembly snippets

Post by jar » Tue May 30, 2017 2:16 am

DonQuichotte,

You're not using git the way it was intended to be used... I've never seen anything quite like your SPMD-canvas repository.

Redistributing the Epiphany documentation is fine, but you bundle it up into multiple zip files and track each one. It's much easier to view your code if you just track the files and do not distribute *.zip files.

Re: Assembly snippets

Post by DonQuichotte » Tue May 30, 2017 11:18 am

My github is special because donquichotte is really, really special ;)

That said, I've done my best to suit your wishes, jar.
You can access individual files now :)

Anyway:
- a zip is a unit; if somebody is only interested in seeing a basic program with timers configurable via argv[1] and argv[2], they can download the first zip.
If you don't care about assembly, download the *b zip instead.
- a zip contains the full tree: files and folders. With GitHub, creating empty folders annoys me. It cannot do it (or tell me how): I must create a dummy file in the folder, and if I delete this dummy file, GitHub deletes my empty folder.
Any structured project contains some folders (bin, src...), and of course the bin folder should be empty.
(Or maybe I should tell build.sh to create the bin folder if it does not exist... another compromise.)

This way it's faster for me to download everything in a zip; GitHub is not my cup of tea and is more like an archive for me. I admit the doc/misc.txt is a personal reminder ^^
If a thousand people followed Rossinante, Sancho Panza and me on my projects, I'd certainly invest more. But I'm open to other advice; I could certainly do better, no problem.