ABI / register usage / compiler change

Post by aolofsson » Tue Aug 04, 2015 12:25 pm

A few years back we made the decision to split the registers between caller and callee saved as you see in the ABI today. The decision was made for a variety of reasons (compiler efficiency, balance, etc.), but the decision process was not rigorous. This has been discussed before here:

viewtopic.php?f=43&t=1748&p=10829&hilit=callee#p10829
viewtopic.php?f=13&t=1578&p=9698&hilit=callee#p9698
viewtopic.php?f=23&t=549&p=3284&hilit=callee#p3284

At this point, the number of users is growing and if we ever want to make a change without angering a lot of people, this may be our last chance.

ABI Change proposal:
Make registers R32-R63 officially caller saved. (currently R32-R43 are callee saved)

Why:
-better energy/program efficiency for leaf routines, where the program will likely spend most of its time (check assumption?)
-ease of use for assembly programming ("do whatever you want with R32-R63")
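
To make the leaf-routine argument concrete, here is a minimal sketch in C that counts how many save/restore pairs a leaf routine would need under each variant. It only models R32-R63; it assumes (inferred from the proposal text, not checked against the ABI document) that R44-R63 are already caller saved today. The names `abi_variant`, `upper_reg_is_callee_saved` and `leaf_save_pairs` are made up for illustration.

```c
#include <stdbool.h>

/* The two variants discussed in this thread (names are hypothetical). */
typedef enum { ABI_CURRENT, ABI_PROPOSED } abi_variant;

/* True if `reg` must be preserved by the callee.
 * Only R32..R63 are modelled; anything else returns false. */
bool upper_reg_is_callee_saved(abi_variant abi, int reg)
{
    if (reg < 32 || reg > 63)
        return false;   /* outside the range discussed here */
    if (abi == ABI_PROPOSED)
        return false;   /* proposal: all of R32-R63 caller saved */
    return reg <= 43;   /* today: R32-R43 callee saved (R44-R63 assumed caller saved) */
}

/* Save/restore pairs needed by a leaf routine that clobbers
 * the n registers R32 .. R32+n-1. */
int leaf_save_pairs(abi_variant abi, int n)
{
    int pairs = 0;
    for (int reg = 32; reg < 32 + n && reg <= 63; ++reg) {
        if (upper_reg_is_callee_saved(abi, reg))
            ++pairs;
    }
    return pairs;
}
```

This counts the worst case where the routine picks registers from R32 upwards. A leaf routine written for the current ABI can avoid the saves by taking R44-R63 first, so the difference only shows up once it needs more registers than that.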

Why not?
...

This is actually what I had specified back in 2011. I looked through my emails and found that I made the same arguments then. Someone must have convinced me otherwise. I can't find the email trail and can't remember what the argument was for making R32-R43 callee saved. :-)

Feedback much appreciated.

Andreas

Re: ABI / register usage / compiler change

Post by dobkeratops » Tue Aug 04, 2015 3:43 pm

I don't have any hard data, but I was always a big fan of the MIPS ABI with the split caller/callee saved subset of registers, allowing for efficient Leaf functions as you mention. It also reserved a few as function arguments & return values. It did indeed make writing optimised assembly for small, critical functions (which tended to be leaf) very pleasant.
In most cases, calling & writing leaf functions involved no stack manipulation, which is a good thing IMO, leaving the load-store units dealing with global changes rather than confusing them with locals.

I would guess this is also a big deal when squeezing code into a small amount of local memory: you need less inlining, and there is less bloat from saving/restoring around function calls.

In the absence of hard data saying otherwise, it sounds better to go for a roughly 50:50 split between caller & callee saved.

Non-leaf functions can still use the extra registers for intermediates between function calls.
Larger leaf functions can still use the whole register range by actually saving.

I can also see a good case for reserving some for globals (local store pointer, message queue?), like 'thread-local storage' or values related to the current task.
Again, MIPS had a 'global pointer' which was left pointing at global variables, allowing them to be accessed with compact addressing modes.

I gather the Epiphany instruction set allows smaller instructions using the first 8 registers; what is their best use?
Should those also be split, e.g. r0-r3 = caller-saved, r4-r7 = callee-saved, r8-r31 = caller-saved, r32-r55 = callee-saved, r56-r63 = special-purpose globals?
Maybe it's simpler to just keep r0-r7 as 'scratch area' / return values & function arguments (2 return values, 6 args).
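
For what it's worth, the five-way split floated above can be written down as a small lookup, which makes it easy to check how balanced it is. This is only a sketch of that one suggestion, not of the existing ABI; `reg_class`, `split_class` and `count_classes` are names invented here.

```c
/* Classes in the hypothetical split: r0-r3 caller, r4-r7 callee,
 * r8-r31 caller, r32-r55 callee, r56-r63 special-purpose globals. */
typedef enum {
    REG_CALLER_SAVED,
    REG_CALLEE_SAVED,
    REG_SPECIAL,
    REG_INVALID
} reg_class;

/* Classify register number r (0..63) under the hypothetical split. */
reg_class split_class(int r)
{
    if (r < 0 || r > 63) return REG_INVALID;
    if (r <= 3)          return REG_CALLER_SAVED;
    if (r <= 7)          return REG_CALLEE_SAVED;
    if (r <= 31)         return REG_CALLER_SAVED;
    if (r <= 55)         return REG_CALLEE_SAVED;
    return REG_SPECIAL;
}

/* Count how many of the 64 registers fall into each class. */
void count_classes(int *caller, int *callee, int *special)
{
    *caller = *callee = *special = 0;
    for (int r = 0; r < 64; ++r) {
        switch (split_class(r)) {
        case REG_CALLER_SAVED: ++*caller;  break;
        case REG_CALLEE_SAVED: ++*callee;  break;
        case REG_SPECIAL:      ++*special; break;
        default:                           break;
        }
    }
}
```

Counting it out gives 28 caller-saved, 28 callee-saved and 8 special registers, so the split is an even 50:50 outside the reserved block, and the 8 short-encoding registers are themselves split 4/4.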

... or is that getting too convoluted?
Another way would be to use 'odd/even' registers as caller/callee.

Re: ABI / register usage / compiler change

Post by timpart » Thu Oct 08, 2015 1:15 pm

I would recommend that you have someone recompile the compiler with several different ABI options and try them out with some substantial code examples. That way you could confirm or deny your suspicions with some hard facts.

This discussion of interrupts mentions that the C wrapper for interrupts saves all the caller saved registers to the stack, so increasing their number makes C based interrupt routines even more impractical.

When you mention banning external accesses I assume you mean executing code. Is that any off core access or just off chip? I'm wary of doing this. When you say expensive do you mean in executing time or chip silicon? A more radical suggestion might be to restrict code to the first 64K of memory; then the compiler wouldn't have to generate MOVT instructions for a function address that the linker will just make MOVT Rx,#0.
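
To illustrate the 64K point, here is a rough sketch in C. It assumes a function address is materialised as two 16-bit halves, with MOV loading the low half (and clearing the upper half) and MOVT loading the high half; that encoding detail is my assumption here, and `addr_halves`, `split_address` and `movt_is_redundant` are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two 16-bit immediates needed to build a 32-bit address. */
typedef struct {
    uint16_t lo;   /* immediate for MOV  Rx,#lo */
    uint16_t hi;   /* immediate for MOVT Rx,#hi */
} addr_halves;

/* Split a 32-bit address into its MOV / MOVT immediates. */
addr_halves split_address(uint32_t addr)
{
    addr_halves h;
    h.lo = (uint16_t)(addr & 0xFFFFu);
    h.hi = (uint16_t)(addr >> 16);
    return h;
}

/* If code lives in the first 64K, the high half is always zero,
 * so the MOVT would only ever be MOVT Rx,#0. */
bool movt_is_redundant(uint32_t addr)
{
    return split_address(addr).hi == 0;
}
```

Any address below 0x10000 makes `movt_is_redundant` return true, which is the case where the compiler could drop the second instruction if the ABI guaranteed code stays in that range.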

I'm not sure how many leaf functions are substantial. Also does using division in a function (which causes an internal division routine to be used) make that function non-leaf? Not sure of the compiler complexities here.
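
A small example of the kind of function I mean. Nothing in the C source calls anything, but if the division is lowered to an internal routine, the generated code contains a call. (libgcc's generic helper for signed 32-bit division is `__divsi3`; whether the Epiphany toolchain uses exactly that helper is an assumption on my part.) The function name `average` is just for illustration.

```c
/* Looks like a leaf function in the source: no explicit calls.
 * On a core without a hardware divider, the '/' below may be
 * compiled into a call to a runtime division helper, so the
 * compiler may have to treat this function as non-leaf. */
int average(int sum, int count)
{
    return sum / count;   /* caller must ensure count != 0 */
}
```

Whether the compiler then has to keep live values out of the caller-saved registers across that hidden call is exactly the part I'm unsure about.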

I suggest you also consider code size. Things like SP=R7 might have more benefits than the loss of that register for other uses.

Tim