Register usage questions

Posted: Fri Sep 19, 2014 12:09 pm
by cmcconnell
As a neophyte when it comes to issues of assembly language optimisation, I have a few queries based on what I've gleaned from the documentation and various forum posts.

The task I've set myself is to optimise a particular assembly-language function which is constantly being called by its parent program. As well as optimising the body of the code, I think I need to pay attention to the overheads of function entry/exit. That is, I'd like to avoid the need to include register save/restore operations within the function, and also to encourage the compiler not to generate any around the function calls.

So, my first query relates to terminology. The documentation talks of registers being caller saved or callee saved. Is my interpretation of this, below, correct?

Callee Saved : As the writer of a function, I can have no idea which registers will be in use at the point of any given call to my function. I therefore have to assume that any 'callee saved' register which I want to use may already be in use, and include save/restore operations within the function.

Caller Saved : Similarly, the compiler can't know what registers my function makes use of, so any 'caller saved' registers it is currently using at the point of a call must be saved and restored.

Then there are the compiler options -ffixed-<reg> and -fcall-used-<reg>. My interpretation of these is:

call-used : If applied to a callee saved reg will turn it into a caller saved reg, shifting the save/restore burden to the compiler. (And it also acts as a hint to the compiler that it would be better to avoid using that particular register across a function call, if possible. Thinking about it, presumably the compiler will always preferentially use callee saved registers across calls before caller saved ones??)

fixed : Simply forbids the compiler from using the given register. So I'm then free to do whatever I want with it in my function, and there will be zero overhead. I can also use it to create global variables or constants that live across function calls.

So, if I've got all that right, it would seem that I can avoid the need for saving and restoring registers within my function by using some combination of:
    caller saved regs
    callee saved regs, to which I have applied one or other of the above compiler options.

If I want to prevent the compiler from generating save and restore instructions around the calls, it looks like -ffixed-<reg> should be applied to both caller saved and callee saved registers. But if too many registers are 'stolen' in this way, I guess it could be counterproductive, causing extra instructions to be generated throughout the C code because the compiler is being forced to juggle a limited working set of registers.

I'd appreciate any corrections to the various assumptions I've listed, plus any general tips relating to this area.


Re: Register usage questions

Posted: Sat Sep 20, 2014 2:16 am
by notzed
Your assumptions pretty much match my understanding.

Some other things I found whilst looking into exactly the same problem:

- I think fixed and call-used have some limitations. I played with one or the other and found they didn't always do exactly what I told them to, although I'm sorry I can't recall the details.
- There is also the ability to limit the compiler to using only the first 32 registers: -mhalf-reg-file.
- You can assign global or local variables to fixed registers within a compilation unit. This removes them from the register pool but leaves them available as that value. See the gcc manual, 6.44 Variables in Specified Registers, e.g. register int *foo asm ("r5"); I'm pretty sure I tested this with -mhalf-reg-file and the upper 32 registers can still be used.
- (I'm sure you're aware, but ...) Remember that all code on the execution paths needs to be compiled with the same options, i.e. libc, elib, etc.

A combination of -mhalf-reg-file and specific global registers in r32+ might be the easiest way to see if it's worth looking into further for your application.
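As a rough sketch of the global register variable idea (not something tested on the hardware), the declaration could look something like the following. The register name "r40", the variable and function names, and the __epiphany__ macro check are all assumptions made for illustration; on any other target the code falls back to an ordinary global so that it still builds.

```c
/* Sketch only: pin a pointer to a fixed register so it lives across calls.
   "r40" and the __epiphany__ check are assumptions, not verified on hardware. */
#if defined(__epiphany__)
register int *cursor asm("r40");   /* taken out of the compiler's register pool */
#else
int *cursor;                       /* plain global so the sketch builds on a host */
#endif

/* Point the shared cursor at the start of a buffer. */
void cursor_reset(int *buf)
{
    cursor = buf;
}

/* Return the current element and advance the cursor by one. */
int cursor_next(void)
{
    return *cursor++;
}
```

Every file on the execution path would presumably need to see the same declaration (or be built with the matching -ffixed option), for the same reason as the point above about compiling everything with the same options.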

You might have better luck or more patience, but in the end I just gave up fighting with the compiler and didn't really feel like moving everything to what is effectively a new ABI. Usually when I thought I had something, some code change would break it. And I just wasn't seeing the gains that would justify the effort compared to simply re-arranging the code so that each leaf function does a batch of work at a time, which makes the invocation and setup costs insignificant and often makes the inner loop more efficient. LDS is only 1 cycle away so batching overheads can be small.
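To illustrate the batching idea, here is a minimal sketch in plain C. The function names and the scaling operation are made up for the example; the point is only that the batched version pays the call and setup cost once per n elements instead of once per element.

```c
#include <stddef.h>

/* Per-item version: call, return and setup costs are paid for every element. */
float scale_one(float x, float gain)
{
    return x * gain;
}

/* Batched version: one call handles n elements, so the entry/exit overhead
   is spread over the whole batch and the loop body can be scheduled freely. */
void scale_batch(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * gain;
}
```

A caller would then replace a loop of scale_one() calls with a single scale_batch(dst, src, n, gain) over a buffer held in local memory.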

If the function is small and doesn't take long to execute, then an inline C function will probably beat it no matter how optimised it is in isolation, because not only does it save the function invocation entirely but it can also be scheduled fairly freely amongst the caller's other work. You can try inline asm for the same effect but I've found it's just another compiler fight I kept losing.
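For a small function, the inline route might look like this sketch. The clamp helper and its caller are hypothetical; with optimisation enabled the compiler is free to fold the helper into the loop, so no call or register save/restore need be emitted for it.

```c
/* Small helper the compiler can inline into its callers: clamps v to 0..255. */
static inline int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Caller: once clamp_u8 is inlined, the whole loop is one leaf function body. */
int sum_clamped(const int *src, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += clamp_u8(src[i]);
    return total;
}
```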

Re: Register usage questions

Posted: Tue Sep 23, 2014 2:16 am
by cmcconnell
Thanks for that.

One further query - Registers R28 - R31 : 'Reserved for constants'. I'm wondering just what that means in practice.

If my code path does not call into any library code that may expect R28-R31 to contain their predefined constants, then can I treat them just like regular 'callee saved' registers?

So long as I restore their values when I'm done, there doesn't seem to be a reason not to use them.

Re: Register usage questions

Posted: Tue Sep 23, 2014 7:41 am
by notzed
They could potentially be used by any interrupt code, but if you're not using any (or you write your own in asm) I don't think it should matter. A future SDK and/or gcc might make more use of them, though. Then again, it should only matter in compiled C, or when calling C, or with C ISRs.

From what I could tell about 6 months ago, r28 is the only one "assigned" so far, and its intended use is as a segment pointer to allow for relocatable code. But at that time I don't think gcc was fully using it. One will also probably be used to implement thread-local storage, although tbh I think the unix thread/process model is a pretty poor fit for the hardware.

I presume anything like this is "unsupported" - although who knows what that means at the best of times.