pushing and popping

Postby bithead » Tue May 27, 2014 6:14 am

Looking over the Epiphany architecture reference, I see some things that confuse me.

There is a register that is designated as a stack pointer (r13). There are no instructions that are identified as "push" or "pop," but that's not a big deal, since we have some flexible addressing modes.

LDR and STR both feature a "post modify register offset" address mode that looks useful. If you wanted your stack pointer to always point at the next free spot, push and pop would look like this:

; pushing and popping r0

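push: str r0,[r13], 0x1    ; store r0, then advance r13 by one word (immediate is scaled)

pop: sub r13, r13, 0x4     ; direct arithmetic on r13 counts bytes, so one word is 4
ldr r0, [r13]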

To do a stack, we want BOTH post- and pre-increment/decrement; otherwise one of those operations becomes two instructions.

I could be wrong, I could be making a fool out of myself in front of everyone here, and it could also be that I'm expecting too much general-purposeness out of what is essentially a floating point hotrod, but I'm playing around with doing a Forth port to the Epiphany architecture, and ease of stack manipulation makes a big difference there.

ALSO:

The way I read the architecture document, the immediate value used for the increment or decrement is scaled according to the size of the data being moved. So if you were only moving one byte/word/double, you'd only ever use an inc/decrement of 1.

Is that correct? This means if we need to do math on the stack pointer directly, it's always being counted as bytes, but if we use the post inc/decrement address mode, the factor changes.
bithead
 
Posts: 9
Joined: Thu May 22, 2014 5:30 am
Location: West Seattle

Re: pushing and popping

Postby timpart » Tue May 27, 2014 7:02 am

bithead wrote:Looking over the Epiphany architecture reference, I see some things that confuse me.

There is a register that is designated as a stack pointer (r13). There are no instructions that are identified as "push" or "pop," but that's not a big deal, since we have some flexible addressing modes.

Yes, you have to use those.

bithead wrote:to do a stack, we want BOTH post and pre increment/decrement, or one of those operations becomes two instructions.

I could be wrong, I could be making a fool out of myself in front of everyone here, and it could also be that I'm expecting too much general-purposeness out of what is essentially a floating point hotrod, but I'm playing around with doing a Forth port to the Epiphany architecture, and ease of stack manipulation makes a big difference there.

I've been toying with a homebrew Forth too, and I couldn't find any way around this need for two instructions in one direction.

bithead wrote:The way I read the architecture document, the immediate value used for the increment or decrement is scaled according to the size of the data being moved. So if you were only moving one byte/word/double, you'd only ever use an inc/decrement of 1.

Is that correct? This means if we need to do math on the stack pointer directly, it's always being counted as bytes, but if we use the post inc/decrement address mode, the factor changes.


Yes that's correct.
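
To make the scaling concrete, here is a rough C model (not Epiphany code) of the behaviour described above: the post-modify immediate counts in units of the access size, while plain arithmetic on the pointer counts bytes. The `mem` array and the names `str_postmod`, `ldr_plain`, `push` and `pop` are made up for illustration, and only 32-bit word accesses are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: a small byte-addressed memory. */
static uint8_t mem[64];

/* Models "str rd,[rn],#imm" for a 32-bit word: store at rn, then rn += imm * 4. */
static void str_postmod(uint32_t rd, uint32_t *rn, int imm)
{
    memcpy(&mem[*rn], &rd, sizeof rd);
    *rn += (uint32_t)(imm * 4);      /* immediate is scaled by the access size */
}

/* Models "ldr rd,[rn]": plain word load, pointer unchanged. */
static uint32_t ldr_plain(uint32_t rn)
{
    uint32_t v;
    memcpy(&v, &mem[rn], sizeof v);
    return v;
}

/* Stack whose pointer always points at the next free slot. */
static void push(uint32_t *sp, uint32_t v) { str_postmod(v, sp, 1); }

static uint32_t pop(uint32_t *sp)
{
    *sp -= 4;                        /* direct arithmetic counts bytes */
    return ldr_plain(*sp);
}

/* Small self-check: one push moves the pointer by 4 bytes, one pop undoes it. */
static void check_push_pop(void)
{
    uint32_t sp = 16;
    push(&sp, 0xCAFEu);
    assert(sp == 20);
    assert(pop(&sp) == 0xCAFEu);
    assert(sp == 16);
}
```

The push is a single modelled instruction, while the pop needs the separate pointer adjustment, which is the asymmetry discussed in this thread.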

One other thing to look out for if you are using the official R13 stack is that interrupt routines may do anything they like to the unused area. So never save data there unless you either move R13 in the same instruction, or change R13 first and then save the data.

Tim
timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

Re: pushing and popping

Postby bithead » Tue May 27, 2014 3:42 pm

timpart wrote: I've been toying with a homebrew Forth too, and I couldn't find any way around this need for two instructions in one direction.


I'm glad I was able to figure that out.

timpart wrote: One other thing to look out for if you are using the official R13 stack is that interrupt routines may do anything they like to the unused area. So never save data without moving R13 in the same instruction, or changing R13 first then saving the data.


I'm not using the official r13 stack -- I figured I was going to approach my Forth implementation from the point of view that everything is MINE. With that assumption out of the way, I took over the first 8 registers for various things, so that I could use as many 16 bit instructions as I could in order to be space efficient.

The plan was to provide interrupt handling in the Forth context, although that part is just a barely conceived thought.

I have to say that using a link register for subroutine calls, rather than pushing directly onto a return stack, makes certain inner Forth things very easy and clean. I was surprised that the manual spends no time on subroutine calling conventions, although it is simple to figure out that you'd better preserve the current link register somewhere if you want to make any subsequent subroutine calls. Maybe it's common enough in embedded systems that it goes without saying, but I only dabble in embedded stuff (and this Parallella thing). My day job involves writing code for the iPhone, which is farther away from embedded programming than one might think.

Re: pushing and popping

Postby timpart » Wed May 28, 2014 6:58 am

bithead wrote:I'm not using the official r13 stack -- I figured I was going to approach my Forth implementation from the point of view that everything is MINE. With that assumption out of the way, I took over the first 8 registers for various things, so that I could use as many 16 bit instructions as I could in order to be space efficient.

The plan was to provide interrupt handling in the Forth context, although that part is just a barely conceived thought.

I have to say that the choice of using a link register for subroutine calls rather than directly pushing onto a return stack makes certain inner Forth things very easy and clean, although I was surprised that no time was spent on subroutine calling conventions in the manual, although it is simple to figure out that you'd better preserve the current link register somewhere if you want to make any subsequent subroutine calls.


Yes, use the registers however you wish. If you want to be able to call code written in other languages, though, you'll have to save and restore certain registers in the wrapper.

I've wondered about interrupts too. The handler has to save and restore STATUS as well as any registers used. The awkward part for the Forth virtual machine is that the interrupt can happen at any time, not just when you have finished doing a word.

As a general hint, macros are pretty easy to do and let you put repetitive things, like getting the second entry on the stack, into an easy package. They also make redesign easier if you change your mind!

Tim

Re: pushing and popping

Postby bithead » Wed May 28, 2014 4:03 pm

timpart wrote: I've wondered about interrupts too. The handler has to save and restore STATUS as well as any registers used. The awkward part from the Forth virtual machine is that the interrupt can happen at any time, not just when you have finished doing a word.


I was considering doing what amforth does -- the interrupt routine sets a flag and saves identifying information in a global, returning from the interrupt as fast as possible. The forth inner interpreter then checks that global and calls the appropriate forth word for that interrupt.

Of course, consideration is quite different from implementation, so I could throw all of that out.
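
A rough C sketch of that idea, just to pin down what I mean: the interrupt routine only records which interrupt fired, and the inner interpreter checks the flag between words. Every name here (`pending_irq`, `isr`, `poll_interrupts`, the `IRQ_*` values, the two handler words) is hypothetical; this models the scheme and is not amforth's actual code.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical interrupt identifiers. */
enum { IRQ_NONE = 0, IRQ_TIMER = 1, IRQ_DMA = 2 };

static volatile sig_atomic_t pending_irq = IRQ_NONE;  /* the global set by the ISR */
static int timer_ticks, dma_done;                     /* visible effects of the handler words */

/* What the real ISR would do: note which interrupt fired and return at once. */
static void isr(int which) { pending_irq = which; }

/* Stand-ins for the Forth words bound to each interrupt. */
static void word_timer(void) { timer_ticks++; }
static void word_dma(void)   { dma_done++; }

/* Called by the inner interpreter between words, so handler words
   only ever run at a word boundary. */
static void poll_interrupts(void)
{
    int which = pending_irq;
    if (which == IRQ_NONE)
        return;
    pending_irq = IRQ_NONE;
    if (which == IRQ_TIMER)
        word_timer();
    else if (which == IRQ_DMA)
        word_dma();
}

/* Small self-check of the flow: nothing runs until the interpreter polls. */
static void demo_poll(void)
{
    isr(IRQ_DMA);
    assert(dma_done == 0);
    poll_interrupts();
    assert(dma_done == 1);
    assert(pending_irq == IRQ_NONE);
}
```

With a single slot like this, a second interrupt arriving before the next poll overwrites the first, and on real hardware the read-and-clear would presumably need interrupts disabled around it. Both are things a real implementation would have to deal with.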

timpart wrote: As a general hint macros are pretty easy to do and let you put repetitive things like getting second entry in stack into an easy package. Also makes redesign easier if you change your mind!


I've macro'd my code to the point where it looks like the epiphany is a forth processor with cool instructions like "nip," "drop," "next," and other fun things.

Of course, macros can also be a way to hide inefficiencies from yourself. You should have seen the DOS 4.0 code that IBM put out. It was hard to believe that it was still x86 assembler. And the resulting code was slow and inefficient.

Re: pushing and popping

Postby notzed » Wed Jun 04, 2014 2:33 am

FWIW most of the C ABI and the way LR works come from ARM, so you've been using it but just didn't need to know about it. I think Adapteva could've done a bit better to suit the arch (and code-size constraints), such as r7 for sp, but it's irrelevant if you're doing your own thing.

The ABI does modify the ARM stack convention so that it always has 8 bytes above SP free; which does let you get away without the need for the separate arithmetic for leaf functions (which can use that 8-bytes for scratch/storing something).

e.g.

strd r4,[sp],#-1
strd r6,[sp]
...
ldrd r6,[sp],#1
ldrd r4,[sp]
rts

May not be of use to forth though.

(been a looong time since i looked at forth so apologies if none of this matches modern forth or my memory).

Also your posts have stack going up, but stack growing downwards lets you pick 0 ... 7 with 16-bit instructions (for hard-coded picks), negative offsets require 32-bit instructions.
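
A toy C model of what I mean (hypothetical names, 32-bit cells, nothing Epiphany-specific): with the stack growing downwards, every entry sits at a non-negative offset from the stack pointer, so a hard-coded pick of 0 ... 7 is a small positive displacement. The 0 ... 7 limit in `fits_short_form` is simply the range quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a full descending stack of 32-bit cells. */
#define CELLS 16
static uint32_t cells[CELLS];
static uint32_t *sp = cells + CELLS;            /* empty: one past the end of the array */

static void push(uint32_t v) { *--sp = v; }     /* grow downwards */
static uint32_t pick(int n)  { return sp[n]; }  /* n = 0 is the top of stack */

/* Range quoted in the post for the short (16-bit) load form. */
static int fits_short_form(int n) { return n >= 0 && n <= 7; }

/* Small self-check: deeper entries are at larger positive offsets. */
static void demo_pick(void)
{
    sp = cells + CELLS;
    push(10); push(20); push(30);
    assert(pick(0) == 30);                      /* top of stack */
    assert(pick(2) == 10);                      /* third entry, still a positive offset */
    assert(fits_short_form(2));
    assert(!fits_short_form(-1));               /* what an upward-growing stack would need */
}
```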

It may not fit with the forth idea of an argument stack but if you can keep an 8-byte aligned stack it's probably worth it otherwise you end up with 2x the code-size just on register save/restore. Or just have a different stack for register saves vs the forth argument stack.

On interrupts, a global flag polled in an inner loop could be placed into a reserved register to speed it up/save the need to use scratch. Since you have total control over the execution of the machine maybe you could re-direct the interpreter from the interrupt 'implicitly' and use this to avoid the need for the poll entirely.

Simple idea:
; r62 = interrupt/system stack pointer
; r63 = inner loop start address
mov r63,#inner_loop
inner_loop:
... do the inner loop
jr r63

Normal operation, it just loops. An ISR could then just redirect it:

isr_dma0done:
str r63,[r62],#-1
mov r63,#dma0done
rti

Save it so it can handle nested interrupts. This simple example loses any priority though.

dma0done:
.. does whatever it does (alternative or redirect of inner loop)
gid
add r62,r62,#4
ldr r63,[r62]
gie
jr r63

Well ... maybe something like that might work.
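
The same control flow as a rough C model, in case the assembly is hard to follow. A function pointer stands in for r63 and a small array for the r62 stack; all the C names (`step`, `sys_stack`, `loop_runs`, `dma_handled`) are invented for the sketch, and each handler runs in its own `step` here instead of jumping straight back into the loop.

```c
#include <assert.h>

typedef void (*target_fn)(void);

static target_fn r63;              /* models r63: where the interpreter jumps next */
static target_fn sys_stack[8];     /* models the interrupt/system stack behind r62 */
static int sys_sp;                 /* index of the next free slot */
static int loop_runs, dma_handled;

static void inner_loop(void) { loop_runs++; }

/* Redirect target: do the work, then restore the saved jump target. */
static void dma0done(void)
{
    dma_handled++;
    r63 = sys_stack[--sys_sp];
}

/* The ISR only saves the current target and redirects the interpreter. */
static void isr_dma0done(void)
{
    sys_stack[sys_sp++] = r63;
    r63 = dma0done;
}

/* Models "jr r63". */
static void step(void) { r63(); }

/* Small self-check: the loop runs normally, gets redirected once, then resumes. */
static void demo_redirect(void)
{
    loop_runs = dma_handled = sys_sp = 0;
    r63 = inner_loop;
    step(); step();
    assert(loop_runs == 2);
    isr_dma0done();                /* interrupt arrives between steps */
    step();
    assert(dma_handled == 1 && loop_runs == 2);
    step();
    assert(loop_runs == 3);        /* back to normal operation */
}
```

Two interrupts in a row stack up and both get handled before the loop resumes, which is the nesting case mentioned above.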
notzed
 
Posts: 331
Joined: Mon Dec 17, 2012 12:28 am
Location: Australia

Re: pushing and popping

Postby timpart » Wed Jun 04, 2014 12:59 pm

notzed wrote:Also your posts have stack going up, but stack growing downwards lets you pick 0 ... 7 with 16-bit instructions (for hard-coded picks), negative offsets require 32-bit instructions.

Good point. Unfortunately, in my implementation I often want to pull something off the stack into a register. I use a full stack and the displacement post-modify load, e.g. LDR R0,[R1],#1. This only comes in a 32-bit version. (Well, e-as thinks so, and there is no 16-bit form in the Arch manual.)

notzed wrote: It may not fit with the forth idea of an argument stack but if you can keep an 8-byte aligned stack it's probably worth it otherwise you end up with 2x the code-size just on register save/restore. Or just have a different stack for register saves vs the forth argument stack.

I've just used the standard ABI stack for register saves and don't use it in my Forth otherwise. As you say, the lower numbered registers are handier. I made some of the stacks in my implementation 8-byte aligned. The data stack by long convention keeps a double word as two single words, so I kept that 4-byte aligned. The control stack for compiling structured constructs is entirely up to the implementation, so I made that go in 8s.

notzed wrote: On interrupts, a global flag polled in an inner loop could be placed into a reserved register to speed it up/save the need to use scratch. Since you have total control over the execution of the machine maybe you could re-direct the interpreter from the interrupt 'implicitly' and use this to avoid the need for the poll entirely.

Simple idea:
; r62 = interrupt/system stack pointer
; r63 = inner loop start address
mov r63,#inner_loop
inner_loop:
... do the inner loop
jr r63

Normal operation, it just loops. An ISR could then just redirect it

Yes, nice idea. I was considering something similar for a debugging mode. My inner interpreter isn't compatible with the approach you've given, but I'm sure someone will benefit.

Tim

