RAW and WAW hazard avoidance

Re: RAW and WAW hazard avoidance

Post by ysapir » Thu Aug 08, 2013 5:03 pm

An instruction should execute as soon as it can. I don't see how stalling the AND so that it can dual-issue with the FADD (or a similar combination) would speed up any program. In this case there will be no dual issue.

Re: RAW and WAW hazard avoidance

Post by Gravis » Thu Aug 08, 2013 9:09 pm

@ysapir
Can you confirm whether the proposed method is in fact how it's done on the Epiphany chip? I would really like to get this right for the emulator.

Re: RAW and WAW hazard avoidance

Post by timpart » Mon Jul 14, 2014 11:55 am

It seems from Notzed's experiments that a floating-point instruction can delay an IALU instruction by dual-issuing with it and then discovering that the FP instruction's registers aren't ready. See Notzed's blog, the part after the "Update".

Tim

Re: RAW and WAW hazard avoidance

Post by notzed » Thu Jul 17, 2014 1:29 pm

timpart wrote: It seems from Notzed's experiments that a floating-point instruction can delay an IALU instruction by dual-issuing with it and then discovering that the FP instruction's registers aren't ready. See Notzed's blog, the part after the "Update".

Tim


In a message somewhere here on the forums, Andreas stated that both instructions block if either one does, and I think I read that before I got too far with my testing. I can't remember whether I came up with a specific test to demonstrate it, but I've tried so many over the last week that I'm confident that's how it works.

Actually, your earlier query makes a good test case that confirms it:

Code:
LDR R0,.....
AND R1,R1,R2 ; mask out sign bit to get absolute value
FADD Rx,Rx,R0 ; Does this dual issue with the AND and then delay it?
FADD Ry,Ry,R1 ; If the AND wasn't stalled this can execute immediately


If I run it on the hardware I get this timing out of it:

Code:
  clock   idle   rsrv   ialu    fpu   dual  e1_st  ra_st   rsrv loc_fe loc_ld  ex_fe  ex_ld  mesh0  mesh1
     43      0     44     14      2      1      9     19      3      1      0      0      6      0      0
Overhead of the timing function alone (subtract this from the row above):
  clock   idle   rsrv   ialu    fpu   dual  e1_st  ra_st   rsrv loc_fe loc_ld  ex_fe  ex_ld  mesh0  mesh1
     35      0     36     10      0      0      9     16      3      1      0      0      6      0      0

The formatting is hard to read, but once the timing overhead is subtracted the test takes 8 cycles, with 3 ra stalls and a single dual-issue event.

I tried pasting the output from my tool, but it was unreadable. Basically it shows:
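The subtraction can be sketched in a few lines of Python. The values are copied from the two counter rows above; the dictionary keys are the column labels with spaces replaced by underscores, and the second `rsrv` column is renamed `rsrv2` only so the keys stay unique (a naming assumption for this sketch, not something the hardware or the tool uses).

```python
# Column labels from the counter dump (second "rsrv" renamed to keep keys unique).
COLUMNS = ["clock", "idle", "rsrv", "ialu", "fpu", "dual", "e1_st", "ra_st",
           "rsrv2", "loc_fe", "loc_ld", "ex_fe", "ex_ld", "mesh0", "mesh1"]

# Raw counters for the test, and for the timing function on its own.
measured = [43, 0, 44, 14, 2, 1, 9, 19, 3, 1, 0, 0, 6, 0, 0]
overhead = [35, 0, 36, 10, 0, 0, 9, 16, 3, 1, 0, 0, 6, 0, 0]


def net_counters(measured, overhead):
    """Subtract the timing-function overhead from each counter."""
    return {name: m - o for name, m, o in zip(COLUMNS, measured, overhead)}


net = net_counters(measured, overhead)
for name in ("clock", "ialu", "fpu", "dual", "ra_st"):
    print(f"{name:>6}: {net[name]}")
```

The net values are 8 clock cycles, 3 ra stalls and 1 dual-issue event, which are the figures quoted above.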

Code:
cycle of E1, instruction

 3  movl r3,#0
 4  nop
 5  ldr ...
 6  - stall -
 7  - stall -
 8  and r1,r1,r2   ** dual issue
 8  fadd rx,rx,r0
 9  - stall -
10  fadd ry,ry,r1


The total is 10 - 3 + 1 = 8 cycles with 3 stalls, matching the hardware counters.

ldr + and can't dual issue (same pipeline).
and + fadd can dual issue based on their own register mix, so they are enqueued together.
The fadd needs to wait for r0, so it stalls, but it also stalls the and.
A further stall is required because of the single-cycle delay from an IALU op to a dependent floating-point op, as described in the manual (I had a panic for a second there, thinking I'd broken it, but it matches the hardware and docs).
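For anyone writing an emulator, the rules above can be sketched as a tiny issue model in Python. This is only an illustration under stated assumptions, not the hardware's actual logic: the `Insn` tuple, the unit names and all latency constants are hypothetical, with the latencies chosen to reproduce the trace in this post rather than taken from the manual. The load's address register is elided in the listing, so the load is modelled with no sources.

```python
from collections import namedtuple

# unit is "ialu", "ls" (load/store) or "fpu"; dst may be None; srcs is a tuple.
Insn = namedtuple("Insn", "text unit dst srcs")

# Cycles from a producer's issue to the earliest issue of a dependent consumer.
# ASSUMPTION: fitted to the trace in this thread, not taken from the manual.
LOAD_TO_USE = 3
IALU_TO_IALU = 1
IALU_TO_FPU = 2   # one extra cycle for an IALU result feeding an FPU op
FPU_TO_ANY = 4    # placeholder value, not exercised by this example


def latency(producer_unit, consumer_unit):
    if producer_unit == "ls":
        return LOAD_TO_USE
    if producer_unit == "fpu":
        return FPU_TO_ANY
    return IALU_TO_FPU if consumer_unit == "fpu" else IALU_TO_IALU


def can_pair(a, b):
    """Pair one FPU op with one non-FPU op if b does not touch a's destination."""
    one_fpu = (a.unit == "fpu") != (b.unit == "fpu")
    independent = a.dst is None or (a.dst != b.dst and a.dst not in b.srcs)
    return one_fpu and independent


def earliest_issue(insn, producers):
    """Earliest cycle at which all of insn's source registers are available."""
    times = [producers[r][0] + latency(producers[r][1], insn.unit)
             for r in insn.srcs if r in producers]
    return max(times, default=0)


def simulate(program, start_cycle):
    """Return a list of (issue_cycle, text); a pair always issues together."""
    producers = {}  # register -> (issue cycle, unit) of its last writer
    trace, cycle, i = [], start_cycle, 0
    while i < len(program):
        group = [program[i]]
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            group.append(program[i + 1])
        # If either member of the group must wait, the whole group waits.
        issue = max([cycle] + [earliest_issue(g, producers) for g in group])
        for g in group:
            trace.append((issue, g.text))
            if g.dst is not None:
                producers[g.dst] = (issue, g.unit)
        cycle = issue + 1
        i += len(group)
    return trace


# Instruction text follows the test-case listing earlier in this post.
program = [
    Insn("movl r3,#0",    "ialu", "r3", ()),
    Insn("nop",           "ialu", None, ()),
    Insn("ldr r0,...",    "ls",   "r0", ()),
    Insn("and r1,r1,r2",  "ialu", "r1", ("r1", "r2")),
    Insn("fadd rx,rx,r0", "fpu",  "rx", ("rx", "r0")),
    Insn("fadd ry,ry,r1", "fpu",  "ry", ("ry", "r1")),
]

trace = simulate(program, start_cycle=3)
for cycle, text in trace:
    print(cycle, text)
total = trace[-1][0] - trace[0][0] + 1
print("total cycles:", total)
```

The key design choice is that the issue cycle is computed for the group as a whole, so the and inherits the fadd's wait for r0. Under these assumed latencies the model issues at cycles 3, 4, 5, 8, 8 and 10, for a total of 8 cycles.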
