RAW and WAW hazard avoidance


Postby Gravis » Tue Aug 06, 2013 8:41 pm

I was curious as to how an instruction knows about RAW and WAW hazards. As far as I can figure, there are 64 bits marking which registers are being read and another 64 marking which registers will be written to by the previous instruction. The current instruction is then checked against this lock until the registers it needs are freed up (by the last instruction), at which point it proceeds.

Is this correct, or is there perhaps just a single set of 64 bits, which would end up blocking RAR as well? Is there a better way to do this?

Whichever way it goes, do instructions get delayed and then lock registers at the beginning of the RA pipeline stage?
Gravis
 
Posts: 445
Joined: Mon Dec 17, 2012 3:27 am
Location: East coast USA.

Postby timpart » Wed Aug 07, 2013 6:36 am

Well, here is one way it could work, which I think comes up with the right answer.

Have a 64-entry boolean array to indicate which registers are waiting for a write. All entries start as false.

Do the RA pipeline stage after the E1 to E4 stages that are happening at the same time for other instructions.

In E1 / E2 / E4, if the instruction is completing here, then set the waiting-for-a-write flag of the destination register to false.

In RA, check whether any of the registers being read are waiting for a write. If so, stall the instruction here (effectively inserting a NOP into the pipeline). This avoids RAW.

Then in RA, check whether the register being written to is waiting for a write. If so, stall the instruction here (effectively inserting a NOP into the pipeline). This avoids WAW.

Then in RA, set the waiting-for-a-write flag of the destination register to true.
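
The steps above could be sketched in C roughly as follows. This is only an illustration of the scheme as described, not a statement of how the hardware actually does it. The insn_t record and the function names stage_ra and complete are made up for the example, and -1 is used to mean "no register in this slot".

```c
#include <stdbool.h>

#define NUM_REGS 64

/* Hypothetical instruction record: register numbers, or -1 for "unused". */
typedef struct {
    int src1;
    int src2;
    int dst;
} insn_t;

/* One flag per register: true while a write to it is still pending. */
static bool waiting_for_write[NUM_REGS]; /* all start as false */

/* Called from E1, E2 or E4, whichever stage the instruction completes in. */
static void complete(const insn_t *in)
{
    if (in->dst >= 0)
        waiting_for_write[in->dst] = false;
}

/* RA stage for one cycle. Returns true if the instruction may proceed,
   false if it stalls here and a NOP goes down the pipeline instead. */
static bool stage_ra(const insn_t *in)
{
    /* RAW: a register being read still has a write pending */
    if (in->src1 >= 0 && waiting_for_write[in->src1]) return false;
    if (in->src2 >= 0 && waiting_for_write[in->src2]) return false;

    /* WAW: the destination still has a write pending */
    if (in->dst >= 0 && waiting_for_write[in->dst]) return false;

    /* no hazard: mark the destination as waiting for a write */
    if (in->dst >= 0)
        waiting_for_write[in->dst] = true;
    return true;
}
```

Calling complete() for the stages E1 to E4 before calling stage_ra() in the same simulated cycle gives the ordering described above, so a register freed in this cycle can be picked up straight away.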

Tim
timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

Postby Gravis » Wed Aug 07, 2013 7:08 pm

Perhaps I'm not understanding, but is this your idea (in rudimentary code)? (I hope you know bitwise math!)

Code: Select all
#include <stdint.h>

uint64_t locked_for_writing = 0; // bitmask for all 64 registers (bit n set = Rn has a write pending)

// returns 0 if the instruction has to retry RA on the next cycle, 1 if it proceeds
int stage_RA(uint64_t want_to_read, uint64_t want_to_write)
{
  if(locked_for_writing & (want_to_write | want_to_read)) // if the registers we need are busy...
    return 0; // retry this stage on the next cycle

  locked_for_writing |= want_to_write; // OR marks the registers that we want_to_write to
  /* rest of function */
  return 1;
}

void stage_LastExec(uint64_t want_to_write) // last execution stage that needs registers marked by want_to_write
{
  /* rest of function */
  locked_for_writing ^= want_to_write; // XOR unmarks the registers marked by want_to_write
}


Note: I don't completely overwrite locked_for_writing because it may have bits set for other registers that are not needed by this instruction.

Edit: switched from while to if and return.
Last edited by Gravis on Thu Aug 08, 2013 9:13 pm, edited 1 time in total.

Postby EggBaconAndSpam » Wed Aug 07, 2013 7:23 pm

locked_for_writing &= ~want_to_write; might be clearer.
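
To show where the two forms differ, here is a small sketch. The helper names unlock_xor and unlock_andnot are made up for the comparison. As long as every bit in the mask is currently set (which the RA check should guarantee), both give the same result. They only diverge if a bit in the mask was not set to begin with, in which case XOR would lock the register instead of unlocking it.

```c
#include <stdint.h>

/* Unmark by toggling: correct only if every bit in mask is currently set. */
static uint64_t unlock_xor(uint64_t locked, uint64_t mask)
{
    return locked ^ mask;
}

/* Unmark by clearing: the masked bits end up 0 whatever they were before. */
static uint64_t unlock_andnot(uint64_t locked, uint64_t mask)
{
    return locked & ~mask;
}
```

For example, with R3 and R5 locked, unlocking R3 leaves only R5 locked in both versions. With only R5 locked, "unlocking" R3 via unlock_xor leaves R3 and R5 both marked, while unlock_andnot leaves just R5.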

On a different note: how come the architecture manual states a 1 cycle stall between IALU and FPU instructions (with register dependency)?
EggBaconAndSpam
 
Posts: 32
Joined: Tue Jul 16, 2013 2:39 pm

Postby Gravis » Wed Aug 07, 2013 7:56 pm

EggBaconAndSpam wrote:locked_for_writing &= ~want_to_write; might be clearer.

It would also be one more operation. I find XOR to be clearer anyway.

EggBaconAndSpam wrote:On a different note: how come the architecture manual states a 1 cycle stall between IALU and FPU instructions (with register dependency)?

And yet it's a 4 cycle stall between FPU and IALU instructions (the order changes the count). It's part of the variable-length pipeline.

Postby EggBaconAndSpam » Wed Aug 07, 2013 8:00 pm

Well, FPU-IALU makes sense: RA to E4 would be 4+1 cycles. However, IALU-FPU is actually the same as IALU-IALU and would be 0+1 cycles...

Maybe they incorporated dual-issuing into their numbers?

Postby timpart » Thu Aug 08, 2013 6:33 am

EggBaconAndSpam wrote:locked_for_writing &= ~want_to_write; might be clearer.

On a different note: how come the architecture manual states a 1 cycle stall between IALU and FPU instructions (with register dependency)?


I'm wondering if the FPU is somehow further away and the RA does take a whole cycle? (Rather than the quick reuse with IALU)

In table 26, LOAD IALU has a one cycle gap, but LOAD FPU has two cycles.

Another curiosity is in table 25: FPU Store has only a 3 cycle gap, not 4.

Tim

Postby timpart » Thu Aug 08, 2013 7:12 am

Gravis wrote:perhaps i'm not understanding but is this your idea (in rudimentary code)? (i hope you know bitwise math!)

Code: Select all
int64 locked_for_writing = 0; // bitmask for all 64 registers

stage_RA()
{
  while(locked_for_writing & (want_to_write | want_to_read) ) { /* wait for required registers to be unlocked */ }
  locked_for_writing |= want_to_write; // OR marks the registers that we want_to_write to
  /* rest of function */
}

Yes, I agree with this except for the use of while. I'd make it an if, and if waiting then just do that for one clock cycle. You also need to check the CTIMER0CFG and CTIMER1CFG fields of the config register, and if either of them has binary 1000 (count RA stalls) then decrement the corresponding timer.

Perhaps pass a NOP to the next pipeline stage to give it something to do. (Don't want it re-executing the last instruction.)

Then carry on with the rest of the pipeline. Next clock cycle we check again.
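
Put together, one clock cycle of RA might look something like the sketch below. Everything here is a stand-in for illustration: core_t, ra_cycle and the field names are invented, the two config values are held as plain integers rather than decoded from the real register, and the timers are assumed to count down.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_COUNT_RA_STALLS 0x8u /* binary 1000 */

/* Hypothetical slice of core state, just enough for the RA stage. */
typedef struct {
    uint64_t locked_for_writing; /* bit n set = Rn has a write pending */
    unsigned ctimer_cfg[2];      /* stand-ins for CTIMER0CFG / CTIMER1CFG */
    uint32_t ctimer[2];          /* stand-ins for the two timers */
} core_t;

/* One clock cycle of RA. Returns true if the instruction moves on to E1,
   false if it stays in RA and the caller passes a NOP on instead. */
static bool ra_cycle(core_t *c, uint64_t want_to_read, uint64_t want_to_write)
{
    if (c->locked_for_writing & (want_to_read | want_to_write)) {
        /* stalled for this cycle: tick any timer set to count RA stalls */
        for (int t = 0; t < 2; t++) {
            if (c->ctimer_cfg[t] == CFG_COUNT_RA_STALLS)
                c->ctimer[t]--;
        }
        return false;
    }
    c->locked_for_writing |= want_to_write; /* claim the destination */
    return true;
}
```

The caller would invoke ra_cycle once per clock for the instruction sitting in RA, feeding a NOP to E1 whenever it returns false, and checking again on the next clock.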

Gravis wrote:note: i dont completely overwrite locked_for_writing because it may have other registers that are not needed by this instruction.

Completely agree.

Tim

Postby EggBaconAndSpam » Thu Aug 08, 2013 8:50 am

timpart wrote:
EggBaconAndSpam wrote:locked_for_writing &= ~want_to_write; might be clearer.

On a different note: how come the architecture manual states a 1 cycle stall between IALU and FPU instructions (with register dependency)?


I'm wondering if the FPU is somehow further away and the RA does take a whole cycle? (Rather than the quick reuse with IALU)

In table 26, LOAD IALU has a one cycle gap, but LOAD FPU has two cycles.

Another curiosity is in table 25: FPU Store has only a 3 cycle gap, not 4.

Tim


Might actually be the case.
Regarding FPU - Store: Store reads its source register in E1, not RA (where it reads the address registers), hence stalling occurs one cycle later.

Postby timpart » Thu Aug 08, 2013 11:20 am

EggBaconAndSpam wrote:Regarding FPU - Store: Store reads its source register in E1, not RA (where it reads the address registers), hence stalling occurs one cycle later.


Ah good point. So we'll need code in E1 for stores that does similar stuff to the RA busy test. There's even another TIMER config setting to keep track of it, binary 0111.
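
As a rough sketch of that split check for stores: the address registers get tested in RA and the data register gets tested one stage later in E1. The scoreboard_t type, the function names and the two plain counters are invented for the example. The counters simply stand in for whatever a timer set to 1000 or 0111 would record.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scoreboard with one stall counter per check. */
typedef struct {
    uint64_t locked_for_writing; /* bit n set = Rn has a write pending */
    unsigned ra_stalls;          /* what timer setting 1000 would count */
    unsigned e1_stalls;          /* what timer setting 0111 would count */
} scoreboard_t;

/* RA check for a store: only the address registers are read here. */
static bool store_ra(scoreboard_t *sb, uint64_t addr_regs)
{
    if (sb->locked_for_writing & addr_regs) {
        sb->ra_stalls++;
        return false; /* retry RA next cycle */
    }
    return true;
}

/* E1 check for a store: the register holding the data is read here. */
static bool store_e1(scoreboard_t *sb, uint64_t data_reg)
{
    if (sb->locked_for_writing & data_reg) {
        sb->e1_stalls++;
        return false; /* retry E1 next cycle */
    }
    return true;
}
```

With an FPU result for R0 still in flight, a store of R0 through an address in R1 would pass store_ra and then wait in store_e1, which would fit with the stall being seen one cycle later than for other instructions.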

I never managed to get a definitive answer as to when the chip tries to dual issue. Presumably it has a look at the instruction decode stage, otherwise it doesn't know whether to send the FPU instruction through in parallel rather than waiting until the next cycle. But what happens if one of the instructions is waiting on a register at RA but the other isn't? Does the duo get split apart, or do they both wait until they can proceed? This potentially has an impact on when the result of the IALU instruction is available. If they split, then the delayed one can potentially form a new pairing with the following instruction.

I'm thinking of a (fictitious) sequence like
LDR R0,.....
AND R1,R1,R2 ; mask out sign bit to get absolute value
FADD Rx,Rx,R0 ; Does this dual issue with the AND and then delay it?
FADD Ry,Ry,R1 ; If the AND wasn't stalled this can execute immediately

Tim