Different Workgroups - Different Performance

Postby eoghanoh » Thu Feb 20, 2014 3:04 pm

Attachments
WorkgroupTimings.pdf.zip
Workgroup Timings
(19 KiB) Downloaded 1125 times
eoghanoh
 
Posts: 23
Joined: Mon Dec 17, 2012 3:22 am

Re: Different Workgroups - Different Performance

Postby timpart » Thu Feb 20, 2014 8:24 pm

Something of a puzzle. Do you recompile or relink in any way when changing workgroup size? The slower ones are taking one extra clock cycle per loop iteration.

Could you run e-objdump -d objectfilename.elf and post the snippet that is the assembler equivalent of just before the loop to just after? (Searching for the names of the functions called just before the loop should be an easy way to find the code.) It should be under 30 instructions.

My theory (based on no evidence) is that in the longer-running cases the place jumped to at the start of the for loop falls 2 bytes before an 8-byte boundary. I may be completely wrong though.
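To test this theory against an objdump listing, a small check like the following could help. It is only a sketch: the addresses are hypothetical placeholders (not taken from the posted dump), the helper names are made up for illustration, and the 8-byte boundary and 2-byte gap are simply the numbers from the theory above.

```python
def bytes_before_boundary(addr: int, boundary: int = 8) -> int:
    """Bytes between addr and the next multiple of boundary (0 if already aligned)."""
    return (-addr) % boundary


def matches_theory(addr: int, boundary: int = 8, gap: int = 2) -> bool:
    """True if addr sits exactly `gap` bytes before a `boundary`-byte boundary."""
    return bytes_before_boundary(addr, boundary) == gap


# Hypothetical loop-target addresses, as they might be read off an objdump listing.
for addr in (0x0350, 0x0356, 0x035E):
    print(f"{addr:#06x}: {bytes_before_boundary(addr)} byte(s) before boundary, "
          f"matches theory: {matches_theory(addr)}")
```

If the loop target in the slow builds matches and the one in the fast builds does not, that would be consistent with the theory, though it would not prove it.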

Tim
timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

Re: Different Workgroups - Different Performance

Postby aolofsson » Thu Feb 20, 2014 8:55 pm

Are you sure that all code is internal? (With the mod operator there is certainly a chance of external access, unless you forced absolutely everything to be placed in internal memory.) Which LDF file did you use? As Tim suggested, can you post the objdump output? This behavior could be explained if there are external RAM accesses.
Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Different Workgroups - Different Performance

Postby eoghanoh » Thu Feb 20, 2014 9:34 pm

Hello Tim and Andreas,

thanks for the replies.

I am linking against internal.ldf

Yes, I recompile each time I change the workgroup.

Also, just a note: I always have the workgroup's origin at the top left. I haven't tried any other locations for the workgroup.

I've attached the entire object dump file as it's not 100% clear to me where the loop starts and ends and I don't want to leave out something critical.

If it's still not clear after this I'll strip everything else out of the code and post a SSCCE.

But really, the timing is just around the loop, and the control variable max is a local non-volatile variable. The code I've posted is exactly as I have it.

Perhaps it's not that the cores are actually taking longer or shorter, but instead perhaps it's something to do with the timers? Just a thought.

Thanks,
Eoghan.
Attachments
dump.txt
objdump
(28.04 KiB) Downloaded 1125 times
eoghanoh
 
Posts: 23
Joined: Mon Dec 17, 2012 3:22 am

Re: Different Workgroups - Different Performance

Postby eoghanoh » Thu Feb 20, 2014 9:47 pm

eoghanoh
 
Posts: 23
Joined: Mon Dec 17, 2012 3:22 am

Re: Different Workgroups - Different Performance

Postby eoghanoh » Thu Feb 20, 2014 11:12 pm

OK, this is now sorted. Tim's response put me on the right track - thanks Tim.

There's now just 1 tick between the time on a 2x1 workgroup and a 1x2 workgroup.

I've just turned on the -O3 flag. I picked up a build script somewhere on the forums and it didn't have -O3. -O3 turns on loop alignment, function alignment, jump alignment etc. The code is smaller too - can do more with the 32 KB!

Plus.....my code is now running over 3 times faster.

What was taking 1944000039 ticks is now taking just 576000030 ticks on a 1x2 workgroup and 576000029 on a 2x1 workgroup. I'm not worried about that 1 extra tick (an extra load, I'm presuming).
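As a quick sanity check on the "over 3 times faster" figure, the arithmetic on the tick counts quoted above can be sketched like this. The variable names are just illustrative; the numbers are the ones from this post.

```python
# Tick counts quoted in the post.
ticks_before = 1_944_000_039   # build without O3
ticks_o3_1x2 = 576_000_030     # O3 build, 1x2 workgroup
ticks_o3_2x1 = 576_000_029     # O3 build, 2x1 workgroup

speedup = ticks_before / ticks_o3_1x2
gap = ticks_o3_1x2 - ticks_o3_2x1

print(f"speedup: {speedup:.3f}x")                  # speedup: 3.375x
print(f"gap between workgroups: {gap} tick(s)")    # gap between workgroups: 1 tick(s)
```

So the optimized build is roughly 3.4 times faster, and the two workgroup shapes now differ by a single tick.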

Feel silly I didn't have O3 on...... :)

Oh well, it was interesting to see the effect of not aligning the loops.
eoghanoh
 
Posts: 23
Joined: Mon Dec 17, 2012 3:22 am

Re: Different Workgroups - Different Performance

Postby ysapir » Thu Feb 20, 2014 11:21 pm

@eoghanoh - by itself, the optimization should not explain a difference between workgroup timings. Even if you recompile your code, as you mentioned, provided there was no change in the code itself, the generated image should remain the same and the timing should be the same. However - did you indeed NOT change the code between recompilations?
ysapir
 
Posts: 393
Joined: Tue Dec 11, 2012 7:05 pm

Re: Different Workgroups - Different Performance

Postby eoghanoh » Fri Feb 21, 2014 12:00 am

eoghanoh
 
Posts: 23
Joined: Mon Dec 17, 2012 3:22 am

Re: Different Workgroups - Different Performance

Postby ysapir » Fri Feb 21, 2014 12:06 am

Is this the device code or the host code? Why would the e-core be aware of your topology?
ysapir
 
Posts: 393
Joined: Tue Dec 11, 2012 7:05 pm

Re: Different Workgroups - Different Performance

Postby timpart » Fri Feb 21, 2014 2:09 am

timpart
 
Posts: 302
Joined: Mon Dec 17, 2012 3:25 am
Location: UK

