Anomaly in execution time and memory stalls reduction

Post by vanchiramani » Mon Jun 05, 2017 4:38 am

I have a 16-core Parallella board. My micro-benchmark contains four variables (A, B, C, D), which I place using two different strategies:
(i) A, B, C, D are all placed in the SPM of Core 5 and accessed an equal number of times by all 16 threads
(ii) A, B, C, D are placed in the SPM of Cores 5, 6, 9 and 10 respectively and accessed an equal number of times by all 16 threads
I use the performance counters (via e_ctimer) to measure the external memory stalls and the waits on the cMesh.
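To make the two placements concrete, here is a minimal sketch in plain C (it runs on the host, not on the Epiphany). It assumes the cores are numbered row-major from 0 to 15 on the 4x4 grid, which the post does not state, and it counts only Manhattan hops and the number of access streams aimed at each core's SPM. The names `hops`, `total_hops`, `max_streams_per_home`, `PLACEMENT_I` and `PLACEMENT_II` are made up for this illustration; nothing here models the real mesh routing or arbitration.

```c
#include <assert.h>
#include <stdlib.h>

#define MESH_DIM  4                        /* 4x4 grid on the 16-core board */
#define NUM_CORES (MESH_DIM * MESH_DIM)
#define NUM_VARS  4                        /* A, B, C, D */

/* Home core of each variable. ASSUMPTION: row-major core numbering 0..15. */
static const int PLACEMENT_I[NUM_VARS]  = {5, 5, 5, 5};   /* strategy (i)  */
static const int PLACEMENT_II[NUM_VARS] = {5, 6, 9, 10};  /* strategy (ii) */

/* Manhattan distance between two cores on the grid. */
static int hops(int from, int to)
{
    assert(from >= 0 && from < NUM_CORES);
    assert(to >= 0 && to < NUM_CORES);
    return abs(from / MESH_DIM - to / MESH_DIM)
         + abs(from % MESH_DIM - to % MESH_DIM);
}

/* Sum of hops if every core accesses every variable once. */
static int total_hops(const int home[NUM_VARS])
{
    int sum = 0;
    for (int v = 0; v < NUM_VARS; v++)
        for (int c = 0; c < NUM_CORES; c++)
            sum += hops(c, home[v]);
    return sum;
}

/* Largest number of (core, variable) access streams that target the
 * SPM of a single core, counting the home core's own accesses too. */
static int max_streams_per_home(const int home[NUM_VARS])
{
    int count[NUM_CORES] = {0};
    int worst = 0;
    for (int v = 0; v < NUM_VARS; v++) {
        count[home[v]] += NUM_CORES;       /* all 16 threads access it */
        if (count[home[v]] > worst)
            worst = count[home[v]];
    }
    return worst;
}
```

Under the row-major assumption, `total_hops` is the same for both placements, while `max_streams_per_home` is four times larger for (i) than for (ii). So in this toy model the two strategies differ in how concentrated the traffic is, not in distance travelled. Whether that concentration is what the counters capture is a separate question.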

When I compare the execution time of the two approaches, I get a speedup of 3.08 for (ii) with respect to (i), whereas the ratio of total memory stalls is only 1.26. The other parameters, such as the number of IALU and FPALU instructions executed, remain the same in both cases.

Generally, I have observed that the reduction in memory stalls is larger than the reduction in execution time. In this case, however, the reduction in execution time is much larger than the reduction in memory stalls. Hence, I would like to understand what other factor helps reduce the execution time in this context.
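The mismatch can be stated as a small piece of arithmetic. The sketch below assumes a simple additive model that is not taken from the Epiphany documentation: execution time = busy cycles + counted stall cycles, with the busy cycles identical in both runs. Under that model the speedup can never exceed the stall ratio, so a ratio of 1.26 cannot account for a speedup of 3.08 by itself. The function names and any cycle counts passed in are hypothetical.

```c
#include <assert.h>

/* Speedup predicted by the ASSUMED additive model
 *     time = busy + stalls
 * where the busy cycles are the same in the slow and the fast run. */
static double modeled_speedup(double busy, double stalls_slow, double stalls_fast)
{
    assert(busy >= 0.0 && stalls_slow >= 0.0 && stalls_fast > 0.0);
    return (busy + stalls_slow) / (busy + stalls_fast);
}

/* With stall_ratio = stalls_slow / stalls_fast >= 1, the modeled speedup
 * is at most stall_ratio (reached only when busy == 0). Returns 1 if the
 * measured speedup is within that bound, 0 if the counted stalls alone
 * cannot explain it. */
static int stalls_can_explain(double measured_speedup, double stall_ratio)
{
    assert(stall_ratio >= 1.0);
    return measured_speedup <= stall_ratio;
}
```

With the figures from this post, `stalls_can_explain(3.08, 1.26)` returns 0. Within this model, that means some cycles that differ between (i) and (ii) are not included in the stall totals being compared.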

Any help is really appreciated.

Re: Anomaly in execution time and memory stalls reduction

Post by sebraa » Mon Jun 05, 2017 9:39 am

External memory is, as far as I know, off-chip memory - not off-core memory. So if you check the "ext fetch" and "ext data" ctimer values, you measure accesses to the external memory, which you should not see at all (i.e. check that your code is executed from local memory and does not access external memory for your test).

Re: Anomaly in execution time and memory stalls reduction

Post by vanchiramani » Mon Jun 05, 2017 3:08 pm

Dear Sebraa

I am using the COPRTHR library, so instructions are executed from the local SPM. This is confirmed by the Ext. Fetch stall count being zero.

Ext. Fetch and Ext. Data stalls occur when instructions or data are obtained from a non-local SPM. You answered one of my earlier questions about this:
viewtopic.php?f=23&t=3915&p=18307&sid=393681d9d32d1a22872a61ae1e82238a#p18307

