I had a quick look at OpenSHMEM as well as the multicast sample for Parallella, but what I am after is a bit different. I would like to fetch the same data from main memory into the local SPM of multiple threads. Some of this data is brought in by all threads, while other data is brought in by only a few threads. If each thread brings in its data independently, there can be NoC contention, leading to higher memory access latency. With hardware support, however, this could be mitigated.
Please advise whether having Core 0 bring in the data using DMA and then broadcast it using the OpenSHMEM library would be better than each core bringing in the data independently. Additionally, would it be possible to send the data to just a subset of cores? I could not find any OpenSHMEM examples of this.
Thanks in advance
Farzaneh
Statistics: Posted by Farzaneh, Fri Dec 23, 2016 4:15 am