mutex performance
Posted: Mon Feb 20, 2017 5:21 pm
Is it to be expected that "close" cores receive preferential performance when accessing (attempting to lock) a mutex?
I modified the sample "mutex_test" application found here: https://github.com/adapteva/epiphany-ex ... /cpu/mutex so that "e_mutex_test" calls "e_mutex_trylock" rather than "e_mutex_lock". Each core spins, incrementing a local counter, until it acquires the lock. I also modified "e_mutex_test" to consume a random number of cycles after the lock is acquired and before it is released. Afterwards, I read the spin counts from the driving harness (mutex_test). To my surprise, across multiple runs of the application, one of the "close" processors always appears to acquire the lock first. This appears statistically significant. I could modify the app more extensively to histogram the results over thousands of executions. However, if my initial observation is correct, then in an application where multiple cores contend for the mutex, cores that are farther away could be starved.
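The histogram idea mentioned above could be sketched on the host side roughly as follows. This is only a minimal sketch under stated assumptions: NUM_CORES, first_acquirer, and tally_first_acquirers are hypothetical names, not part of the Epiphany SDK, and it assumes that the core with the smallest spin count in a run is the one that acquired the lock first. It only tallies numbers already read back by the harness and says nothing about how the hardware arbitrates.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 16-core device; adjust to match the actual workgroup. */
#define NUM_CORES 16

/* Returns the index of the core with the smallest spin count in one run.
   Assumption: fewest failed trylock attempts means "acquired first".
   Ties go to the lowest core index. */
static size_t first_acquirer(const unsigned *counts)
{
    size_t best = 0;
    for (size_t c = 1; c < NUM_CORES; c++) {
        if (counts[c] < counts[best])
            best = c;
    }
    return best;
}

/* runs holds num_runs * NUM_CORES spin counts, one row per run
   (row-major: runs[r * NUM_CORES + c] is core c in run r).
   wins[c] receives the number of runs in which core c acquired first. */
void tally_first_acquirers(const unsigned *runs, size_t num_runs,
                           unsigned *wins)
{
    assert(runs != NULL && wins != NULL);

    for (size_t c = 0; c < NUM_CORES; c++)
        wins[c] = 0;

    for (size_t r = 0; r < num_runs; r++)
        wins[first_acquirer(&runs[r * NUM_CORES])]++;
}
```

If the wins array stays concentrated on the same one or two cores over thousands of runs, that would support the observation; a roughly flat distribution would suggest the initial runs were coincidence.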
Is this the expected behavior for your mutex implementation? If so, it should be documented, as there appears to be no way to prevent starvation in a sufficiently parallelized application that demands high-frequency access to a mutex.