I chose to put my data in shared external memory rather than core memory based on my understanding of the system architecture as documented in the eSDK and architecture manuals I have access to. I have not read the eSDK source to determine how the loader works, but from the eSDK documentation, it does not appear that the host can directly map core memory the way it can external memory. Using the e_read() and e_write() functions to access core memory, while possible, is less than elegant. I could copy the block of data to the host, access it, and copy it back--but without a mutex, this would be problematic. (grin) I presume that the loader and other eSDK functions that access the registers do so through the mesh network protocol via the FPGA and the link protocol--but I have not studied this yet. I could possibly write some assembly code to make this happen, but am not that far along yet.
To test my code, I implemented a third shared-memory data structure that supports formatted printing from the core through the host to the console. I guarded access to the data structures with my mutex from both the host and core processors. I also created lightweight C++ classes layered over the eSDK definitions to isolate me from eSDK changes and relieve the tedium of much of the code I found myself writing: one class wraps the eHAL functions, the other wraps the eLib functions. Lastly, I implemented a core application that executed random waits to simulate work and printed core data/numbers interspersed with barriers to see it all work. That was thorough enough to flush out a bug or two in both the barrier and the mutex code; however, I would not claim either to be exhaustively tested. The resulting host and core code does not look like any of the sample Epiphany applications that I have read--although I have only read a few--but it does allow me to reuse algorithms and code more easily.
As for my code, I pretty much followed Lamport's pseudo-code as detailed in the link to his ACM article previously posted. It is not finished yet, and I have not analyzed performance, but it does appear to work. The downside of the algorithm is that thread number, or in my case core number, determines priority in locking the mutex. One can re-configure the priorities; however, as Lamport discovered, thread/core 1/0 will always have priority over 16/15. (His algorithm indexes from one rather than zero.) The good news is that this is predictable. The bad news is that it is not fair. To implement fairness, one would probably need to add at least a primitive scheduler, as an operating system would. The implication of the prioritization is that, independent of the order of mutex lock requests, thread/core 1/0 will always have priority if it is waiting for the mutex.
P.S. My initial implementation dynamically allocated the buffers based upon the number of cores in the system; however, I found that executing a "new" or "malloc" pulled in far too much library code and data. I am exploring alternatives.
e-test.cpp
bakery.h
bakery.cpp

Statistics: Posted by GreggChandler — Fri Mar 24, 2017 12:43 pm