Parallella Community

by **alexrp** » Tue Jan 07, 2014 9:07 am

I'm wondering what the most efficient way (in terms of both latency and CPU utilization) to wait for a TRAP on the host side would be. What I currently do is have a thread sit in a loop, constantly checking the core's ACTIVE bit and the instruction that PC points to. But obviously, doing these two trivial checks in a loop without ever yielding to the OS scheduler would easily saturate a CPU core. At the same time, sleeping between checks introduces undesirable latency when servicing system calls made with TRAP.
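A minimal sketch of the polling loop described above, under stated assumptions: the register accessors are hypothetical callbacks (on a board they would be reads over the eLink), the ACTIVE flag is assumed to be STATUS bit 0, and the TRAP encoding used here (low 10 bits 0x3E2, trap number in bits 15:10) is assumed and should be checked against the Epiphany Architecture Reference. It shows why this approach burns CPU: `sched_yield()` only helps when another thread is runnable.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sched.h>

/* Hypothetical accessors: on real hardware these would read the core's
 * STATUS register and the 16-bit instruction at its PC over the eLink.
 * They are callbacks here so the loop can be exercised without a board. */
typedef struct {
    uint32_t (*read_status)(void *ctx);
    uint16_t (*read_insn_at_pc)(void *ctx);
    void *ctx;
} core_probe;

#define STATUS_ACTIVE 0x1u   /* assumed: STATUS[0] is the ACTIVE bit */

/* Assumed TRAP encoding: low 10 bits 0x3E2, trap number in bits [15:10]. */
static bool is_trap(uint16_t insn) { return (insn & 0x3FFu) == 0x3E2u; }
static unsigned trap_number(uint16_t insn) { return insn >> 10; }

/* Spin until the core is inactive and sitting on a TRAP. sched_yield()
 * lets other runnable threads in, but if nothing else is runnable this
 * still occupies a whole CPU core. Never returns if no TRAP occurs.
 * Returns the number of polls it took. */
static unsigned long wait_for_trap_spin(const core_probe *p, unsigned *trap_no)
{
    unsigned long polls = 0;
    for (;;) {
        ++polls;
        if ((p->read_status(p->ctx) & STATUS_ACTIVE) == 0) {
            uint16_t insn = p->read_insn_at_pc(p->ctx);
            if (is_trap(insn)) {
                *trap_no = trap_number(insn);
                return polls;
            }
        }
        sched_yield();
    }
}
```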

Any bright ideas for dealing with this? It's OK if a solution is Linux-specific or even ARM-specific - as long as it works.

by **tnt** » Tue Jan 07, 2014 10:32 am

by **timpart** » Tue Jan 07, 2014 12:24 pm

I agree with tnt: polling the Epiphany cores for a TRAP is not the way to go for performance. Even without changing the FPGA logic to allow a host interrupt, it would still be better if the Epiphany wrote to external DRAM (say, one location per core) and the host polled that instead. Otherwise the eLink is kept busy with all the TRAP polling, making external DRAM access even slower.
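A sketch of the DRAM flag idea, assuming a hypothetical layout of one 32-bit word per core in shared external memory (`mailbox_flags`, `NCORES` and `find_pending_core` are illustrative names). The host scans its own RAM round-robin, so no eLink traffic is generated while nothing is pending. On a real system the region would presumably need to be mapped so that the host actually sees the Epiphany's writes (uncached, with suitable barriers).

```c
#include <stdint.h>

/* Hypothetical shared-DRAM layout: one "request pending" word per core.
 * A core stores a non-zero value here when it needs the host; the host
 * only reads its own memory while waiting. */
#define NCORES 16

typedef struct {
    volatile uint32_t pending[NCORES];
} mailbox_flags;

/* Scan all cores once, starting just after the last core found, so one
 * busy core cannot starve the others. Returns a core index or -1. */
static int find_pending_core(const mailbox_flags *mb, int *cursor)
{
    for (int i = 0; i < NCORES; ++i) {
        int core = (*cursor + 1 + i) % NCORES;
        if (mb->pending[core] != 0) {
            *cursor = core;
            return core;
        }
    }
    return -1;
}
```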

I'm an FPGA novice, but it would be nice if the logic could be changed to recognize the Epiphany DMA end-of-transfer message mode and trigger a host interrupt. Oh, and why not support the TESTSET mode as well, while I'm dreaming of the future.

Tim

by **tnt** » Tue Jan 07, 2014 12:58 pm

I don't think TESTSET is possible at all since you can't prevent the ARM or any other AXI actors from interfering in the read/modify/write cycle ...

by **alexrp** » Tue Jan 07, 2014 1:18 pm

I agree that the Epiphany writing to some external memory would be nice, but this somewhat limits the usefulness of the TRAP instruction by itself. At that point, you may as well define an ABI where the Epiphany writes arguments to external memory and issues IDLE, and the host sets ILAT[0] to resume execution, or something similar. There are various ways to achieve the same thing with that kind of approach.
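One possible shape for such an ABI, as a host-side sketch: the record layout, the state values and the `wake` callback are all hypothetical. The callback stands in for whatever resume mechanism is chosen, such as the ILAT write mentioned above. The result is stored before the state is set to done, and only then is the core woken.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-core request record in external DRAM. The Epiphany
 * fills nr/args, sets state = REQ_PENDING and executes IDLE. */
enum { REQ_IDLE = 0, REQ_PENDING = 1, REQ_DONE = 2 };

typedef struct {
    volatile uint32_t state;
    volatile uint32_t nr;        /* which host service is requested */
    volatile uint32_t args[3];
    volatile int32_t  result;
} syscall_record;

typedef int32_t (*host_service)(uint32_t a0, uint32_t a1, uint32_t a2);
typedef void (*wake_core_fn)(unsigned core);  /* e.g. an ILAT write */

/* Service one record if it is pending. Returns 1 if a request was handled.
 * On real ARM hardware a memory barrier would likely be needed between
 * storing the result and publishing the new state. */
static int service_record(syscall_record *r, unsigned core,
                          const host_service *table, uint32_t table_len,
                          wake_core_fn wake)
{
    if (r->state != REQ_PENDING)
        return 0;
    uint32_t nr = r->nr;
    r->result = (nr < table_len && table[nr])
                    ? table[nr](r->args[0], r->args[1], r->args[2])
                    : -ENOSYS;   /* unknown request number */
    r->state = REQ_DONE;         /* publish the result ... */
    wake(core);                  /* ... then resume the core */
    return 1;
}
```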

TRAP triggering a host interrupt would indeed be nice. That would make it actually stand out and have a clear purpose compared to other similar functionality.

by **shodruk** » Tue Jan 07, 2014 1:26 pm

by **mhonman** » Tue Jan 07, 2014 11:14 pm

In partial answer to the OP: strictly speaking, the only thing that has to be polled is the DEBUGSTATUS register; only if that indicates a halt condition do the other core registers need to be read. These polls do take bandwidth away from the interface between the Epiphany and host-side memory buses, but I think this is not too bad: the TRAPs underlying newlib are all for I/O, and we're unlikely to see more than 100MB/s on any of the Parallella host peripherals. At that rate a 1KB block takes roughly 10us to move, so for block transfers of 1KB there is little point in polling more often than every 10us.
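A sketch of the DEBUGSTATUS-first check plus the bandwidth arithmetic behind the 10us figure. The register ids, the read hook and the HALT bit position (assumed to be bit 0) are illustrative and not the SDK's API; the interval helper takes MB as 10^6 bytes, so 1024 B at 100 MB/s comes to 10.24 us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register-read hook standing in for an eLink register read. */
typedef uint32_t (*reg_read_fn)(void *ctx, unsigned reg);

enum { REG_DEBUGSTATUS, REG_PC };   /* illustrative register ids */
#define DEBUGSTATUS_HALT 0x1u       /* assumed: bit 0 = core halted */

/* Poll one core: a single DEBUGSTATUS read in the common case; further
 * eLink reads (here just PC) happen only once the core reports a halt.
 * Returns true and fills *pc when the core is halted. */
static bool poll_core(reg_read_fn rd, void *ctx, uint32_t *pc)
{
    if ((rd(ctx, REG_DEBUGSTATUS) & DEBUGSTATUS_HALT) == 0)
        return false;
    *pc = rd(ctx, REG_PC);
    return true;
}

/* Time for a peripheral to move one block, i.e. roughly the shortest
 * poll interval worth using for that block size, in microseconds. */
static double min_poll_interval_us(double block_bytes, double bytes_per_sec)
{
    return block_bytes / bytes_per_sec * 1e6;
}
```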

It is also possible to make this polling adaptive. For example, if a core has just made an IO call, there is a good chance that it or another core is about to make another IO request, so poll again almost as soon as the results of the call have been posted to the Epiphany. (I say almost as soon because the core will spend some time processing the IO results and preparing for the next operation, so an immediate repoll is unlikely to find anything to do.)

IMO the alternate approach of writing to special locations in external RAM and then waiting for ILAT[0] won't really help, because the host must still poll the external RAM and will thus compete with the Epiphany chip for RAM bandwidth. I guess it all boils down to whether the e-link has more bandwidth than the host RAM; if so, polling DEBUGSTATUS may be the lesser evil.

Some example code:

"simon" is a Simple IO Monitor that is supposed to handle the IO traps (I say supposed to because it's not fully tested!) & uses the x_epiphany_exception module to check for traps.

Polling is normally every 10us; if it services a trap, then the next 5 polls are at a 2us interval. This could be made smarter by taking the type of request into account, e.g. for console I/O there is no point in a fast repoll.
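The adaptive schedule can be sketched as a tiny state machine using the numbers above (10us normally, then 5 polls at 2us after a serviced trap). The names are hypothetical and the actual sleeping is left to the caller.

```c
/* Hypothetical adaptive poll scheduler with the intervals from the post. */
#define SLOW_POLL_US 10u
#define FAST_POLL_US  2u
#define FAST_POLLS    5u

typedef struct { unsigned fast_left; } poll_sched;

/* Call after each poll; 'serviced' says whether that poll handled a trap.
 * Returns the delay in microseconds before the next poll. */
static unsigned next_poll_delay_us(poll_sched *s, int serviced)
{
    if (serviced)
        s->fast_left = FAST_POLLS;   /* re-arm the fast window */
    if (s->fast_left > 0) {
        s->fast_left--;
        return FAST_POLL_US;
    }
    return SLOW_POLL_US;
}
```

A request-type check (for example, not re-arming the fast window for console output) would fit where `serviced` is computed.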

by **ed2k** » Wed Jan 08, 2014 3:32 am

Moving the polling code into a kernel module would save you a lot of context-switch time.
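Before writing a kernel module, it may be worth measuring what a short user-space sleep actually costs. This Linux-specific sketch times `nanosleep()` calls with `CLOCK_MONOTONIC` and optionally lowers the thread's timer slack with `prctl(PR_SET_TIMERSLACK, ...)`, since the default slack for normal threads (50 us) can exceed the 2-10us intervals discussed in this thread. It measures the overhead an in-kernel poller would avoid; it is not the kernel module itself.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <sys/prctl.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Average wall time of 'iters' nanosleep(req_ns) calls. If this is far
 * above req_ns, short poll intervals are dominated by timer and scheduler
 * overhead. EINTR is ignored for brevity. */
static int64_t avg_sleep_ns(long req_ns, int iters, int tight_slack)
{
    /* Linux-specific: ask for 1 ns timer slack instead of the default. */
    if (tight_slack)
        prctl(PR_SET_TIMERSLACK, 1UL, 0UL, 0UL, 0UL);
    struct timespec req = { 0, req_ns };
    int64_t total = 0;
    for (int i = 0; i < iters; ++i) {
        int64_t t0 = now_ns();
        nanosleep(&req, NULL);
        total += now_ns() - t0;
    }
    return total / iters;
}
```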

by **shodruk** » Thu Jan 09, 2014 12:21 pm

It seems that TRAP is for debugging purposes, but is also useful for normal operation.
It's very easy to use and efficient. I like TRAP! :lol:

Most efficient way to wait for a TRAP on the host CPU?
