High temperature causes bus error

Posted: Mon Mar 13, 2017 11:38 am
by wiegmink
Hi all,

I'm using the Parallella Embedded (Zynq 7020), with a heatsink mounted, to run an algorithm continuously. All 16 cores are kept busy; the Zynq PS feeds new data, collects the results, and routes them to a network destination. After a while (5 minutes or so) a bus error occurs. At first I thought it was related to a possible memory conflict (gdb showed that the error occurred at e_read or e_write), but the message log showed the following:
[243461.989088] epiphany_vm_fault: Temperature outside operating range. Sending SIGBUS to process mainthread.elf (pid: 3426)

- Is it true that a high temperature causes a bus error? (The message is pretty clear; this is for my own peace of mind.)
- Which component is getting too hot (Zynq or Epiphany)?
- What is the threshold temperature?
- And finally, again for my peace of mind: could a simultaneous read/write from the Zynq PS and an Epiphany core ever cause a memory conflict?

I added a small fan, and it has now been running for quite some time, so the cause seems clear.

I'm new to the forum, and I don't think these questions are answered elsewhere (I did search, but you never know).


Posted: Mon Mar 13, 2017 2:01 pm
by jar
The Zynq is causing this. There is a thermal daemon running that can be configured in /etc/default/parallella-thermald if you want to void the warranty.

I picked up a couple of the Parallella aluminum cases recently and they've kept the Zynq well below the default limit and look much nicer. Though, I used the "heatsink hack" described here:

Your fan setup should work now and prevent the thermal daemon from shutting the board down.
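
If you want to watch the Zynq temperature yourself while the job runs, something like the rough sketch below may help. It assumes the Zynq XADC is exposed through the Linux IIO sysfs interface at /sys/bus/iio/devices/iio:device0 with the usual in_temp0_raw, in_temp0_offset and in_temp0_scale files; that path and device index are assumptions and may differ on your image. The function name read_temp_celsius is just made up for this example, and this is not how the thermal daemon itself is implemented.

```python
from pathlib import Path

# Assumed sysfs location of the Zynq XADC under the Linux IIO subsystem.
# The device index (iio:device0) may differ on your image.
XADC_DIR = Path("/sys/bus/iio/devices/iio:device0")


def read_temp_celsius(iio_dir=XADC_DIR):
    """Return the die temperature in degrees Celsius from IIO sysfs files."""
    iio_dir = Path(iio_dir)
    raw = float((iio_dir / "in_temp0_raw").read_text())
    offset = float((iio_dir / "in_temp0_offset").read_text())
    scale = float((iio_dir / "in_temp0_scale").read_text())
    # IIO convention: (raw + offset) * scale gives millidegrees Celsius.
    return (raw + offset) * scale / 1000.0
```

Calling read_temp_celsius() once a second from a small loop and logging the value next to your application output should show whether the temperature climbs toward the limit before the SIGBUS arrives.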

Posted: Tue Mar 14, 2017 8:19 am
by wiegmink
Thanks jar. I don't want to void the warranty, I just didn't make the connection between the bus error and overheating of the Zynq.