P1601-01 Board OK at first but dies 'slowly' after less than

Postby ralphmcardell » Tue Aug 05, 2014 5:43 pm

Hello,

I ordered a Parallella P1601-01 from RS (UK) to replace a dodgy Kickstarter reward A101040 in a mini-cluster (see topic viewtopic.php?f=50&t=1438).

Having followed the instructions to mount the heat sink, I made the J15 power-from-mounting-holes bridge connection, mounted the board, connected it to power and network, and changed the boot FPGA bin file on the micro SD card to the HDMI one for the 7010. I then applied power.

The board seemed to boot OK, the network came up and I could SSH into the board. It then ran about 500 or 600 iterations of the Epiphany LIST.E16 tests and then crashed. I noticed this around 21:00 last night (BST), re-booted and ran modified matmul-16 repeatedly [1].

I left all 4 boards - the remaining original 3 A101040 boards and the new P1601-01 - running the same modified matmul-16 forever test overnight.

When I checked in this morning at around 08:00 the original 3 boards were still going strong (albeit having timed out approximately every 1000 - 1500 iterations) but the new board had hung - I estimate sometime around 1 or 2 am.

It refused to come back - not even the pre-boot messages from the serial port.

I left the mini-cluster powered down for a while and came back half an hour or so later. This time I managed to get the new board fired up - it ran 1000 or so iterations of my modified matmul-16 test, then hung and once again would not boot at all for a while.

After an interlude playing with headless boot files and fixing corrupted SD card images I returned to the 7010 HDMI default boot files, and plugged in an HDMI monitor - after some more SD card image fixing the GUI login screen came up (note: I do not have suitable uUSB cables and adapters to plug in a keyboard and mouse - yet - on order).

Logging in again over SSH I went to run the modified matmul-16 forever test...the GUI login screen display sort of twitched and turned a sort of mauve colour and the SSH session hung.

That was the last sign of life from this board - I cannot even get the Zynq pre-boot messages and console (if no uSD-card present) to come up. The power LED and green Ethernet socket LED come on (the Ethernet one flashing as usual) but that's it.

Seriously unhappy and _not_ impressed with the Parallella at the moment despite my really wanting to like it - they seem to be a huge disappointment, and a great waste of time and money :(

[1] The modifications basically skip the host-side check calculation and apply a timeout on the host when waiting for the Epiphany cores to finish. The aim is to get the Epiphany to run the calculation as many times as possible, to flush out cases where the host never reads the completion state from the Epiphany.
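
For anyone wanting to reproduce the idea in [1], the host-side timeout could look roughly like the sketch below. It is not the actual modified matmul-16 code (that is C built against the Epiphany host library); `start_run` and `read_done_flag` are hypothetical stand-ins for whatever kicks off a run and reads the completion state back from the Epiphany.

```python
import time


def wait_for_done(read_done_flag, timeout_s, poll_interval_s=0.001):
    """Poll until read_done_flag() is truthy or timeout_s has elapsed.

    Returns True if completion was seen, False if the wait timed out.
    read_done_flag is a hypothetical callable, not a real library call.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_done_flag():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def run_forever_test(start_run, read_done_flag, iterations, timeout_s):
    """Run the calculation repeatedly and count how many runs timed out.

    The host-side check calculation is deliberately skipped, as in [1].
    """
    timeouts = 0
    for _ in range(iterations):
        start_run()  # hypothetical: load and start the Epiphany program
        if not wait_for_done(read_done_flag, timeout_s):
            timeouts += 1  # host never saw the completion state
    return timeouts
```

Counting the timeouts rather than stopping at the first one is what lets the test keep going on boards that only miss the completion state now and then.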
ralphmcardell
 
Posts: 12
Joined: Mon Dec 17, 2012 3:25 am
Location: London UK

Re: P1601-01 Board OK at first but dies 'slowly' after less

Postby sebraa » Wed Aug 06, 2014 8:46 pm

Did you check the temperature of the board while it was running?
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: P1601-01 Board OK at first but dies 'slowly' after less

Postby ralphmcardell » Thu Aug 07, 2014 11:03 am

Hi sebraa, thanks for taking the time to enquire.

If by board temperature you mean the Zynq chip temperature as reported by the xtemp utility (or the ztemp shell script) then yes, initially I ran the xtemp utility over ssh -X.

The case I use has 120mm fans above and below the Parallellas, and initially I had only the top pull (suck) fan running (previous test configuration for outgoing dodgy board) and the board in question was running in the region of 55 - 57 degrees C. Reconnecting the bottom push (blow) fan reduced this down to the region of 50 - 52 degrees C. This was while running the Epiphany LIST.E16 tests continuously over several hours (~485 iterations at ~65 seconds per complete run through of the tests).

Overnight I forgot to re-run xtemp. In the final session I had only just logged in over ssh -X when the board failed so did not have time to check the temperature - the heat sink did feel somewhat warmer than usual but still comfortable to touch - I presumed this might have been due to the extra effort of actually having to drive a 1920 by 1200 display via HDMI.
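
To avoid missing the overnight temperature again, a small logger left running alongside the test would do. A minimal sketch, assuming the Zynq XADC temperature channel is exposed through the Linux IIO sysfs interface at the path below (an assumption, check it on the board's image). It relies on the usual IIO convention that (raw + offset) * scale gives millidegrees Celsius. The function names are illustrative only.

```python
import time
from pathlib import Path

# Assumption: XADC temperature channel location on this Linux image.
IIO_DIR = Path("/sys/bus/iio/devices/iio:device0")


def xadc_to_celsius(raw, offset, scale):
    """Convert IIO raw/offset/scale values to degrees Celsius."""
    return (raw + offset) * scale / 1000.0


def read_zynq_temp(iio_dir=IIO_DIR):
    """Read the three sysfs values and return the temperature in Celsius."""
    def read_value(name):
        return float((Path(iio_dir) / name).read_text().strip())

    return xadc_to_celsius(
        read_value("in_temp0_raw"),
        read_value("in_temp0_offset"),
        read_value("in_temp0_scale"),
    )


def log_temperature(log_path, read_temp=read_zynq_temp, now=time.time):
    """Append one 'timestamp temperature' line to log_path.

    The file is reopened and closed on every call so that earlier
    readings are more likely to survive if the board hangs.
    """
    with open(log_path, "a") as log:
        log.write(f"{now():.0f} {read_temp():.1f}\n")
```

Calling `log_temperature("/home/linaro/temp.log")` once a minute from a loop or cron job (the path is just an example) would leave a record of the last temperature seen before a hang.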

Oh, and the 5V supply was measured at 5.06V.

Again thanks for asking.

Ralph
ralphmcardell
 
Posts: 12
Joined: Mon Dec 17, 2012 3:25 am
Location: London UK

