P1601-01 Board OK at first but dies 'slowly' after less than
Posted: Tue Aug 05, 2014 5:43 pm
Hello,
I ordered a Parallella P1601-01 from RS (UK) to replace a dodgy Kickstarter reward A101040 in a mini-cluster (see topic viewtopic.php?f=50&t=1438).
I followed the instructions to mount the heat sink, made the J15 power-from-mounting-holes bridge connection, mounted the board and connected it to power and network, and changed the boot FPGA bin file on the micro SD-card to the HDMI one for the 7010. Then I applied power.
The board seemed to boot OK, the network came up and I could SSH into the board. It then ran about 500 or 600 iterations of the Epiphany LIST.E16 tests and then crashed. I noticed this around 21:00 last night (BST), re-booted and ran modified matmul-16 repeatedly [1].
I left all 4 boards - the remaining original 3 A101040 boards and the new P1601-01 - running the same modified matmul-16 "forever" test overnight.
When I checked in this morning at around 08:00 the original 3 boards were still going strong (albeit having timed out approximately every 1000 - 1500 iterations) but the new board had hung - I estimate sometime around 1 or 2 am.
It refused to come back - not even the pre-boot messages from the serial port.
I left the mini-cluster powered down for a while and came back half an hour or so later. This time I managed to get the new board fired up - it ran 1000 or so iterations of my modified matmul-16 test, then hung, and once again would not boot at all for a while.
After an interlude playing with headless boot files and fixing corrupted SD card images I returned to the 7010 HDMI default boot files, and plugged in an HDMI monitor - after some more SD card image fixing the GUI login screen came up (note: I do not have suitable uUSB cables and adapters to plug in a keyboard and mouse - yet - on order).
Logging in again over SSH I went to run the modified matmul-16 forever test...the GUI login screen display sort of twitched and turned a sort of mauve colour and the SSH session hung.
That was the last sign of life from this board - I cannot even get the Zynq pre-boot messages and console (if no uSD-card present) to come up. The power LED and green Ethernet socket LED come on (the Ethernet one flashing as usual) but that's it.
Seriously unhappy and _not_ impressed with the Parallella at the moment despite my really wanting to like it - they seem to be a huge disappointment, and a great waste of time and money.
[1] The modifications basically skip the host-side check calculation and apply a timeout on the host when waiting for the Epiphany cores to finish, so as to get the Epiphany to run the calculation as many times as possible and flush out cases where the host never reads the completion state from the Epiphany.
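For anyone wanting to try something similar, the host-side timeout in [1] is roughly the pattern below. This is only a sketch, not the actual matmul-16 change: wait_with_timeout, done_fn and done_after_n_polls are made-up names, and the read of the completion state from the Epiphany is replaced by a stand-in callback, since that part depends on the SDK and on the test code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the device reports completion.
 * In the real test this would read the completion state from the Epiphany. */
typedef bool (*done_fn)(void *ctx);

/* Current monotonic time in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll is_done() until it returns true or timeout_ms has elapsed.
 * Returns 0 on completion, -1 on timeout. */
static int wait_with_timeout(done_fn is_done, void *ctx, int64_t timeout_ms)
{
    const struct timespec pause = { 0, 1000000 }; /* 1 ms between polls */
    int64_t deadline = now_ms() + timeout_ms;

    for (;;) {
        if (is_done(ctx))
            return 0;
        if (now_ms() >= deadline)
            return -1;
        nanosleep(&pause, NULL);
    }
}

/* Stand-in for the real check (assumption): reports done after N polls. */
static bool done_after_n_polls(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

On a timeout the host just counts it and starts the next iteration instead of blocking forever, which is how the original boards could keep going overnight despite timing out every so often.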