Parallella Community • View topic - Matmul-16 example gets stuck

Hardware related problems and workarounds

Matmul-16 example gets stuck

Postby Calle » Mon Mar 24, 2014 6:39 pm

Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby ubii » Mon Mar 24, 2014 7:12 pm

Calle,

If you are not already using an external fan, then I would strongly suggest doing so, in order to help cool the board. I highly recommend that folks monitor the temperature of the Zynq chip, at least initially, especially those who are encountering stability issues. This way, the user can see how variables such as ambient temperature, different heat sinks and/or fans, enclosures, and system load affect temperature. The following two methods can be used to monitor the temperature of the Zynq chip.

viewtopic.php?f=23&t=930&p=6242#p6242

https://github.com/parallella/parallell ... ster/xtemp
ubii
 
Posts: 71
Joined: Sun Dec 16, 2012 7:18 pm
Location: US

Re: Matmul-16 example gets stuck

Postby Calle » Mon Mar 24, 2014 9:07 pm

Okay, ztemp says it is running at around 95 degrees Celsius, so it is probably a heating issue (this with only the supplied heatsink mounted).

Letting it cool off and then restarting shows an initial rest temperature of about 65 degrees C. After a few minutes of doing nothing the temperature has climbed to 76 degrees C, and after 10 minutes it's up to 90 degrees C.
Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby ubii » Mon Mar 24, 2014 9:56 pm

Calle,

To quote Andreas, "The need for a fan is a really unfortunate reality. The board consumes 5 Watts, which is simply too much for such a small board without any kind of custom fitted heatsink or fan." - viewtopic.php?f=10&t=487&p=6249&hilit=fan#p6249

You will need to use some type of external fan to help cool the board, as it is my experience that stability issues tend to start at temperatures over 80 C.
ubii
 
Posts: 71
Joined: Sun Dec 16, 2012 7:18 pm
Location: US

Re: Matmul-16 example gets stuck

Postby Calle » Sat Mar 29, 2014 2:08 pm

Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby aolofsson » Sat Mar 29, 2014 3:38 pm

Calle,

Great to see that your board is now almost up and running.

The intermittent comparison failure is actually not a great sign for your board (and worries me) because it could suggest a weakness in our test flow. We have run some boards continuously for 24 hours (with the matmul test, for example) without any failures/diffs. Could you try running the following test to help us figure out what the problem is?

https://github.com/adapteva/epiphany-ex ... ons/e-test

Thanks,
Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Matmul-16 example gets stuck

Postby Calle » Sat Mar 29, 2014 3:57 pm

Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby Calle » Sat Mar 29, 2014 5:47 pm

Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby Calle » Sat Mar 29, 2014 6:39 pm

In the data above I have used A and B matrices that are the same, by changing the matrix_init(float seed) function to give both matrices the value (i + j + seed) % MAX_MEMBER for the value at position (i,j), using zero-based indexing. seed = 0 and MAX_MEMBER = 32.

Thus we see that the Zynq gives the correct values and the Epiphany gives the wrong values. We can also see that the values are consistently off by 2 at first, but then the error starts to change.
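
For anyone who wants to reproduce this kind of check, here is a minimal host-side sketch in C of the fill pattern and comparison described above. It is not the code from the matmul-16 example: the function names (matrix_init_pattern, matmul_ref, count_mismatches) and the row-major layout are my own assumptions for illustration, and only the formula (i + j + seed) % MAX_MEMBER with MAX_MEMBER = 32 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMBER 32 /* value quoted in the post */

/* Hypothetical helper: fill an n x n row-major matrix so that element
 * (i,j), zero-based, holds (i + j + seed) % MAX_MEMBER. */
static void matrix_init_pattern(float *m, size_t n, unsigned seed)
{
    assert(m != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            m[i * n + j] = (float)((i + j + seed) % MAX_MEMBER);
}

/* Hypothetical helper: plain triple-loop reference product c = a * b,
 * computed on the host for comparison with the accelerator result. */
static void matmul_ref(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: count elements whose absolute difference exceeds
 * tol, and report the largest absolute difference seen. */
static size_t count_mismatches(const float *expected, const float *actual,
                               size_t count, float tol, float *max_abs_diff)
{
    size_t bad = 0;
    float worst = 0.0f;
    for (size_t idx = 0; idx < count; idx++) {
        float d = expected[idx] - actual[idx];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
        if (d > tol)
            bad++;
    }
    if (max_abs_diff != NULL)
        *max_abs_diff = worst;
    return bad;
}
```

With integer-valued entries below 32, every product and partial sum is an integer, so the single-precision result is exact as long as n * 31 * 31 stays below 2^24. Under that condition any difference between the two results points to a real error and not to rounding or summation order.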
Calle
 
Posts: 24
Joined: Mon Dec 17, 2012 3:24 am

Re: Matmul-16 example gets stuck

Postby aolofsson » Sat Mar 29, 2014 8:34 pm

Calle,

Thanks, this is really helpful!

The fact that the self test passes means that the chip is more or less OK, which is a good thing... but it makes the problem more frustrating.

Can you confirm that the matmul test does run correctly sometimes?
When it does fail, does it always fail the same way?
Which supply and cable are you using? Your own or the one from the accessory kit? Can you show a picture of your setup?
Does the board setup make a difference (for example, Ethernet only vs. USB + HDMI + Ethernet)?

Here are some theories ranked from likely to unlikely:

1. Power supply issue.
2. Operator error in the manufacturing line (we have now automated the testing procedure further to reduce the chance of operator error)
3. We "got lucky" in our production test (it only ran once)
4. We missed a subtle failure mode in our production tests.
5. The board was damaged due to overheating (we have not seen this yet, but 95 deg C is out of spec for the Zynq part).

Thanks for helping us figure out this failure mode!! Anything we learn will be immediately fed back to the production line and will ensure that the remaining 4,800 KS boards (out of 6,300) don't have this issue.

Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA
