My theory is that there is a fault in the multiply-add instruction on one of the cores, one that doesn't show up if you just multiply 1.0 by 1.0 and then add the results together. Alternatively, the rounding mode from the config register isn't being applied properly, and truncation is being used instead of rounding.
Perhaps an alternative test could try a selection of "randomly" chosen numbers and compare each result to a known-good outcome? (Probably best done in assembler to get exact bit patterns, or with some tricky C that interprets a float as an int bit pattern.) It doesn't need to be a matrix operation, just a few arithmetic statements.
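Something along these lines might do as a starting point. It is only a sketch, assuming IEEE 754 single precision with the default round-to-nearest-even mode. The names (mac_vector, run_mac_checks) are made up for illustration, and the handful of vectors are hand-derived rather than random; a real test would want many more, generated on a known-good machine. The inputs are chosen so that a fused and an unfused multiply-add give the same answer, and the volatile qualifiers are there to stop the compiler from folding the arithmetic at build time.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* One test vector: inputs and expected a*b+c, all as raw IEEE 754 bit patterns. */
struct mac_vector {
    uint32_t a, b, c;
    uint32_t expected; /* assumes round-to-nearest-even */
};

static const struct mac_vector vectors[] = {
    /* 1.0 * 1.0 + 1.0 = 2.0 exactly: baseline that any rounding mode passes */
    { 0x3F800000u, 0x3F800000u, 0x3F800000u, 0x40000000u },
    /* 1.5 * 2.0 + c lands about 1.504 ulp above 3.0: rounds up to 3.0 + 2 ulp;
       truncation would give 0x40400001 */
    { 0x3FC00000u, 0x40000000u, 0x34C08000u, 0x40400002u },
    /* 1.5 * 2.0 + c lands 1.25 ulp above 3.0: rounds down to 3.0 + 1 ulp;
       round-toward-plus-infinity would give 0x40400002 */
    { 0x3FC00000u, 0x40000000u, 0x34A00000u, 0x40400001u },
    /* (1.5 + 1 ulp)^2 + 0.0 is just over 1.5 ulp above 2.25: rounds up;
       truncation would give 0x40100001 */
    { 0x3FC00001u, 0x3FC00001u, 0x00000000u, 0x40100002u },
    /* same magnitudes with a negative product; truncation would give 0xC0100001 */
    { 0xBFC00001u, 0x3FC00001u, 0x00000000u, 0xC0100002u },
};

/* Reinterpret a 32-bit pattern as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Runs every vector and returns the number of bit-pattern mismatches. */
int run_mac_checks(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; i++) {
        /* volatile forces the arithmetic to happen at run time on this core */
        volatile float a = bits_to_float(vectors[i].a);
        volatile float b = bits_to_float(vectors[i].b);
        volatile float c = bits_to_float(vectors[i].c);
        volatile float r = a * b + c;

        uint32_t got = float_to_bits(r);
        if (got != vectors[i].expected) {
            printf("vector %zu: got 0x%08" PRIX32 ", expected 0x%08" PRIX32 "\n",
                   i, got, vectors[i].expected);
            failures++;
        }
    }
    return failures;
}
```

The idea would be to call run_mac_checks() from a thread pinned to each core in turn and treat any non-zero return as a failure on that core. Whether the compiler actually emits the multiply-add instruction for a * b + c depends on the target and flags, so it would be worth checking the generated assembler.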
Tim