
### paper on extended "double-single" precision

Posted: Sat Jun 14, 2014 11:05 am
The paper "Extended-Precision Floating-Point Numbers for GPU Computation" by Andrew Thall may be of use for the Epiphany.

It outlines the basic floating-point operations for a "double-single" format, which I presume should be faster (and smaller) than a software IEEE-754 double-precision library. All operations decompose into 32-bit flops, so they can run directly on the hardware. The format approximately doubles the significand precision but does not extend the exponent range.

Posted: Sat May 27, 2017 11:28 pm

Posted: Sun May 28, 2017 12:42 am
```
float2 df64_mult(float2 a, float2 b) { // 8 mul + 10 sub + 6 add
    float2 p;
    p = twoProd(a.x, b.x); // 6 mul + 8 sub + 3 add
    p.y += a.x * b.y;
    p.y += a.y * b.x;
    p = quickTwoSum(p.x, p.y); // 2 sub + 1 add
    return p;
}

float2 quickTwoSum(float a, float b) { // 2 sub + 1 add
    float s = a + b;
    float e = b - (s - a);
    return float2(s, e);
}

float2 twoProd(float a, float b) { // 6 mul + 8 sub + 3 add
    float p = a * b;
    float2 aS = split(a); // 1 mul + 3 sub
    float2 bS = split(b); // 1 mul + 3 sub
    float err = ((aS.x * bS.x - p) + aS.x * bS.y + aS.y * bS.x) + aS.y * bS.y;
    return float2(p, err);
}

float2 split(float a) { // 1 mul + 3 sub
    const float split = 4097; // (1<<12)+1
    float t = a * split;
    float ahi = t - (t - a);
    float alo = a - ahi;
    return float2(ahi, alo);
}
```