Re: Current status

PostPosted: Mon Jul 03, 2017 4:27 pm
by upcFrost
Spent quite a lot of time trying to make the Load/Store optimization pass work as it should. Still, a bunch of bugs might pop up.

Not exactly sure how to proceed with volatile memory references: on one hand, the compiler should not touch those; on the other hand, merging two loads or stores does not change the logic flow itself (it should be preserved in any case, volatile or not). Maybe I'll just add an additional flag.

Re: Current status

PostPosted: Wed Jul 05, 2017 3:39 pm
by jar
upcFrost wrote:Not exactly sure how to proceed with volatile memory references: on one hand, the compiler should not touch those; on the other hand, merging two loads or stores does not change the logic flow itself (it should be preserved in any case, volatile or not). Maybe I'll just add an additional flag.


Code: Select all
volatile int* p = ...;
*p = 1;
*p = 2;


IMO, in the code above, the compiler should not attempt to remove either line. Otherwise, I'm not sure what you mean.

Re: Current status

PostPosted: Thu Jul 06, 2017 2:19 pm
by upcFrost
Here's what I mean (the code is a bit different)

Code: Select all
volatile int* p = ...;
p[0] = 1;
p[1] = 2;

This code can be translated into (don't mind %p, there's a reg ofc)
Code: Select all
mov r0, 1
str r0, %p
mov r1, 2
str r1, %p+4

Or both stores can be merged into a single strd
Code: Select all
mov r0, 1
mov r1, 2
strd r0, %p


There are a bunch of checks to see whether such a conversion is possible, of course, but the question is still whether the compiler is allowed to do it or not.
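
For context, here is a rough sketch of the kind of legality checks involved (the struct and helper below are hypothetical illustrations, not the actual pass code): the two accesses have to use the same base, sit at consecutive offsets, be suitably aligned, and nothing in between may alias them.

Code: Select all
/* Rough, hypothetical sketch of the legality checks a load/store
 * pairing pass might perform -- not the actual pass code. */
struct MemAccess {
    int base_reg;    /* base register of the address      */
    int offset;      /* immediate offset from the base    */
    int size;        /* access size in bytes              */
    int is_volatile; /* nonzero if the access is volatile */
};

int can_pair(const struct MemAccess *a, const struct MemAccess *b)
{
    if (a->is_volatile || b->is_volatile)
        return 0;  /* the question in this thread: the spec says leave these alone */
    if (a->base_reg != b->base_reg)
        return 0;  /* both accesses must use the same base register */
    if (a->size != b->size || b->offset != a->offset + a->size)
        return 0;  /* accesses must be the same size and contiguous */
    if (a->offset % (2 * a->size) != 0)
        return 0;  /* combined access should be naturally aligned
                      (the base pointer's own alignment matters too) */
    /* ...plus alias/ordering checks on any instruction in between */
    return 1;
}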

Re: Current status

PostPosted: Thu Jul 06, 2017 3:15 pm
by sebraa
I don't think you can merge two word-writes into a dword-write.
For volatile variables, each access must happen exactly in the way specified.

Re: Current status

PostPosted: Thu Jul 06, 2017 3:40 pm
by GreggChandler
You might find this link to be an informative read: http://en.cppreference.com/w/c/language/volatile

Among other things in that article, note specifically:
Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.


The following link speaks to the interpretation of "sequence points": http://en.cppreference.com/w/c/language/eval_order

Again, consider:
4) There is a sequence point after the evaluation of a full expression (an expression that is not a subexpression: typically something that ends with a semicolon or a controlling statement of if/switch/while/do) and before the next full expression.


One example where combining sequenced writes to "volatile" memory matters is I/O. The order of I/O register accesses is often significant, and converting two 32-bit writes into a single 64-bit write could break that sequencing, depending upon the "endianness" of the architecture.
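
A minimal sketch of that situation (the device and the addresses below are made up for illustration): the data register has to be written before the control register that kicks off the transfer, so the two volatile stores must stay separate and in order.

Code: Select all
#include <stdint.h>

/* Hypothetical memory-mapped peripheral; the addresses are made up. */
#define UART_DATA ((volatile uint32_t *)0x40001000u)
#define UART_CTRL ((volatile uint32_t *)0x40001004u)

void uart_send(uint32_t payload)
{
    *UART_DATA = payload; /* first write: the payload                    */
    *UART_CTRL = 1u;      /* second write: the "go" bit; must come after
                             the payload and must not be merged with it  */
}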

Re: Current status

PostPosted: Thu Jul 06, 2017 3:56 pm
by jar
upcFrost wrote:Here's what I mean (the code is a bit different)
Code: Select all
volatile int* p = ...;
p[0] = 1;
p[1] = 2;



That question makes more sense, and it shouldn't be done, according to what Gregg posted. I can't think of real code that this might break, but there is probably an edge case somewhere. If you do implement this, make it an optional flag.

However, if it's NOT a pointer to volatile data, I would appreciate the compiler optimizing it to a single double-word store (if 'p' is 8-byte aligned). This would improve unrolled code where two consecutive results or two inputs can be stored or read in a single instruction.
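
A (hypothetical) example of the kind of unrolled code I mean; assuming the pointers are 8-byte aligned and not volatile, each pair of adjacent word accesses is a candidate for ldrd/strd.

Code: Select all
/* Unrolled-by-two copy: the two loads and the two stores in each
 * iteration hit consecutive words, so a pairing pass could turn them
 * into one ldrd and one strd (given alignment and no aliasing). */
void copy_pairs(int *restrict dst, const int *restrict src, int n)
{
    for (int i = 0; i + 1 < n; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
    }
}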

Re: Current status

PostPosted: Thu Jul 06, 2017 5:22 pm
by GreggChandler
jar wrote:it shouldn't be done according to what Gregg posted


@jar, are you suggesting that the implementor ignore the specs?

The referenced document clearly articulates a relationship to "sequence points", and as I read the quoted pages, statements/expressions separated by semicolons define sequence points. Optimizing across a "sequence point" does not appear to be per the spec when volatile data is involved. Did I really misinterpret or misread that?

The purpose of specifications is to avoid statements such as
jar wrote: I can't think of ...

Re: Current status

PostPosted: Thu Jul 06, 2017 9:09 pm
by jar
GreggChandler wrote:
jar wrote:it shouldn't be done according to what Gregg posted

@jar, are you suggesting that the implementor ignore the specs?

I think you misunderstood me. I meant:
jar wrote:[combining two stores to volatile memory into one double word operation] shouldn't be done


I also meant that if upcFrost does implement this behavior, he should make it an optional flag so that it doesn't break code.

I feel like you're trying to misrepresent what I said...
jar wrote:there is probably an edge case somewhere


I meant that the spec should be followed because there will be code that breaks, even though I can't think of a specific real example.

I believe we're in agreement, Gregg.

Re: Current status

PostPosted: Tue Jul 11, 2017 3:50 pm
by upcFrost
Callee-saved regs now use paired loads and stores. Optimized the scheduler a bit. Currently matmul-16 gives 158 ms against 130 ms on GCC, purely because of suboptimal scheduling. Will work on it tomorrow.
Also found a couple of bugs in the Load/Store optimization pass; fixed all except one.

Gregg, about volatile access: it's exactly as @jar said. It is possible to implement (basically by skipping one "if-then" case), but it might, and in some cases will, break code.
Still, in some cases it can improve performance quite a bit. I'm planning to add an additional flag to allow it, but it will be off by default.
So the default behavior is not to touch volatile ldr/str at all, just as the spec requires.

Re: Current status

PostPosted: Mon Sep 11, 2017 8:53 pm
by upcFrost
Took a long vacation to switch jobs and move to a different country :)

On the compiler side: working on vectorization, or strd/ldrd to be precise, plus i64/f64 support. At the moment most tests build fine, and arithmetic actually runs a bit faster than on GCC (not by much, as I'm still using precompiled libs from the GCC bundle). I want to fix the matmul-16 test, which fails on memory mapping (sections overlap), before moving on.
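
As a rough illustration of why i64 support and ldrd/strd go together (this is just an assumption about how it lowers, not the actual codegen): on a 32-bit target, a plain 64-bit load or store is a natural candidate for a register pair.

Code: Select all
#include <stdint.h>

/* A 64-bit add on a 32-bit target: roughly an ldrd per operand, an
 * add/add-with-carry pair, and one strd for the result (assuming
 * 8-byte alignment; the exact codegen depends on the backend). */
void add_u64(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
    *dst = *a + *b;
}
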

Another big question is CI and distribution. Travis has build time limitation of 1 hour, which is not enough to build full LLVM stack, and ccache size of 512 mb, which is not enough to use ccache with LLVM build. Semaphore has disk quote, which is also too small to fit the build. Will try to look for solution, or maybe i'll just mail Semaphore or Travis and ask them to raise quote for my build (iirc they're ok doing it for FOSS projects :roll: )