Parallella Community • View topic - Requests for improving the C compiler

Requests for improving the C compiler

Discussion about Parallella (and Epiphany) Software Development

Moderators: amylaar, jeremybennett, simoncook

Re: Requests for improving the C compiler

Postby Olaf » Wed May 18, 2016 9:01 pm

Olaf
 
Posts: 37
Joined: Sun May 08, 2016 8:47 pm

Re: Requests for improving the C compiler

Postby DonQuichotte » Thu May 19, 2016 7:33 am

Thanks Olaf

Yes, I do some of the work myself instead of letting the C compiler guess everything; it can help.
Aligning structures on powers of 2 and so on is a great help.

Here is my answer to the "perfect 8-byte code alignment" question :D - the long version, one heavily commented C file lol


#include "e-lib.h" // mandatory even for a minimalist design -- e_get_coreid(), e_read(), e_write()

/*
* DonQuichotteComputers (at) gmail (dot) com: 2016/05/19 testing if we need a better code align(8) macro
*
* There are 4 cases for align(8): either we need 0, 2, 4 or 6 bytes
*
* Tested with SDK 2015.1 on a Fedora Core 23
*
* Compile with:
* e-gcc -T ${ELDF} -O0 src/testalign.c -o testalign.elf -le-lib
*
* with *your* ELDF path, something like that:
* ELDF=/home/ylav/dev/parallella/buildroot/esdk.2015.1/bsps/current/internal.ldf
*
* Trace with:
* e-run -t testalign.elf
*/

//#######################################

void finalsolution(void);
int testalign4(void); // declared here because main() calls it before its definition

int main(void) {
// e_coreid_t coreid;
int row, col, cmdI;
int fn1, fn2;

#define ALIGN(x) __attribute__ ((aligned (x)))

int ALIGN(8) var=3;
asm volatile (".balignw 8, 0x01a2"); // (1) 0x1a2 is the 2-byte NOP, theoretically e-as pads 8 bytes with those 2-byte sequences
asm volatile (".balignw 8, 0x01a2"); // (2) 0 byte => lack of 0 bytes => no issue, e-as handles this
asm volatile ("gid"); // (3) 2 bytes => lack of 6 bytes => 3 NOP
asm volatile (".balignw 8, 0x1a2"); //
asm volatile ("mov r41, #3");
asm volatile (".balignw 8, 0x01a2"); // (4) 4 bytes => lack of 4 bytes
asm volatile ("gie");
asm volatile ("add r41, r41, #4"); // (5) 6 bytes => lack of 2 bytes
asm volatile (".balignw 8, 0x01a2");
asm volatile ("gid");
asm volatile (".balignw 8"); // (6) no 2nd parameter ? the 'as' doc says it pads with '0' ; what does e-as do ?
asm volatile ("mov r62, #0xDEAF");

testalign4();
asm volatile ("mov r63, r63");

return var;
}

int testalign4(void) {
int ALIGN(8) var=3;
// we'll try fcef fc02 = mov r63,r63 => no updated flag, no change, minimal one-instruction penalty
// bad syntax // asm volatile (".balign 8, 0xfceffc02");
// bad syntax // asm volatile (".p2alignl 3, 0xfcef, 0xfc02");
// bad syntax // asm volatile (".p2alignl 3, 0xfcef 0xfc02");
// bad syntax // asm volatile (".p2alignl 3, 0xfceffc02");
// bad syntax // asm volatile (".p2alignl 3, 0xfceffc02UL");
// grep parallella_examples with p2align... nothing
// lots of imagination, huh ? ... bad syntax though // asm volatile (".p2alignl 3, .byte 0xfc, .byte 0xef, .byte 0xfc, .byte 0x02");
// what I "love" with the 'as' documentation is the lack of a simple example for .p2alignl: it would have been too simple. Trial and error is much more 'fun' probably ^^
// or maybe they work for google... since we have to find the syntax somewhere... my bad, I have no internet at home ; I am a knight errant...
// grep parallella_examples with align... too many results ; some .align before function names... not my concern ; some mysterious ".balign 4" ?!
// hey, this .balign is also documented in the 'as' documentation ! ... but it's just a copy/paste of the .p2align, no .balignl explained :'(
// grep parallella_examples with balign... .balign 4, no explanation ; and .balignw 8,0x01a2 exclusively... nothing new

// O_O bingo ! asm volatile (".balignl 8, 0xfc02fcef"); it is a 4-byte NOP lol at last I can go to bed !

asm volatile (".p2alignl 3"); // it complains but at least compiles :P
asm volatile ("mov r62, #0xABC");
asm volatile (".p2alignl 3"); // expecting 4 '0'
asm volatile ("mov r62, #0xDEF");
asm volatile (".balignl 8, 0xfc02fcef");
/* yes, as expected:
86e: 01a2 nop
870: d78b e0a2 mov r62,0xabc
874: 0000 beq 874 <_testalign4+0x14>
876: 0000 beq 876 <_testalign4+0x16>
878: ddeb e0d2 mov r62,0xdef
87c: fcef fc02 mov r63,r63 // yes ! my beloved 4-byte NOP !
*/

asm volatile ("mov r62, #0xBED"); // go to bed... I deserved it
// OK... the 'patron de remplissage' may be longer than 4 bytes, good news :) asm volatile (".balignl 8, 0xfc02fcef01a201a2");

// OK... eureka... 2 steps will do the task: first, 4-byte align ; second, 8-byte align
// lack 0 => 1st step = nothing, 2nd step = nothing 0 op, optimal
// lack 2 => 1st step = NOP, 2nd step = nothing 1 op, optimal
// lack 4 => 1st step = nothing, 2nd step = mov r63, r63 1 op, optimal
// lack 6 => 1st step = NOP, 2nd step = mov r63, r63 2 op, optimal

finalsolution();

return var;
}

// we test the expected solution to a perfectly optimized 8-byte code alignment
#define PERFECT_ALIGN8 asm volatile (".balignw 4, 0x01a2"); asm volatile (".balignl 8, 0xfc02fcef");

void finalsolution(void) {
PERFECT_ALIGN8
asm("gid");
PERFECT_ALIGN8
asm("mov r62,r62");
PERFECT_ALIGN8
asm("gie");
asm("mov r61,r61");
PERFECT_ALIGN8
PERFECT_ALIGN8
asm("B 0xBED");
}

/*
00000800 <_main>:
800: 775c 2700 str fp,[sp],-0x6
804: 74ef 2402 mov fp,sp
808: 0063 mov r0,0x3
80a: 0e5c 0400 str r0,[fp,+0x4]
80e: 01a2 nop (1) OK, align 8, 1 NOP as expected
810: 0392 gid (2) OK, 'as' is smart, nothing done as expected since we are already 8-byte aligned
812: 01a2 nop (3) OK, 3 NOP :/ suboptimal, don't you think so ?
814: 01a2 nop
816: 01a2 nop
818: 206b a002 mov r41,0x3
81c: 01a2 nop (4) OK, 2 NOP
81e: 01a2 nop
820: 0192 gie
822: 261b b400 add r41,r41,4
826: 01a2 nop (5) OK, 1 NOP
828: 0392 gid
82a: 01a2 nop
82c: 0000 beq 82c <_main+0x2c> (6) e-as complains that a 'patron de remplissage' (fill pattern) is missing, nice French translation by the way :D Yes, these are zeroes, "BEQ Program Counter"...
82e: 0000 beq 82e <_main+0x2e> This situation is well handled: beq <current_program_position> jumps to <current_program_position + 2>, as e-run will confirm
830: d5eb ede2 mov r62,0xdeaf
834: 0e4c 0400 ldr r0,[fp,+0x4]
838: 774c 2400 ldr fp,[sp,+0x6]
83c: b41b 2403 add sp,sp,24
840: 194f 0402 rts
844: 0000 beq 844 <_main+0x44>
*/

/* e-run -t testalign.elf
*
0x000800 --- _main str fp,[sp],-0x6 - memaddr <- 0x7ff0, memory <- 0x0, registers <- 0x7fd8
0x000804 --- _main mov fp,sp - registers <- 0x7fd8
0x000808 --- _main mov.b r0,0x3 - registers <- 0x3
0x00080a --- _main str r0,[fp,+0x4] - memaddr <- 0x7fe8, memory <- 0x3
0x00080e --- _main nop -
0x000810 --- _main gid - gidisablebit <- 0x1
0x000812 --- _main nop -
0x000814 --- _main nop -
0x000816 --- _main nop -
0x000818 --- _main mov.l r41,0x3 - registers <- 0x3
0x00081c --- _main nop -
0x00081e --- _main nop -
0x000820 --- _main gie - gidisablebit <- 0x0
0x000822 --- _main add.l r41,r41,4 - cbit <- 0x0, vbit <- 0x0, vsbit <- 0x0, registers <- 0x7, zbit <- 0x0, nbit <- 0x0
0x000826 --- _main nop -
0x000828 --- _main gid - gidisablebit <- 0x1
0x00082a --- _main nop -
0x00082c --- _main beq.s 0x000000000000082c -
0x00082e --- _main beq.s 0x000000000000082e -
0x000830 --- _main mov.l r62,0xdeaf - registers <- 0xdeaf
0x000834 --- _main ldr r0,[fp,+0x4] - memaddr <- 0x7fe8, registers <- 0x3
0x000838 --- _main ldr fp,[sp,+0x6] - memaddr <- 0x7ff0, registers <- 0x0
0x00083c --- _main add.l sp,sp,24 - cbit <- 0x0, vbit <- 0x0, vsbit <- 0x0, registers <- 0x7ff0, zbit <- 0x0, nbit <- 0x0
0x000840 --- _main jr lr - pc <- 0x6d8
*/

/*
000008ac <_finalsolution>:
8ac: 765c 2700 str fp,[sp],-0x4
8b0: 74ef 2402 mov fp,sp
8b4: fcef fc02 mov r63,r63
8b8: 0392 gid
8ba: 01a2 nop
8bc: fcef fc02 mov r63,r63 // 6 bytes, 2 op => success
8c0: d8ef fc02 mov r62,r62
8c4: fcef fc02 mov r63,r63 // 4 bytes, 1 op => success
8c8: 0192 gie
8ca: b4ef fc02 mov r61,r61
8ce: 01a2 nop // 2 bytes, 1 op => success
8d0: f6e8 0005 b 14bc <__HALF_BANK_SIZE_+0x4bc> // 0 byte, 0 op => success
8d4: 764c 2400 ldr fp,[sp,+0x4]
8d8: b41b 2402 add sp,sp,16
8dc: 194f 0402 rts
*/

/*
* my conclusion ? Yes, I wanted a better management of the 8-byte code alignment for C under Epiphany.
* I come from an x86 background where gcc and other compilers handle this issue perfectly well.
* I did not find anything on this subject in the parallella examples or on the forum.
*
* From now on, I can use PERFECT_ALIGN8 for my needs and that's my small gift to the Parallella community.
* We'll talk about Don Quichotte's exploits for centuries, for sure ;)
*
* Last words...
* Some will say it's a "nearly perfect" solution, since updating r63 can create RAW or WAW sequences that prevent dual issue.
* Nothing prevents you from choosing an unused register, or have a second macro with another register - fp, r28, lr... - to avoid these improbable situations.
* You can even write a macro with the register of your choice as parameter... I consider this as trivial and my problem as solved :)

* Now my next challenge will be a macro for automagically forcing a 4-byte B<cond> instead of the standard 2-byte B<cond>, with -Ofast as usual.
* It should be a decisive step toward producing 32-bit instructions exclusively, don't ask me why. I guess this challenge will be easier this time :P
*/
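The lack 0/2/4/6 table from the file above can be checked on a host machine, without an Epiphany toolchain. The following is only a sketch of the arithmetic, not Epiphany code: the function names pad_ops_nop_only and pad_ops_two_step are made up for illustration, and it assumes every code address is 2-byte aligned (instructions are 2 or 4 bytes long, as in the disassembly).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of fill instructions needed when padding to an
 * 8-byte boundary with 2-byte NOPs only, as ".balignw 8, 0x01a2" does. */
unsigned pad_ops_nop_only(uint32_t addr)
{
    uint32_t lack;

    assert((addr & 1u) == 0);          /* assumption: 2-byte aligned code */
    lack = (8u - (addr & 7u)) & 7u;    /* missing bytes: 0, 2, 4 or 6 */
    return lack / 2u;                  /* one NOP per 2 missing bytes */
}

/* Hypothetical helper: number of fill instructions for the two-step scheme,
 * ".balignw 4, 0x01a2" followed by ".balignl 8, 0xfc02fcef". */
unsigned pad_ops_two_step(uint32_t addr)
{
    unsigned ops = 0;

    assert((addr & 1u) == 0);          /* assumption: 2-byte aligned code */
    if (addr & 3u) {                   /* step 1: one 2-byte NOP */
        addr += 2u;
        ops++;
    }
    if (addr & 7u) {                   /* step 2: one 4-byte mov r63,r63 */
        addr += 4u;
        ops++;
    }
    return ops;
}
```

For the address 0x812 from the _main listing, the NOP-only count is 3 and the two-step count is 2, which matches the "3 NOP" at (3) and the "lack 6" row of the table.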
DonQuichotte
 
Posts: 46
Joined: Fri Apr 29, 2016 9:58 pm

Re: Requests for improving the C compiler

Postby Olaf » Sat May 21, 2016 12:36 am

I am not familiar with RISC and RISC assembly yet but I am learning it. So I do not understand your code completely :-)
But I see that you also use the tricks that I used to get faster than the Intel C++ compiler (Back in 2001--2006) and their special image processing library.

Back in my x86 image processing I also heard about byte, word, dword, .... alignment to prevent stalls loading from memory.
But back in those days I could gain about 10% CPU speed by aligning data to the cache lines, which were 32 bytes long.
Back in those days I was literally hitting the boundaries of the CPU caches and I had to bypass them to be better than the competition ;-)

Now if you load a one-dimensional data stream, then a single compiler alignment would be OK.
However, a two-dimensional data stream, such as an image whose line size is not divisible by 32 bytes, would use the CPU cache less efficiently.
I did increase image processing speed by making sure that every image line started at a 32 byte boundary.

Example: x = fill byte, to make sure that "A" lands on a 32-byte boundary that matches the CPU cache line.

Source 2D image:
[ABCDDEFGH]
[ABCDDEFGH]
[ABCDDEFGH]
[ABCDDEFGH]
[ABCDDEFGH]

Pre-processed 2D image, ready for processing:
[xxxxxxxABCDDEFGHxxxx]
[xxxxxxxABCDDEFGHxxxx]
[xxxxxxxABCDDEFGHxxxx]
[xxxxxxxABCDDEFGHxxxx]
[xxxxxxxABCDDEFGHxxxx]

In memory this 2D image would be represented like this (single dimension)
[xxxxxxxABCDDEFGHxxxx][xxxxxxxABCDDEFGHxxxx][xxxxxxxABCDDEFGHxxxx][xxxxxxxABCDDEFGHxxxx][xxxxxxxABCDDEFGHxxxx]

Of course the code became more complex, but I optimized the data transfer from RAM to the CPU cache.
No C++ compiler optimization was smart enough to give that optimized result.

I did not come to these conclusions by logically deducing them, I did it by measuring them.
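The row padding described above could be sketched in C roughly as follows. This is a simplified illustration, not the original code: it pads only at the end of each row and relies on an aligned base address (via C11 aligned_alloc), whereas the diagram also shows fill bytes before "A". The names CACHE_LINE, padded_stride, is_line_aligned and pad_image are hypothetical, and the 32-byte line size is simply the value quoted in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 32u  /* assumption: the 32-byte cache line quoted above */

/* Round a row width in bytes up to the next multiple of CACHE_LINE. */
size_t padded_stride(size_t width_bytes)
{
    return (width_bytes + CACHE_LINE - 1u) / CACHE_LINE * CACHE_LINE;
}

/* Return nonzero if p sits on a CACHE_LINE boundary. */
int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Copy a tightly packed image into a new buffer in which every row starts on
 * a CACHE_LINE boundary. Fill bytes are zero. Returns NULL on failure; the
 * caller releases the buffer with free(). */
unsigned char *pad_image(const unsigned char *src, size_t width_bytes,
                         size_t height, size_t *stride_out)
{
    size_t stride = padded_stride(width_bytes);
    unsigned char *dst;
    size_t y;

    assert(src != NULL && stride_out != NULL);
    if (width_bytes == 0 || height == 0)
        return NULL;

    /* stride is a multiple of CACHE_LINE, so the total size is as well,
     * which aligned_alloc requires. */
    dst = aligned_alloc(CACHE_LINE, stride * height);
    if (dst == NULL)
        return NULL;

    memset(dst, 0, stride * height);
    for (y = 0; y < height; y++)
        memcpy(dst + y * stride, src + y * width_bytes, width_bytes);

    *stride_out = stride;
    return dst;
}
```

With the 9-byte rows of the example image, padded_stride(9) gives 32, so row y of the padded copy starts at offset y * 32. Whether this actually helps on a given CPU is something to measure, as the post says.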
Olaf
 
Posts: 37
Joined: Sun May 08, 2016 8:47 pm

Re: Requests for improving the C compiler

Postby DonQuichotte » Sat May 21, 2016 11:30 pm

What kind of competition? Did you code for video games or graphics?

Anyway. I started new topics "Assembly class" and "Assembly snippets" in the "Assembly" forum ; you're welcome Olaf :)
DonQuichotte
 
Posts: 46
Joined: Fri Apr 29, 2016 9:58 pm

Re: Requests for improving the C compiler

Postby Olaf » Sun May 22, 2016 12:01 pm

Olaf
 
Posts: 37
Joined: Sun May 08, 2016 8:47 pm
