Current status

Moderator: Hoernchen

Postby upcFrost » Tue Apr 04, 2017 4:05 pm

I'll probably use this topic to update the backend status from time to time

So, today I've managed to compile and run the dotproduct demo. Yay. The instruction selection quality is, well, meh. I still can't figure out how to get rid of the load/store pairs generated by the default calling convention. I'll probably need to adjust the frontend a bit so that it at least knows about the number of registers.
Anyway, it works. I'll try to compile the other demos tomorrow. The biggest PITA is the FPU config flag, as setting it requires an additional pass, and that pass is far from perfect.
Current LLVM backend for Epiphany: https://github.com/upcFrost/Epiphany. Commits and code reviews are welcome
upcFrost
 
Posts: 21
Joined: Wed May 28, 2014 6:37 am

Re: Current status

Postby jar » Wed Apr 05, 2017 5:33 am

In general, it would help if you included the commands needed to install the code rather than bullet-point descriptions of them. More specifically, applying the patch was not as simple as it could be.

LLVM requires a recent version of cmake. It seems most package managers are way behind. Have I mentioned I hate cmake? I've now tried to build cmake on three separate platforms without success. I'm running out of steam and will try some other time.

I am not a compiler guy, but I'm interested in this work. In the context of Parallella, a feature I would really like to see in a compiler is the ability to target multiple architectures with a function attribute. This would enable a monolithic code base and move away from the co-design model we have today with Epiphany and other coprocessors. Something like this would be a nice start:

Code: Select all
void __attribute__((targets(ARM,E32))) foo(void* p)
{
   // ...
}
int main(void)
{
   launch_thread_function(ARM,&foo,&args);
   launch_thread_function(E32,&foo,&args);
}
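For context, here is roughly what that attribute would automate, sketched with plain C function pointers. This is a hypothetical stand-in: the `Target` enum and `launch_thread_function` here are invented for illustration (real Epiphany dispatch would go through the e-hal loader); the sketch only mimics the call shape of the proposal above.

Code: Select all
#include <assert.h>

typedef enum { TARGET_ARM, TARGET_E32 } Target;

static int last_target = -1;

/* Hypothetical dispatcher: with the proposed attribute, the compiler
 * would emit one body per target and the runtime would pick one.
 * Here we just record which target was requested and call the
 * function on the host. */
static void launch_thread_function(Target t, void (*fn)(void *), void *args)
{
    last_target = (int)t;
    fn(args);   /* on real hardware the E32 case would load an ELF onto a core */
}

static void foo(void *p) { *(int *)p += 1; }

int main(void)
{
    int x = 0;
    launch_thread_function(TARGET_ARM, &foo, &x);
    launch_thread_function(TARGET_E32, &foo, &x);
    assert(x == 2 && last_target == (int)TARGET_E32);
    return 0;
}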
jar
 
Posts: 212
Joined: Mon Dec 17, 2012 3:27 am

Re: Current status

Postby upcFrost » Wed Apr 05, 2017 11:17 pm

jar wrote:In general, it would help if you included the commands needed to install the code rather than bullet point descriptions of them. More specifically, applying the patch was not as simple as it could be.


Yeah, I know, I'll update it. The reason the patch is outdated is that it was made for LLVM 3.9.1, and I have since migrated to LLVM 4.0.0. I also realised I'll need to make some modifications to Clang: namely, I need to specify the Epiphany target as 32-bit, otherwise an x86_64 clang screws up function calls that use size_t (it uses the native size_t). I'll update the LLVM patch, publish the Clang patch, and update the instructions once I'm finished with this stuff. And yes, I'll make packages available, so don't worry about compiling from scratch.
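To illustrate the mismatch (a minimal host-side sketch, not the actual backend or Clang code): on a 64-bit host clang, size_t is 8 bytes, while on the 32-bit Epiphany target it is 4, so caller and callee disagree on argument width for any call passing a size_t unless the frontend knows the target is 32-bit.

Code: Select all
#include <stdio.h>
#include <stddef.h>
#include <assert.h>

/* On Epiphany (a 32-bit target) size_t is 4 bytes. An x86_64-hosted
 * clang that doesn't know the target is 32-bit lays out calls
 * assuming the native 8-byte size_t, so the two sides of a call
 * disagree on how wide that argument slot is. */
int main(void)
{
    printf("host size_t:     %zu bytes\n", sizeof(size_t));
    printf("epiphany size_t: %u bytes\n", 4u);
    /* on common hosts size_t matches the pointer width */
    assert(sizeof(size_t) == sizeof(void *));
    return 0;
}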

jar wrote:the ability to use target multiple architectures with a function attribute.


That's a frontend-specific part, and a complicated one. And it might not be the best option available, either. I'd rather teach GPGPU/Hydra/whatever about Epiphany, so that it can automatically generate kernels and insert loader code during compilation. That way it would be possible to use the Epiphany CPU for minor everyday tasks, such as processing archives or serving web pages, without rewriting the target application (and I doubt anyone wants to mess with, say, the Nginx source).
The benefit is that those middlewares work with LLVM IR, so it won't matter what language you're using (otherwise we'd need to rework every single frontend).
IIRC, GPGPU aka ppcg was nominated for this year's GSoC, and I've also heard about some Zurich guys moving in this direction. Yes, it all works for CUDA and OpenCL today, but since the Epiphany ELF loading code is not that complex, it shouldn't be hard to have at least some parts of the main code transformed into kernels automatically.

So, yeah, today's update: found this size_t bug, and I'll need to fix it on the frontend side. Not the best option, imo, but it seems to be the only way around it. Also tested the e_bandwidth_test, e_led_test and hello_world examples. The last one fails because of size_t; the other two run just fine.

Upd.: found another way: compiling with the -m32 flag works just fine.

Re: Current status

Postby upcFrost » Tue Apr 11, 2017 3:55 pm

Today I finally got through the "Hello World" example. Actually, it's not as simple as it might seem. Since e_write takes 6 arguments and we only have 4 scratch registers, memory placement for the last 2 arguments was failing. Fixed now.
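A minimal host-side sketch of that situation (the function below is a hypothetical stand-in with e_write's six-argument shape, not the real e-hal call): with only four argument registers, the caller has to place arguments 5 and 6 in memory on the stack, which is exactly the lowering that was failing.

Code: Select all
#include <assert.h>
#include <stddef.h>

/* Hypothetical 6-argument function mirroring e_write()'s shape.
 * Under an r0-r3 argument convention the first four arguments
 * travel in registers; the caller must spill args 5 (buf) and
 * 6 (size) to the stack. */
static long fake_write(void *dev, unsigned row, unsigned col,
                       long to_addr, const void *buf, size_t size)
{
    (void)dev; (void)row; (void)col; (void)to_addr; (void)buf;
    return (long)size;   /* pretend we wrote `size` bytes */
}

int main(void)
{
    char buf[16];
    long n = fake_write(NULL, 0, 0, 0x100, buf, sizeof buf);
    assert(n == 16);
    return 0;
}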
Also tried running the basic_math example. After some jumping around, it compiled and even managed to complete half of the tests correctly. Performance-wise, still... meh :roll:

Re: Current status

Postby upcFrost » Thu Apr 13, 2017 9:50 am

The basic_math example works now. The results are comparable with e-gcc, except for division, as I'm currently using the standard __divsf2 rather than the __fast_recipsf2 optimized for E16. Some difference in results comes from LLVM scheduling (+- 4 cycles).
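For the curious, fast reciprocal routines of the __fast_recipsf2 kind are typically Newton-Raphson refinements of a cheap initial guess. Here is a minimal C sketch of that technique; the bit-trick seed constant and the three-iteration count are illustrative assumptions, not the actual E16 library code.

Code: Select all
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Newton-Raphson reciprocal: y_{n+1} = y_n * (2 - x * y_n).
 * The integer-subtraction seed is a well-known approximation trick;
 * the real __fast_recipsf2 may seed and iterate differently. */
static float fast_recip(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = 0x7EF127EAu - i;            /* rough 1/x seed via the exponent bits */
    memcpy(&y, &i, sizeof y);
    for (int n = 0; n < 3; n++)     /* each step roughly squares the error */
        y = y * (2.0f - x * y);
    return y;
}

int main(void)
{
    assert(fabsf(fast_recip(2.0f) - 0.5f) < 1e-5f);
    assert(fabsf(fast_recip(3.0f) * 3.0f - 1.0f) < 1e-5f);
    return 0;
}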
With this I'd say the basic functionality is in place, so it's time for some optimization and bugfixes.

Also, I've updated the patch and readme in the 64bit branch. I'll probably merge it into the main branch in a couple of days, maybe even today.

Re: Current status

Postby jar » Thu Apr 13, 2017 1:54 pm

Some thoughts on compiler optimizations that I would like to see, based on my experience with GCC:

1) Load/store postmodify should be used in array lookups and loops instead of an arithmetic operation to increment/decrement an index register (an unnecessary instruction and an extra clock)
2) Mask operations that can be replaced with bitwise operations for smaller/faster code
3) Hardware loops are fun, but I think this may be a challenge to get right since the code layout has to be just right. They often mean larger code and are actually slower for very small loops, so they must be used judiciously. GCC doesn't touch them.
4) Preferential use of r0-r3 in leaf functions for smaller code, since those registers enable 16-bit instructions. GCC seems to just use the higher registers as if they were valued the same as r0-r3.
5) Dual-issue of loads/stores with FPU operations. You can sometimes move instructions around to improve performance. You can also zero-initialize registers early with the FPU (fsub rx, rx, rx) rather than a mov instruction (but that may be a 32-bit instruction instead of 16 bits in some cases).
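To illustrate point 1: a pointer-bumping loop is exactly the shape a backend can fold into load/store-with-postmodify, so the address update rides along with the memory access instead of needing a separate add on an index register.

Code: Select all
#include <assert.h>
#include <string.h>

/* Pointer-bumping copy: each iteration's load and store pair
 * naturally with the address increment, the pattern a backend can
 * lower to load/store-with-postmodify (ldr rd, [rp], #4 style)
 * instead of a separate index-register add. */
static void copy_words(int *dst, const int *src, int n)
{
    while (n--)
        *dst++ = *src++;   /* both accesses are postmodify candidates */
}

int main(void)
{
    int a[4] = {1, 2, 3, 4}, b[4] = {0};
    copy_words(b, a, 4);
    assert(memcmp(a, b, sizeof a) == 0);
    return 0;
}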

In hand-writing some assembly routines, I have copied one of the r0-r3 registers to a higher register in order to free it up and save instruction space (despite costing one 32-bit instruction and one clock cycle). The design tradeoff space is huge despite this being a RISC architecture.

Good luck

Re: Current status

Postby upcFrost » Thu Apr 20, 2017 1:09 pm

jar wrote:Load/Store Postmodify in array lookups and loops should be used instead of an arithmetic operation to increment/decrement an index register (an unnecessary instruction and extra clock)

Will do. Postmodify support is actually integrated into LLVM, though the instruction spec for it is a bit tricky.

jar wrote:Mask operations that can be replaced with bitwise operations for smaller/faster code

Not always. And the generalization might get nasty.

jar wrote:Hardware loops are fun, but I think this may be a challenge to get right since the code layout has to be just right. It's often larger code and actually slower for very small loops so it must be used judiciously. GCC doesn't touch them.

Not that hard, actually. There's an internal loop analyzer in LLVM, so it's quite easy to check the loop length and trip count and see whether we can benefit from a HW loop.

jar wrote:Preferential use of r0-r3 in leaf functions for smaller code. GCC seems to just use the higher registers as if they were valued the same as r0-r3, which enable 16-bit instructions.

Ugh... that's a bit hard. Honestly, I'd also prefer using the higher regs, because even for the "basic_math" example the register pressure gets high. Also, r4-r7 are callee-saved, which means they cost both stack space and a load/store pair to use. Not critical, though. There's also a small issue in LLVM's instruction selector: it tends to grow register pressure really fast by putting constraints on the RegAlloc. I think I'll default to all 64 regs and 32-bit instructions with r0-r7 preferred, and then run an additional pass swapping instructions to 16-bit where possible.

jar wrote:Dual-issue loads/stores and FPU operations. You can sometimes move around instructions to improve performance. You can also zero initialize registers early with the FPU (fsub rx, rx, rx) rather than a mov instruction (but that may be a 32-bit instruction instead of 16-bits in some cases).

Yeah, maybe I'll try. It should probably go together with an IALU-to-IALU2 pass to optimize the pipeline.


And a small update: I've reworked the stack, added the fast division call, and fixed a bunch of bugs. Performance is now on par with GCC. There are still some issues, but in general it works on most of the examples provided.

