Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 12:40 pm
by pascallj
Re: Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 2:03 pm
by pascallj
I think I've figured out at least one problem. If I copy memcpy's source into my own code, I get roughly the same benchmark results as when I adapt my own code to copy 8 bits at a time instead of 64. So I guess the poor memcpy performance comes from the standard C library being located in external DRAM instead of SRAM.
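For comparison, the two kinds of manual copy loop described above might look roughly like this. It is a minimal sketch, not pascallj's actual benchmark code: the function names are hypothetical, and the 64-bit variant assumes both buffers are 8-byte aligned and the length is a multiple of 8. The `may_alias` attribute is a GCC/Clang extension used here so that the word loop does not break strict-aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>  /* memcmp, only for checking results */
#include <assert.h>  /* assert, only for checking results */

/* GCC/Clang extension: a uint64_t lvalue that may alias any object type. */
typedef uint64_t __attribute__((may_alias)) u64_alias;

/* "8-bit" variant: copy one byte per iteration. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* "64-bit" variant: copy one 8-byte word per iteration.
   Assumption: dst and src are 8-byte aligned and n is a multiple of 8. */
void copy_words64(void *dst, const void *src, size_t n)
{
    u64_alias *d = dst;
    const u64_alias *s = src;
    for (size_t i = 0; i < n / 8; i++)
        d[i] = s[i];
}
```

Following the thread's reasoning, where this code is placed (internal SRAM vs external DRAM) can matter as much as the access width.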
However, I am still wondering why -O3, instead of optimizing my code for speed, makes it so much slower.
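One possible explanation, not confirmed in this thread, is that GCC recognizes a simple copy loop and replaces it with a call to `memcpy` (the `-ftree-loop-distribute-patterns` optimization, which GCC releases of that time enabled at -O3; newer releases enable it at -O2 as well). If the library `memcpy` lives in external DRAM, -O3 would then bring back exactly the slow path. A sketch, assuming GCC, of switching that optimization off for a single function (the macro and function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>  /* only for checking results */

/* GCC only: stop the optimizer from turning the loop below back into a
   memcpy call. Clang also defines __GNUC__ but ignores this attribute,
   so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Copy `words` 64-bit words; the buffers must not overlap. */
NO_LOOP_TO_LIBCALL
void manual_copy64(uint64_t *restrict dst, const uint64_t *restrict src,
                   size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}
```

Passing `-fno-tree-loop-distribute-patterns` on the command line does the same for a whole file, and compiling with `-S` and looking for a `memcpy` call in the generated assembly is a quick way to check whether this is what happens.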
Re: Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 4:06 pm
by sebraa
Do not execute any code from external memory if you want reasonable performance. Use internal.ldf as the linker script.
Re: Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 4:23 pm
by pascallj
Re: Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 5:08 pm
by jar
You can use my shmemx_memcpy routine if you need the best performance. Nothing else beats it while also handling address misalignment.
https://github.com/USArmyResearchLab/op ... x_memcpy.c
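jar's routine is not reproduced here. As a rough illustration of what handling address misalignment involves, a simple head/body/tail copy could look like the sketch below. It is a simplification under stated assumptions (the name `aligned_memcpy_sketch` is hypothetical): it uses 64-bit words only when source and destination have the same alignment modulo 8 and falls back to byte copies otherwise, a case a tuned routine presumably handles better.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>  /* memset, memcmp, only for checking results */
#include <assert.h>  /* assert, only for checking results */

/* GCC/Clang extension: a uint64_t lvalue that may alias any object type. */
typedef uint64_t __attribute__((may_alias)) u64_alias;

/* Hypothetical helper, NOT shmemx_memcpy. Non-overlapping buffers only. */
void *aligned_memcpy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d ^ (uintptr_t)s) & 7) == 0) {
        /* Head: copy bytes until d (and therefore s) is 8-byte aligned. */
        while (n > 0 && ((uintptr_t)d & 7) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: whole 64-bit words. */
        u64_alias *dw = (u64_alias *)(void *)d;
        const u64_alias *sw = (const u64_alias *)(const void *)s;
        for (; n >= 8; n -= 8)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Tail, or the whole buffer when the alignments differ: plain bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```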
Re: Memcpy slow compared to manual copy of data
Posted:
Thu Jun 15, 2017 5:25 pm
by pascallj
Thanks! I am using the Parallella board for my bachelor's thesis, so it might be nice to compare my final results with your implementation.
Re: Memcpy slow compared to manual copy of data
Posted:
Fri Jun 16, 2017 7:44 am
by GreggChandler
Re: Memcpy slow compared to manual copy of data
Posted:
Fri Jun 16, 2017 8:46 am
by GreggChandler
Re: Memcpy slow compared to manual copy of data
Posted:
Fri Jun 16, 2017 12:16 pm
by sebraa
Re: Memcpy slow compared to manual copy of data
Posted:
Fri Jun 16, 2017 3:09 pm
by DonQuichotte
Totally agree with sebraa.
Performance is my only goal.
Use internal.ldf or nothing.
Avoid using the libraries as much as possible.
GCC's bit-handling emulation, for example, is really poor; I had to replace it with my own routines (popcount, clz).
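DonQuichotte's replacements are not shown in the thread. For illustration, common portable versions of the two functions are a branch-free SWAR popcount and a binary-search count-leading-zeros. The sketch below assumes 32-bit operands, the names `popcount32` and `clz32` are hypothetical, and whether they actually beat the library versions on the Epiphany would need to be measured.

```c
#include <stdint.h>
#include <assert.h>  /* only for checking results */

/* Branch-free SWAR population count: sum bits in 2-, 4-, then 8-bit
   fields, and let the multiply add the four byte counts into the top byte. */
unsigned popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (unsigned)((x * 0x01010101u) >> 24);
}

/* Count leading zeros by binary search. Returns 32 for x == 0
   (unlike __builtin_clz, whose result is undefined for 0). */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}
```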