Re: Loader bug.
Posted: Fri Apr 07, 2017 7:52 pm
You have already reproduced the bug, since you admit that the code provided at
https://github.com/joseluisquiroga/para ... lq-test-22
fails the assert when run after boot.
If you look at the 672-line file
https://github.com/joseluisquiroga/para ... ader_znq.c
you will see two functions:
    /* Byte-by-byte copy (stand-in for memcpy). */
    uint8_t*
    bj_memload(uint8_t* dest, const uint8_t* src, bj_size_t sz){
        bj_size_t idx = 0;
        for(idx = 0; idx < sz; idx++){
            bj_force_assig(dest[idx], src[idx]);
        }
        return dest;
    }

    /* Byte-by-byte compare (stand-in for memcmp). On the first mismatch,
       dumps both buffers to files and aborts. */
    void
    bj_ck_memload(uint8_t* dst, uint8_t* src, size_t sz){
        bool ok = true;
        long aa;
        for(aa = 0; aa < sz; aa++){
            if(dst[aa] != src[aa]){
                ok = false;
                break;
            }
        }
        if(! ok){
            write_file("SOURCE_shd_mem_dump.dat", src, sz, false);
            write_file("DEST_shd_mem_dump.dat", dst, sz, false);
            bjh_abort_func(9, "bj_ck_memload() FAILED !! CODE_LOADING_FAILED !!\n");
        }
    }
Where bj_force_assig is just:
#define bj_force_assig(var, val) do { (var) = (val); } while((var) != (val));
As you can see, these are trivial implementations of memcpy and memcmp.
The rest of the code is almost a straight copy-paste, with some renaming and NO CHANGES to the VALUES PASSED to the memcpy call that the original function e_process_elf (renamed bjl_process_elf there) uses for shared-memory code.
You can copy-paste these functions and the define into the original code of e_load_group and then use them together with the PROVIDED elf if you want a reproduction with the ORIGINAL code, making the obvious changes so that it compiles. We want these, and NOT memcpy or memcmp, because the latter have optimizations that MAY NOT ALLOW the process to be interrupted. We want that interruption so that the memory gets corrupted by some other process (a kernel process corrupting the NON-isolated shared memory of the kernel process controlling /dev/epiphany/mesh0), and so the assert fails, REPRODUCING the bug with the ORIGINAL code. The only user processes that should have access to the shared memory are the ones that have opened /dev/epiphany/mesh0. In the given examples I am VERY confident that no such other user process is involved.
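The point about compiler optimizations can be taken one step further. A minimal sketch, assuming the only goal is to keep the compiler from collapsing the loops: an optimizing compiler is allowed to assume that ordinary (non-volatile) memory does not change by itself, so it may drop the read-back in while((var) != (val)) right after the store, and it may turn a plain byte-copy loop back into a memcpy call. Accessing the buffers through volatile pointers forbids both. The names vol_memload and vol_first_mismatch are hypothetical, not from the repository or the eSDK; returning the offset of the first mismatch, rather than just pass/fail, is a choice that makes a failure easier to locate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical volatile variant of bj_memload (sketch, not the repo code).
 * Every byte is really stored, one at a time; the compiler may not merge
 * the loop into a memcpy call because each access is volatile. */
static void vol_memload(volatile uint8_t *dest, const volatile uint8_t *src, size_t sz)
{
    for (size_t i = 0; i < sz; i++) {
        dest[i] = src[i];
    }
}

/* Hypothetical volatile variant of bj_ck_memload (sketch).
 * Re-reads both buffers from memory and returns the offset of the first
 * differing byte, or sz if every byte matches. */
static size_t vol_first_mismatch(const volatile uint8_t *dst, const volatile uint8_t *src, size_t sz)
{
    for (size_t i = 0; i < sz; i++) {
        if (dst[i] != src[i]) {
            return i;
        }
    }
    return sz;
}
```

Note that volatile only restrains the compiler: it does not bypass CPU caches and does not make the accesses atomic.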
REMEMBER: I make "NO CHANGES to the VALUES PASSED to the memcpy call". The failure in the given examples happens between my "memcpy" and my "memcmp". What has JUST been "memcpy"-ed is DIFFERENT when it is "memcmp"-ed!!!!
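When bj_ck_memload fails, it writes the source and destination buffers to SOURCE_shd_mem_dump.dat and DEST_shd_mem_dump.dat. A small helper along these lines (hypothetical name list_dump_diffs, plain stdio, not taken from the repository) lists exactly which offsets changed between the copy and the check, which can help show whether the corruption is scattered bytes or whole blocks:

```c
#include <stdio.h>

/* Sketch: compare two dump files byte by byte and print every differing
 * offset to `out`. Returns the number of differing bytes, or -1 if either
 * file cannot be opened. If the files differ in length, only the common
 * prefix is compared. */
static long list_dump_diffs(const char *src_path, const char *dst_path, FILE *out)
{
    FILE *fs = fopen(src_path, "rb");
    FILE *fd = fopen(dst_path, "rb");
    long diffs = 0;
    unsigned long offset = 0;
    int a = 0, b = 0;

    if (fs == NULL || fd == NULL) {
        if (fs != NULL) fclose(fs);
        if (fd != NULL) fclose(fd);
        return -1;
    }
    /* Stop as soon as either file runs out of bytes. */
    while ((a = fgetc(fs)) != EOF && (b = fgetc(fd)) != EOF) {
        if (a != b) {
            fprintf(out, "offset 0x%08lx: src=0x%02x dst=0x%02x\n", offset, a, b);
            diffs++;
        }
        offset++;
    }
    fclose(fs);
    fclose(fd);
    return diffs;
}
```

For example, list_dump_diffs("SOURCE_shd_mem_dump.dat", "DEST_shd_mem_dump.dat", stdout) prints one line per corrupted byte.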
It is the driver, my friend.
Before all this, I already knew there was a bug in e_load_group. A weird case was happening in my code: my program blocked the Epiphany when linked with a certain file and did not block when not linked with it (nothing from that file was being called or used in the running code). So I decided to check, from the Epiphany side, what code had been loaded into shared memory for one of my functions, and I found that the machine code (the shared-memory contents) was corrupted when linking with the big file. I then used the source code of e_load_group to track down what was happening; you can see what I was doing in the first example. I was already planning to implement my own loader (because I want my code to be able to load different Epiphany ELFs plus a common base of shared code), so I renamed things and made some other changes.
Yes, I initially thought the problem was with the mmap call. I WAS WRONG. That is why, in the second example, I load the ELF file into the heap (to check whether it was the mmap function). The memory regions cannot overlap if the file is in the heap (well, they could, but that would be another story). Then I thought it was in the e_alloc call of e_load_group, the one that is already well documented in the specs. It was not. I WAS WRONG. That is why there is a copy-paste of e_alloc in the last example.
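For readers who want to repeat the mmap-versus-heap check, here is a minimal sketch of loading a whole ELF file into a malloc'd buffer with plain stdio. The function name load_file_to_heap is hypothetical, and the actual code in the second example may do this differently:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: read a whole file (e.g. an Epiphany ELF) into heap memory
 * instead of mmap'ing it. Returns NULL on failure; on success *out_size
 * receives the file size and the caller must free() the buffer. */
static uint8_t *load_file_to_heap(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    uint8_t *buf = NULL;
    long len = 0;

    if (f == NULL) {
        return NULL;
    }
    /* Find the file size, then rewind to the start. */
    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    /* Allocate at least 1 byte so an empty file still yields a valid pointer. */
    buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL) {
        *out_size = (size_t)len;
    }
    return buf;
}
```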
I have neither the experience, nor the time, nor the interest to dig into your code, and if I were you I would not let me into it.
I have never implemented a driver in my life, so literally "I cannot help you with that".
Here is what I found on the internet about this kind of driver (shared memory):
http://myclipnotes.blogspot.com.co/2015 ... odule.html
I do not see a call to remap_pfn_range in your code, but "I have never implemented a driver in my life", so what do I know about whether there should be one.
REMEMBER: I make "NO CHANGES to the VALUES PASSED to the memcpy call". The failure in the given examples happens between my "memcpy" and my "memcmp". What has JUST been "memcpy"-ed is DIFFERENT when it is "memcmp"-ed right AFTER the "memcpy"!!!!
It is the driver, my friend.
And that is my take on this issue.
Shalom.
JLQ.