Running in parallel, while writing 'regular' code

Hi, I'm watching the online lectures, currently at lecture 4.2. While I consider myself a complete newbie, I notice myself getting frustrated: not because of the information provided, or the "complexity" (which is very doable), or its sporadic vagueness (like introducing matrix multiplication without a use case), or his stutter... And I know this is all for CUDA and Nvidia chips. But while he introduces the methods and steps you need to be aware of as a parallel programmer, it is all limited to (what I call) "best practice", which is in fact a set of (automatic) "optimizations" on top of normal procedural programming.
So what came to mind: can we build an 'optimisator' for Parallella? A [LANGUAGE]*-compiler that implements all or most of those steps we currently have to _write_ ourselves as programmers, so we can worry about our program logic instead of memory management, blocks/tiles/threads, and caches of shared memory/registers and such. Our 'regular' code would then be executed in parallel (even where we didn't think of implementing it as parallel [yet]).
*) a language like C, Python, Perl, or even a web script in PHP, or any other.
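For what it's worth, something close to this already exists for ordinary multicore CPUs: with OpenMP you write plain C plus a single directive, and the compiler/runtime decides how to create threads and split the work. It's not Parallella/Epiphany code, just a point of reference for the idea; the file name and the gcc invocation are my own assumptions about a GCC-like toolchain:

```c
/* saxpy.c -- a minimal sketch of "regular code, parallel execution".
 * Build (assuming a GCC-like compiler): gcc -fopenmp saxpy.c -o saxpy
 */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    const float a = 2.0f;

    /* Ordinary sequential setup, nothing parallel here. */
    for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }

    /* The only "parallel" line in the program: the compiler and
     * runtime decide how to split the loop across threads; no
     * explicit blocks/tiles/threads or shared-memory management. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[42] = %f (threads available: %d)\n",
           y[42], omp_get_max_threads());
    return 0;
}
```

The catch, and the reason fully automatic parallelization (no directive at all) is so hard, is that the compiler must prove the loop iterations are independent before it can safely run them concurrently; the pragma is the programmer vouching for that.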