Accessing arbitrary host virtual memory from Epiphany


Re: Accessing arbitrary host virtual memory from Epiphany

Postby dms1guy » Thu Dec 03, 2015 11:57 am

piotr5 wrote:btw, that the epiphany flat memory model isn't secure isn't my own observation. I think someone commented on it during kickstarter. I definitely remember reading such a comment in the early days.


I think that the sheer simplicity of the Epiphany design is really elegant and that Andreas has removed so much of the 'baggage' from conventional processor design, and implemented an incredibly lean, but highly capable engine.

But there is a danger that we could end up 'gluing' back the bits that Andreas took out, piece by piece.

My feeling is that we should embrace the design philosophy of Epiphany and extend this design philosophy into the software.

Which means that either

1) Epiphany is used as a fixed function accelerator, in which case it is not unreasonable for programmers to be liable for producing correct code that can be tested and proved to not violate any design rules.

2) Epiphany is used as a general purpose engine, in which case there are no guarantees as to the correctness of any user code executed on it.

In scenario 2) all of the issues of memory security models come into play. Traditionally this has been solved by hardware (e.g. MMUs), but that is heading back up the road that Andreas just took us away from.

To me, the eminently sensible approach is to only allow arbitrary user code to run under the control of a software virtual machine (i.e. a managed code environment). Then you can completely implement in software any security policy or run-time constraints you desire.

There have historically been performance concerns over software virtual machines, but these days JIT technology means that performance is no longer a serious issue. Further, most real-world platforms today are in fact layers of multiple virtual machines, from programming environments like Java and C# to hypervisors: we already have layers of software virtualisation on top of layers of lower-level software virtualisation. Having just a single layer of efficient software virtualisation will likely yield good results whilst protecting against hardware scope creep like MMUs and even caches.

Re: Accessing arbitrary host virtual memory from Epiphany

Postby sebraa » Thu Dec 03, 2015 2:24 pm

dms1guy wrote:As sebraa stated, I intend to convert the virtual address into a PFN (page frame number) by dividing it by the page size (which is 4096). Then use this to index into /proc/pid/pagemap (where pid = PID of the process executing on the host) to access a 64-bit entry, the lower 55 bits of which provide the PFN of the physical memory mapped to that virtual address.
You can use /proc/self/pagemap, which always points to your current PID.

dms1guy wrote:
sebraa wrote:However, I think that the Epiphany being able to access all (or most) physical memory is an oversight; for security reasons, the FPGA logic should ignore (or trap) accesses outside the shared memory area.
If the Epiphany is going to run 'arbitrary' code, then I agree that there is significant scope for introducing difficult to trace bugs through ARM memory corruption by the Epiphany side, and some kind of FPGA logic protection against this would be a good idea.
It is better to treat any code as "arbitrary", even if it is not. The JVM being a virtual machine / sandbox did not prevent people from breaking out of that environment in the first place, and putting the burden on the programmer is definitely not going to produce secure software.

dms1guy wrote:If the FPGA logic restricted the Epiphany having direct access to non-shared 1GB SDRAM regions, then the only way to implement my desired scheme is to copy all transfers via the 32MB shared memory region, which would effectively double the number of memory transfers required for operations where Epiphany reads or writes system memory.
You are trying to abuse a security oversight from the Adapteva team, which is all well and good, but definitely not a good idea for production-grade software. In that case, a simple memory restriction scheme (e.g. at least an application-managed bitmask inside the FPGA) should be in place somewhere. Needing to manually disable safeguards for some applications usually gets people thinking about what they do, and then it's acceptable in my opinion.
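To illustrate the kind of restriction meant here, below is a minimal C model of such a check; the 1 MB window granularity, the bitmask layout and all of the names are invented for the sketch and are not the actual Parallella FPGA design:

#include <stdint.h>

/* Toy model of an application-managed restriction scheme: the 1 GB SDRAM
   is divided into 1024 windows of 1 MB each, and a bitmask written by the
   host decides which windows Epiphany-initiated accesses may touch. */
#define WINDOW_SHIFT  20                       /* 1 MB windows */
#define WINDOW_COUNT  1024                     /* 1 GB / 1 MB  */

static uint32_t allowed_windows[WINDOW_COUNT / 32];   /* set by the host */

/* Returns non-zero if an access at 'sdram_offset' (byte offset into the
   1 GB SDRAM) coming from the Epiphany should be allowed to proceed. */
static int access_allowed(uint32_t sdram_offset)
{
    uint32_t window = sdram_offset >> WINDOW_SHIFT;
    if (window >= WINDOW_COUNT)
        return 0;                              /* outside SDRAM entirely */
    return (allowed_windows[window / 32] >> (window % 32)) & 1u;
}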

dms1guy wrote:But there is a danger that we could end up 'gluing' back the bits that Andreas took out, piece by piece.
Assuming that there is a reason they exist in most other systems, I see that as a reasonable concern, but one without good alternatives.

dms1guy wrote:1) Epiphany is used as a fixed function accelerator, in which case it is not unreasonable for programmers to be liable for producing correct code that can be tested and proved to not violate any design rules.
GPUs used to be fixed-function, but they aren't anymore, which gave us IOMMUs etc. Also, using a general-purpose engine for fixed functions is always a net loss in energy efficiency compared to special-purpose hardware (there are other trade-offs though, I'm aware). In any case, I think that would cripple the architecture.

dms1guy wrote:2) Epiphany is used as a general purpose engine, in which case there are no guarantees as to the correctness of any user code executed on it.
This is always the safe assumption (see above). However, the Epiphany is not designed to live on its own; (host-protecting) security-relevant stuff could be implemented on the interconnect, like it is done in most systems (the IOMMU is part of the chipset, not part of the GPU). Again, turning off safeguards is easily done; turning on missing safeguards is not.

dms1guy wrote:To me, the eminently sensible approach is to only allow arbitrary user code to run under the control of a software virtual machine (i.e. a managed code environment). Then you can completely implement in software any security policy or run-time constraints you desire.
GPU drivers validate command streams; the eBPF infrastructure validates programs on load. However, for Turing-complete accelerators, this becomes equivalent to solving the halting problem, so it is not feasible for general Epiphany programming. In that case, you either need to do it at runtime (which causes overhead, increases energy consumption, and introduces bugs), or you do it in hardware. The latter is feasible if Epiphany is to be integrated into some bigger system.

dms1guy wrote:Having just a single layer of efficient software virtualisation will likely yield good results whilst protecting against hardware scope creep like MMUs and even caches.
The Epiphany instruction set is not efficiently virtualizable, since system-state changing instructions (MOVFS, MOVTS) do not trap; this is why x86 virtualization required specific extensions.

Re: Accessing arbitrary host virtual memory from Epiphany

Postby dms1guy » Thu Dec 03, 2015 4:33 pm

sebraa wrote:You can use /proc/self/pagemap, which always points to your current PID.


Thanks!
This saves having to construct the path in code after calling getpid().
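For reference, a minimal sketch in C of that lookup via /proc/self/pagemap; the entry layout (PFN in bits 0-54, a 'present' flag in bit 63) is from the kernel's pagemap documentation, error handling is kept to the bare minimum, and on newer kernels reading the PFN requires root/CAP_SYS_ADMIN:

#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

/* Translate a user virtual address into a physical address by reading the
   64-bit pagemap entry for its virtual page. Returns 0 on any failure. */
static uint64_t virt_to_phys(const void *vaddr)
{
    long page_size = sysconf(_SC_PAGESIZE);              /* normally 4096 */
    uint64_t vfn = (uint64_t)(uintptr_t)vaddr / page_size;
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;
    /* one 8-byte entry per virtual page */
    if (pread(fd, &entry, sizeof(entry), vfn * sizeof(entry)) != sizeof(entry)) {
        close(fd);
        return 0;
    }
    close(fd);

    if (!(entry & (1ULL << 63)))                          /* page not present */
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);            /* bits 0-54: PFN */
    return pfn * page_size + (uint64_t)(uintptr_t)vaddr % page_size;
}

The PFN times the page size, plus the in-page offset, is the physical address the Epiphany side would then use; it only remains valid while the page stays resident (pinned).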

sebraa wrote:It is better to treat any code as "arbitrary", even if it is not. The JVM being a virtual machine / sandbox did not prevent people from breaking out of that environment in the first place, and putting the burden on the programmer is definitely not going to produce secure software.


In this day and age, no system is all 'hard' ... typically there are many 'soft' components. Even hardware is written like software using HDLs, and micro-code is used within some state engines.

Clearly there is some code which you have no choice but to get right, and if bugs are found they are a serious issue and have to be rectified quickly. We could refer to such 'system' software as 'firmware' for want of a better term.

The whole point of "managed code environments" is that there is a clear distinction between
"(1) the code that is being managed", and
"(2) the code that does the managing".

(2) should be considered firmware, which requires full access to all privileged system resources. Firmware should effectively be seen as the 'soft' part of the system architecture. Of course firmware can have bugs, but then so can the 'hard' part of the system (e.g. the FPGA). The point is that, being part of the privileged core system architecture, it will become hardened over time as bugs are detected. Even processors can have microcode, which is the 'soft' part of a processor.

The point being, you can't protect at the most privileged level of the system, and some software components will always live at that level (e.g. kernel drivers). If a kernel driver crashes, even on a mainstream processor, it will likely take the system down, as it is very difficult to guard against the failure of a highly privileged part of a system.

(1), on the other hand, can be completely locked into a 'sandbox', provided the virtual machine has an appropriately secure specification. The only reason people were able to break out of JVM sandboxes was because the JVM spec made that possible (e.g. the Java Native Interface).

So I disagree that it is better to treat any code as "arbitrary" as clearly you can't protect against faulty firmware which has access to all privileged resources. But you can very easily protect against badly behaving "managed code", which you definitely can sandbox to any desired level of protection.


dms1guy wrote:But there is a danger that we could end up 'gluing' back the bits that Andreas took out, piece by piece.

sebraa wrote:Assuming that there is a reason they exist in most other systems, I see that as a reasonable concern, but one without good alternatives.


Managed code is a highly effective alternative in my opinion, so long as you consider the managing environment a 'soft' part of the core system architecture, which is not unreasonable.

dms1guy wrote:To me, the eminently sensible approach is to only allow arbitrary user code to run under the control of a software virtual machine (i.e. a managed code environment). Then you can completely implement in software any security policy or run-time constraints you desire.

sebraa wrote:GPU drivers validate command streams; the eBPF infrastructure validates programs on load. However, for Turing-complete accelerators, this becomes equivalent to solving the halting problem, so it is not feasible for general Epiphany programming. In that case, you either need to do it at runtime (which causes overhead, increases energy consumption, and introduces bugs), or you do it in hardware. The latter is feasible if Epiphany is to be integrated into some bigger system.


Validating programs at load time means that you can only perform static checks. "Managed code" environments typically use dynamic checks at run-time. In some ways it helps to see managed code environments as interpreter engines (some are), but many use modern dynamic compilation techniques, which means that the "slow performance" and "increased energy consumption" arguments are not as strong as they once were, especially when you trade that off against the fine-grained security and other system-management functions (like intelligent cache management) they can provide without adding to chip real-estate.


dms1guy wrote:Having just a single layer of efficient software virtualisation will likely yield good results whilst protecting against hardware scope creep like MMUs and even caches.

sebraa wrote:The Epiphany instruction set is not efficiently virtualizable, since system-state changing instructions (MOVFS, MOVTS) do not trap; this is why x86 virtualization required specific extensions.


With managed code environments, it is not the Epiphany ISA that is virtualised; it is the virtual ISA presented to the managed code. The efficiency of this is purely down to the virtual machine spec and the quality of the dynamic compilation method. Given the highly efficient and compact e-Core ISA and low-latency inter-core messaging, I suspect Epiphany could be one of the best targets ever for a dynamic compiler.

I can understand your reservations, but I remain highly optimistic about what is possible with dynamic compilers on Epiphany.

Re: Accessing arbitrary host virtual memory from Epiphany

Postby sebraa » Thu Dec 03, 2015 10:19 pm

dms1guy wrote:The whole point of "managed code environments" is that there is a clear distinction between
"(1) the code that is being managed", and
"(2) the code that does the managing".

In my world, and I know we follow different definitions here, I don't consider either of these firmware. Firmware is the software part which is baked into hardware to make it work (it can be replaceable/user-serviceable/whatever, but it is bound to the hardware). The managing code is what I consider runtime (could be part of the operating system, could be an application on its own), and the managed code is the application itself. In any case, it's splitting hairs and doesn't matter.

dms1guy wrote:So I disagree that it is better to treat any code as "arbitrary" as clearly you can't protect against faulty firmware which has access to all privileged resources.
A faulty firmware might still enable hardware protection (a malicious firmware obviously won't), and even if it ends up crashing, people will over time notice and fix the problems. In my opinion, this is strictly better than not having any hardware protection in the first place and risking silent data corruption instead. The point is, it only takes one small bug anywhere in the system to compromise the whole system.


dms1guy wrote:But you can very easily protect against badly behaving "managed code", which you definitely can sandbox to any desired level of protection.
You assume that (a) your runtime is bug-free, and (b) it doesn't matter if it isn't because you can't do anything about it. These are fair assumptions in theory, but really bad ones in practice. Because what happens in reality is that people just don't care enough, since it's a hard problem and runtime performance counts. It has been shown that vendors don't do sufficient command-stream checking in embedded GPUs, basically allowing any OpenGL program to corrupt any area of physical memory. This is not a big concern as long as you trust your OpenGL applications. But then WebGL came around, and now this is a huge problem. It also is directly related to your idea (with OpenGL shaders being your 'managed code', and the graphics driver being your 'managing code'). We will see more of this, I'm sure.


dms1guy wrote:Validating programs at load time means that you can only perform static checks. "Managed code" environments typically use dynamic checks at run-time. In some ways it helps to see managed code environments as interpreter engines (some are), but many use modern dynamic compilation techniques, which means that the "slow performance" and "increased energy consumption" arguments are not as strong as they once were, especially when you trade that off against the fine-grained security and other system-management functions (like intelligent cache management) they can provide without adding to chip real-estate.
I know. If the virtual ISA is sufficiently restricted, static validation might be sufficient. I don't think the Epiphany is able to actually run a dynamic recompiler and a reasonably-sized kernel including the data and a runtime checker on its own, just because there is a distinct lack of memory. So you'd need to offload parts of it to the host, where the increased latency means that you actually can't do dynamic run-time checking anymore, at least not in real time.

Basically, I consider purely software-based security as non-existent. In a closed system, I don't need much security. But as soon as I allow injection of arbitrary code (managed or not) into the Epiphany, without any hardware protection... if any invalid pointer can wreak havoc on the host, I don't consider the system safe from corruption. And you never know when your closed system becomes an open system either. ;-)

Re: Accessing arbitrary host virtual memory from Epiphany

Postby dms1guy » Fri Dec 04, 2015 9:46 am

dms1guy wrote:The whole point of "managed code environments" is that there is a clear distinction between
"(1) the code that is being managed", and
"(2) the code that does the managing".

sebraa wrote:In my world, and I know we follow different definitions here, I don't consider either of these firmware. Firmware is the software part which is baked into hardware to make it work (it can be replaceable/user-serviceable/whatever, but it is bound to the hardware). The managing code is what I consider runtime (could be part of the operating system, could be an application on its own), and the managed code is the application itself. In any case, it's splitting hairs and doesn't matter.


We actually appear to agree on what firmware is ... but not on "managing code being firmware". So to further illustrate my contention, I will cite the Intel Itanium as an example. It is what I would call a co-designed processor, in that the whole point of a VLIW processor architecture is that a minimum fully operational system requires not just the processor hardware but also firmware which essentially replaces the hardware super-scalar unit, so code scheduling becomes the responsibility of the firmware.

A commercial example of this is the Transmeta processors, which, using the very techniques I am suggesting, significantly reduced power consumption and ran at reasonable performance. They were essentially VLIW processors with the managing runtime provided within the processor, and in some cases the managing runtime was upgradeable. This technique of dynamic binary translation is well established, and Transmeta is a notable use-case. I don't think there can be any doubt that the run-time in the case of Transmeta is firmware, as the processor has no super-scalar capability of its own; that capability is expected to be provided by the run-time. This is exactly what I am proposing the firmware should be.

If you have a bug in the firmware, it is no different from having a bug in the hardware: you have to get both right. Sure, the firmware can silently corrupt memory ... but the hardware can silently enter metastable states or even suffer single-event upsets. The processor hardware and the firmware, in the case of Transmeta for example, are clearly both SYSTEM components, not USER components, with one simply being 'soft' and the other being 'hard'.

dms1guy wrote:So I disagree that it is better to treat any code as "arbitrary" as clearly you can't protect against faulty firmware which has access to all privileged resources.

sebraa wrote:A faulty firmware might still enable hardware protection (a malicious firmware obviously won't), and even if it ends up crashing, people will over time notice and fix the problems. In my opinion, this is strictly better than not having any hardware protection in the first place and risking silent data corruption instead. The point is, it only takes one small bug anywhere in the system to compromise the whole system.


I agree that any hardware assistance that makes difficult to track errors more visible is highly desirable.

dms1guy wrote:But you can very easily protect against badly behaving "managed code", which you definitely can sandbox to any desired level of protection.

sebraa wrote:You assume that (a) your runtime is bug-free, and (b) it doesn't matter if it isn't because you can't do anything about it. These are fair assumptions in theory, but really bad ones in practice. Because what happens in reality is that people just don't care enough, since it's a hard problem and runtime performance counts. It has been shown that vendors don't do sufficient command-stream checking in embedded GPUs, basically allowing any OpenGL program to corrupt any area of physical memory. This is not a big concern as long as you trust your OpenGL applications. But then WebGL came around, and now this is a huge problem. It also is directly related to your idea (with OpenGL shaders being your 'managed code', and the graphics driver being your 'managing code'). We will see more of this, I'm sure.


As you say, these GPU examples clearly show that run-time performance has been prioritised at the expense of many other design considerations, not least system security. But that shouldn't be used as a case for negating the tremendous opportunities for processors that deploy a dynamic compiling run-time. It's all a question of which trade-offs you want in your design. You can just as easily bias a design towards security and being able to run any virtual instruction set. Of course, if you just want to squeeze every last cycle out of the system, then there are temptations to minimise any virtualisation to only a token level, i.e. do a little bit at load time. But as it turns out, this is very short-sighted, as the cycles saved in doing this pale into insignificance against the cycles lost to poor scheduling, especially when it comes to memory and I/O accesses which stall the core. Modern run-times are able to schedule much more efficiently, and what they lose in cycles doing extra scheduling work, they more than gain back by scheduling around core stalls and keeping the pipeline moving.

Of course I assume that the run-time is bug-free ... until there is evidence that it isn't, at which point correcting that bug is a very high priority. It's like a "theory" ... you follow it until it breaks, at which point you amend it. BUT ... you can never draw a line under it and call it a LAW, because you can never prove, only disprove. This is also the case with system components (soft or hard). I certainly have never assumed that it "doesn't matter that firmware is not bug-free".


dms1guy wrote:Validating programs at load time means that you can only perform static checks. "Managed code" environments typically use dynamic checks at run-time. In some ways it helps to see managed code environments as interpreter engines (some are), but many use modern dynamic compilation techniques, which means that the "slow performance" and "increased energy consumption" arguments are not as strong as they once were, especially when you trade that off against the fine-grained security and other system-management functions (like intelligent cache management) they can provide without adding to chip real-estate.

sebraa wrote:I know. If the virtual ISA is sufficiently restricted, static validation might be sufficient. I don't think the Epiphany is able to actually run a dynamic recompiler and a reasonably-sized kernel including the data and a runtime checker on its own, just because there is a distinct lack of memory. So you'd need to offload parts of it to the host, where the increased latency means that you actually can't do dynamic run-time checking anymore, at least not in real time.

Basically, I consider purely software-based security as non-existent. In a closed system, I don't need much security. But as soon as I allow injection of arbitrary code (managed or not) into the Epiphany, without any hardware protection... if any invalid pointer can wreak havoc on the host, I don't consider the system safe from corruption. And you never know when your closed system becomes an open system either. ;-)


If dynamic compilers and dynamic binary translators are not your area of core expertise, then it is quite reasonable to assume that it is not possible to run a high-performance, full-blooded dynamic binary translator firmware within the limited Epiphany on-chip memory.

But from my perspective, I have spent years delivering commercial-grade, high-performance, full-blooded dynamic binary translators, including emulating a 2200 mainframe CPU on an Itanium (for Unisys), MIPS translations to Itanium (for SGI), and PowerPC to Itanium (the Apple Rosetta translator). I am more comfortable writing assembler than I am writing high-level languages, and routinely work on projects that have only 32K (or less) of on-chip memory. On the Itanium, I was working within L1 cache most of the time, which is of this order, but the instruction sizes were very large.

In my opinion, by using mainly 16-bit Epiphany instructions, not only can the translator be as good as any of the translators I have worked on for Itanium, I believe it can be better. On the Itanium, I was constantly fighting against the cache hierarchy to direct data movement, whilst on the Epiphany I can explicitly manage the data movement, and I have very fast inter-core transfers, so the work can be distributed across multiple cores (and multiple 32K on-chip regions). Of course, these are just words, and the proof of the pudding is in the eating, so we shall see whether or not it is possible, as I am already building an experimental translator on Epiphany.

Re: Accessing arbitrary host virtual memory from Epiphany

Postby sebraa » Fri Dec 04, 2015 3:46 pm

My only experience with Transmeta CPUs was ... bad. As in "the CPU got used to [application A] and after exiting it took minutes to return to [application B]", or when a 500 MHz processor feels much slower than a 486/33. So I am definitely burned a bit there. ;-) Also, I have seen an AVR-based Z80 emulator, so I know that it is possible. A table-driven instruction translator can be very small indeed, but I assumed that a dynamic recompiler would do more than that.

In any case, since I am obviously missing expertise there, do you have any useful references? Especially the "... as the cycles saved in doing this pale into insignificance against the cycles lost to poor scheduling" part is very interesting to me; how can a runtime work around stalls (how does it detect whether an instruction will stall without actually executing it)? Or would this be at some higher level instead of instruction granularity?

Re: Accessing arbitrary host virtual memory from Epiphany

Postby dms1guy » Sat Dec 05, 2015 4:44 pm

sebraa wrote:My only experience with Transmeta CPUs was ... bad. As in "the CPU got used to [application A] and after exiting it took minutes to return to [application B]", or when a 500 MHz processor feels much slower than a 486/33. So I am definitely burned a bit there. ;-)


I wasn't advocating for the efficiency of Transmeta products, but merely using them to illustrate what I meant when I was referring to the translator as firmware and part of the system. So in the case of Transmeta, the dynamic binary translator (DBT) firmware is distributed inside the processor chip.

sebraa wrote:Also, I have seen an AVR-based Z80 emulator, so I know that it is possible. A table-driven instruction translator can be very small indeed, but I assumed that a dynamic recompiler would do more than that.


You are correct in your assumption.

sebraa wrote:In any case, since I am obviously missing expertise there, do you have any useful references? Especially the "... as the cycles saved in doing this pale into insignificance against the cycles lost to poor scheduling" part is very interesting to me; how can a runtime work around stalls (how does it detect whether an instruction will stall without actually executing it)? Or would this be at some higher level instead of instruction granularity?


It can indeed be done at a higher level than individual-instruction granularity.

If you examine the counters in a performance monitoring unit for processors that support them, you will typically see that around 30% to 50% of the time, the pipeline is stalled due to cache misses ... i.e. external memory accesses are a prime source of pipeline stalls.

Simply reading from the external memory will stall the pipeline of the requesting e-Core for as long as it takes to bring that data back. For a single read transaction, this could be tens or even hundreds of cycles.

By using a Decoupled Access Execute (DAE) memory access strategy, i.e. in this case DMA, you can issue the memory read request, then check in a non-blocking manner (or be interrupted) when the read request has been completed. Similarly with I/O operations.

In this way, you can keep the processor moving, stalling only when it runs out of work rather than whenever it gets caught waiting for a specific I/O operation to complete.

On-chip memory access times are fast enough to be considered non-blocking.
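Purely as an illustration of the pattern (dma_start_read() and dma_done() here are hypothetical stand-ins for whatever non-blocking DMA primitives the SDK provides, and the chunk size is arbitrary), a sketch of double-buffering reads from external memory so the core computes on one buffer while the next one is in flight:

#include <stdint.h>

#define CHUNK 1024                       /* words per transfer (arbitrary) */

/* Hypothetical non-blocking DMA helpers: start a read of 'bytes' from
   external memory into on-chip memory, and poll the last transfer. */
extern void dma_start_read(void *local_dst, const void *ext_src, unsigned bytes);
extern int  dma_done(void);

extern void process(const int32_t *buf, unsigned n);    /* the real work */

/* Stream 'total' words from SDRAM; assumes total is a multiple of CHUNK. */
void stream_from_sdram(const int32_t *ext, unsigned total)
{
    static int32_t buf[2][CHUNK];        /* two on-chip buffers */
    unsigned issued = 0, done = 0, cur = 0;

    dma_start_read(buf[cur], ext, CHUNK * sizeof(int32_t));   /* prime the pipe */
    issued += CHUNK;

    while (done < total) {
        while (!dma_done())
            ;                            /* or do other useful work here */

        /* fetch the next chunk into the other buffer ... */
        if (issued < total) {
            dma_start_read(buf[cur ^ 1], ext + issued, CHUNK * sizeof(int32_t));
            issued += CHUNK;
        }
        /* ... while the core works on the chunk that just arrived */
        process(buf[cur], CHUNK);
        done += CHUNK;
        cur ^= 1;
    }
}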

Re: Accessing arbitrary host virtual memory from Epiphany

Postby sebraa » Sun Dec 06, 2015 10:35 pm

dms1guy wrote:I wasn't advocating for the efficiency of Transmeta products, but merely using them to illustrate what I meant when I was referring to the translator as firmware and part of the system. So in the case of Transmeta, the dynamic binary translator (DBT) firmware is distributed inside the processor chip.
I agree. However, I still think their decision was a bad one. If the system had been able to execute their native code directly, I believe performance would have been better. Apple did a similar thing when they switched to PPC (the m68k emulator was part of the OS, so not what I'd call firmware). MacOS was basically coded in m68k, but since each release contained more native PPC code, they got nice speedups with each system release.

dms1guy wrote:
sebraa wrote:Also, I have seen an AVR-based Z80 emulator, so I know that it is possible. A table-driven instruction translator can be very small indeed, but I assumed that a dynamic recompiler would do more than that.
You are correct in your assumption.
The Z80 emulator I refer to needs about 4-8 KB of AVR code. I would assume that a simple translator for the Epiphany would weigh in somewhat similar (fewer instructions, but 32-bit ones instead). Additional functionality would increase that number, so the memory usage would be quite heavy; or am I mistaken?

I just wonder whether a "software-based execution engine" can outperform native code, assuming that the native code is at least a reasonable fit for the underlying hardware. For generic stuff (the examples you gave, i.e. the Mac 68k emulator, Rosetta or the Transmeta Code Morphing), it might actually work reasonably well. On the other hand, I am still not sure whether dynamic recompiling for the Epiphany is a good idea (apart from software portability, which is kind of a given with that approach). Well, let's wait and see; I hope you'll publish your results. It's going to be interesting. ;-)

Re: Accessing arbitrary host virtual memory from Epiphany

Postby dms1guy » Mon Dec 07, 2015 11:36 am

dms1guy wrote:I wasn't advocating for the efficiency of Transmeta products, but merely using them to illustrate what I meant when I was referring to the translator as firmware and part of the system. So in the case of Transmeta, the dynamic binary translator (DBT) firmware is distributed inside the processor chip.

sebraa wrote:I agree. However, I still think their decision was a bad one. If the system had been able to execute their native code directly, I believe performance would have been better. Apple did a similar thing when they switched to PPC (the m68k emulator was part of the OS, so not what I'd call firmware). MacOS was basically coded in m68k, but since each release contained more native PPC code, they got nice speedups with each system release.


I guess whether it was a good or a bad decision depends on their objectives.

If we were to measure success based on:

1) commercial objectives
2) technical objectives of low-power, x86-compatible processors with "good enough" performance

Then consider the following:

Wikipedia on Transmeta wrote:On November 7, 2000 (election day), Transmeta had their initial public offering at the price of $21 a share. The value reached a high of $50.26 before settling down to $46 a share on opening day. This made Transmeta the last of the great high tech IPOs of the dot-com bubble. Their opening day performance would not be surpassed until Google’s IPO in 2004.

Wikipedia on Transmeta wrote:AMD invested $7.5 million in Transmeta, planning to use the company’s patent portfolio in energy-efficient technologies.

Wikipedia on Transmeta wrote:On October 24, 2007, Transmeta announced an agreement to settle its lawsuit against Intel Corporation. Intel agreed to pay $150 million upfront and $20 million per year for five years to Transmeta in addition to dropping its counterclaims against Transmeta. Transmeta also agreed to license several of its patents and assign a small portfolio of patents to Intel as part of the deal.[12] Transmeta also agreed to never manufacture x86 compatible processors again. One significant sore point in the Intel litigation was the payout of approximately $34M to three of Transmeta's executives.[26][27] In late 2008, Intel and Transmeta reached a further agreement to transfer the $20 million per year in one lump sum.

Wikipedia on Transmeta wrote:On August 8, 2008, Transmeta announced that it had licensed its LongRun and low power chip technologies to Nvidia for a one time license fee of $25 million.

Wikipedia on Transmeta wrote:On November 17, Transmeta announced the signing of a definitive agreement to be acquired by Novafora, a digital video processor company based in San Diego, California, for $255.6 million in cash, subject to adjustments dependent on working capital.[28] The deal was finalized on January 28, 2009, when Novafora announced the completion of its acquisition of Transmeta.

Wikipedia on Transmeta wrote:Transmeta received a total of $969M in funding during its lifetime.


Also, Transitive Ltd, whose core business was dynamic binary translators, was acquired by IBM and had a full order book of customers looking for migration solutions to let legacy code bases execute on new target processor ISAs.



sebraa wrote:Also, I have seen an AVR-based Z80 emulator, so I know that it is possible. A table-driven instruction translator can be very small indeed, but I assumed that a dynamic recompiler would do more than that.

dms1guy wrote:You are correct in your assumption.

sebraa wrote:The Z80 emulator I refer to needs about 4-8 KB of AVR code. I would assume that a simple translator for the Epiphany would weigh in somewhat similar (fewer instructions, but 32-bit ones instead). Additional functionality would increase that number, so the memory usage would be quite heavy; or am I mistaken?


There are generally two approaches to translator execution engines:
1. Interpreter [classical pros: small memory footprint; cons: low performance]
2. Dynamic Compiler [classical pros: relatively high performance; cons: high memory footprint]

What you are describing above is likely an Interpreter engine, hence the very small footprint.

Dynamic compilers do require a much larger memory footprint, but that footprint can include system memory. It is possible to architect a solution that reserves on-chip memory to accelerate certain key functions, so it is not a like-for-like comparison.
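To make the footprint/performance trade-off concrete, here is a toy sketch (all types and helpers are hypothetical, nothing Epiphany-specific) of the dispatch loop at the heart of a dynamic compiler: each guest basic block is translated to native code once, the result is cached, and subsequent visits jump straight into the cached native code. The code cache is where the extra memory goes, and the re-use of translations is where the speed over an interpreter comes from:

#include <stdint.h>

#define TCACHE_SLOTS 256                  /* toy direct-mapped translation cache */

typedef void (*native_block_fn)(void);    /* a translated block: runs to its end
                                             and updates guest_pc */

struct tcache_entry {
    uint32_t        guest_pc;             /* guest address the block came from */
    native_block_fn code;                 /* pointer into the native code cache */
};

static struct tcache_entry tcache[TCACHE_SLOTS];

/* Hypothetical pieces supplied by the rest of the translator. */
extern uint32_t        guest_pc;                     /* current guest program counter */
extern native_block_fn translate_block(uint32_t pc); /* emit native code for one block */

void dispatch_loop(void)
{
    for (;;) {
        struct tcache_entry *e = &tcache[guest_pc % TCACHE_SLOTS];
        if (e->code == 0 || e->guest_pc != guest_pc) {
            /* miss: pay the translation cost once */
            e->guest_pc = guest_pc;
            e->code     = translate_block(guest_pc);
        }
        /* hit: execute native code until the block ends and guest_pc moves on */
        e->code();
    }
}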


sebraa wrote:I just wonder whether a "software-based execution engine" can outperform native code, assuming that the native code is at least a reasonable fit for the underlying hardware. For generic stuff (the examples you gave, i.e. the Mac 68k emulator, Rosetta or the Transmeta Code Morphing), it might actually work reasonably well. On the other hand, I am still not sure whether dynamic recompiling for the Epiphany is a good idea (apart from software portability, which is kind of a given with that approach). Well, let's wait and see; I hope you'll publish your results. It's going to be interesting. ;-)


The question you pose is the core question that is often asked about dynamic binary translation ... and the jury is still out on the final verdict.

To date (at the risk of over-simplification), interpreters in general typically achieve around 20% of the performance of native code, whilst dynamic binary translators typically achieve something in the 50% to 75% region.

So at first glance it would appear that your reservations are well founded.

But the reason the jury is still out is that the theory of dynamic compilers suggests that it is possible to achieve in excess of 100%, for the simple reason that the output of a dynamic compiler is in fact native code. So we are actually comparing [dynamically scheduled] native code versus [statically scheduled] native code.

Dynamically scheduled native code is generated by instrumenting the virtual ISA at run time to identify memory access patterns, blocking code sequences and other hazards, and then dynamically scheduling the code to optimise those workloads, factoring in run-time conditions to provide better memory access patterns and to schedule around hazards. So dynamically scheduled code could potentially perform better than statically scheduled code, subject to an optimal implementation.

But that's the theory. Then we meet the real world, where we find that, with contemporary approaches to dynamic binary translation, dynamically scheduled code tends to have a significantly higher incidence of branches than statically scheduled code, which negatively offsets any performance gained through better scheduling.
