2200vm - An emulation of OS 2200

Postby dms1guy » Fri Sep 11, 2015 4:43 pm

1. Background
Apple has successfully used Dynamic Binary Translation (DBT) to port their software base from a 68000 platform to a PowerPC platform, and then again to an x86 platform.

I had the pleasure of working for Transitive in Manchester, UK during the period when Transitive delivered the Rosetta product for Apple which enabled Mac PowerPC Apps to run on x86. Transitive were eventually acquired by IBM.

2. Unisys looks to port 2200 apps to Itanium
Shortly after Intel announced Itanium 2, Unisys explored shutting down their 2200 mainframe processor division and running their 2200 mainframe applications on Itanium using DBT. They initially developed an in-house 'interpreter' solution which worked, but was dog-slow. So they commissioned Transitive to develop a DBT solution. I had worked earlier on the 'IRIX on MIPS' to 'Linux on Itanium' translator and so ended up on the team tasked with delivering the 2200 to Itanium DBT product.

Whilst the translation model Transitive used worked well for x86, PowerPC, MIPS, ARM and that style of processor, it struggled with the 2200 processor (called the XPA Instruction Processor). The reasons were that the XPA saw memory as 36-bit words, could atomically access 18-bit and 9-bit sub-words, used 1's complement arithmetic, and had complex memory management registers, along with a whole host of other quirks. The end result was a DBT engine that did not perform much better than the interpreter Unisys had developed.
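
To give a flavour of the mismatch, here is a rough C sketch (purely my own illustration, not Transitive's code) of holding a 36-bit one's-complement XPA word in a 64-bit host integer, picking out its half-words and quarter-words, and doing a 1's complement add with the end-around carry. The field layout and names are assumptions for the example.

Code:
  #include <stdint.h>

  /* Illustration only: one 36-bit XPA word held in the low bits of a 64-bit
   * host integer. Assumed layout: H1 = bits 35..18, H2 = bits 17..0, and
   * Q1..Q4 are the four 9-bit quarter-words from left to right. */
  typedef uint64_t xpa_word;              /* only the low 36 bits are used */

  #define XPA_MASK 0xFFFFFFFFFULL         /* 36 one-bits */

  static inline uint32_t h1(xpa_word w) { return (uint32_t)((w >> 18) & 0x3FFFF); }
  static inline uint32_t h2(xpa_word w) { return (uint32_t)( w        & 0x3FFFF); }
  static inline uint32_t quarter(xpa_word w, int i)        /* i = 0..3 */
  { return (uint32_t)((w >> (27 - 9 * i)) & 0x1FF); }

  /* 1's complement add: a carry out of bit 35 wraps back into the sum. */
  static inline xpa_word ones_add(xpa_word a, xpa_word b)
  {
      uint64_t s = (a & XPA_MASK) + (b & XPA_MASK);
      if (s > XPA_MASK)
          s = (s + 1) & XPA_MASK;         /* end-around carry */
      return s;
  }

Every emulated load, store and arithmetic operation has to pay for this kind of masking and end-around-carry fix-up on a 2's complement host, which is part of why the conventional translation model struggled.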

3. The 2200vm Project is born
I had warned the team at Transitive that they needed a new DBT architecture, but I was voted down and the end result was exactly what I had warned about at the start of the project. This left me with a sense of "unfinished business" for many years, and I am still convinced that I can get XPA code to run well using DBT, but I have been waiting for the right target processor to be commercially available.

It is generally agreed within DBT circles that the best DBT target processors are VLIW architectures; hence Transmeta used a VLIW, and Intel designed Itanium as a VLIW so that it would be a good DBT target.

In my opinion, VLIW is the wrong way to go; a many-core DSP-style processor with directly addressable memory banks and short pipelines, like Epiphany, is much closer to the ideal architecture for a processor performing DBT. The biggest bane of my life is cache. On the Itanium there are 3 levels of cache, but for DBT you need directly addressable on-chip memory.

So having completed my initial evaluation of the Epiphany architecture as a potential DBT target, I have decided to finally write the XPA-to-target DBT engine I have been itching to do for many years, and to use the 16-core Epiphany as the target.

4. Project Goals
Emulating just a CPU is not much use without an OS, so the emulation will include the core executive of OS 2200. The idea is to be able to run legacy 2200 apps on Epiphany.

I have named my project "2200vm" as it is a VIRTUAL MACHINE emulating an OS 2200 executive running on a Unisys 2200 mainframe.

At this stage I am doing this out of pure interest, to see how well my ideas on data-driven instruction scheduling hold up, and I intend to produce some benchmarks comparing the original Itanium 2 performance (1.6 GHz, 128-bit data paths) with the Parallella board (600 MHz, 64-bit data paths). Whilst it looks like a bit of a David vs Goliath situation, I suspect the end results could raise some eyebrows.

Once I have a basic working DBT engine running, it is my intention to produce a web-based console so that anyone who wishes to have a play, or run their own benchmarks can have a go at running a mainframe emulation on a Parallella board. I also intend to post any relevant updates to the project on this thread.

In any event, I have just launched the project and just wanted to raise awareness of my project to the Parallella forum members in case anyone has any interesting ideas, thoughts or observations they would like to share on this project, or on DBT on Epiphany in general.

Re: 2200vm - An emulation of OS 2200

Postby dms1guy » Sat Sep 12, 2015 9:53 pm

Project Update - Completed Design of Memory Management Module


1. Taking full advantage of the concurrently accessible e-Core SRAM banks

1.1 Two separate banks used for executable code and static base
  • Enabling e-Core to operate as a Harvard architecture
1.2 Two separate banks used for caching external memory Objects
  • Enabling concurrent I/O with two external data sources

2. Decoupled Access-Execute Architecture

2.1 DMA instead of Register Loads/Stores from/to External Memory
  • Register loads from external memory negatively impact interrupt response latency
  • Register loads from external memory stall the execution pipeline
  • Using DMA for accessing external memory decouples the execution core from data access latencies
2.2 External Memory transfers always in data blocks
  • As external memory transfers take a significant amount of time, it is more efficient to transfer a block of I/O data rather than a single data item (see the DMA sketch after this list).
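
As a rough illustration of 2.1 and 2.2, here is a minimal sketch of pulling a block of external memory into local SRAM with the eSDK DMA helper rather than issuing individual loads. I am assuming the device-side e-lib e_dma_copy() call here; the buffer names and block size are placeholders, and the real design would place each buffer in its own SRAM bank via the linker script and overlap transfers using descriptors.

Code:
  #include "e_lib.h"              /* Epiphany device-side library (eSDK) */

  #define BLOCK_BYTES 512         /* one external-memory block (illustrative) */

  /* Two local I/O buffers; in the real design each sits in its own SRAM
   * bank so one can be filled by DMA while the other is being consumed. */
  static char io_buf_a[BLOCK_BYTES];
  static char io_buf_b[BLOCK_BYTES];

  /* Fetch one block from external (shared) memory into a local buffer.
   * e_dma_copy() waits for the transfer to complete; a production version
   * would start a DMA descriptor and poll for completion later, so the
   * execution core never stalls on the access itself. */
  static void fetch_block(void *ext_src, char *local_buf)
  {
      e_dma_copy(local_buf, ext_src, BLOCK_BYTES);
  }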

3. Software Cache

3.1 Set Associative Cache
  • DMA access to external memory to be driven by a software-implemented 'Set Associative' cache.
  • To keep the Translation Lookaside Buffer (TLB) size manageable, I have opted for 512 TLB slots, with slots grouped into sets of 8, i.e. an 8-way set-associative scheme (lookup sketched after this list).
3.2 Concurrency Enhancement
  • Cache line data not to be stored in the TLB, but to be stored, under user direction, in one of two possible I/O banks, to enable overlapping DMA transfers.
  • Cache line data to comprise a software 'Object' rather than a typical fixed cache line size of 32, 64, 128 or 256 bytes.
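
For concreteness, here is a minimal sketch of the lookup side of such a software cache: 512 slots arranged as 64 sets of 8 ways. The structure fields, the 64-byte granularity used to form the set index, and all names are placeholders for illustration; the real design caches variable-sized Objects rather than fixed lines.

Code:
  #include <stdint.h>
  #include <stddef.h>

  #define TLB_SLOTS 512
  #define TLB_WAYS  8
  #define TLB_SETS  (TLB_SLOTS / TLB_WAYS)     /* 64 sets */

  typedef struct {
      uint32_t tag;       /* high bits of the external object address */
      uint8_t  valid;
      uint8_t  bank;      /* which of the two I/O banks holds the data */
      void    *line;      /* pointer into the chosen I/O bank */
  } tlb_entry;

  static tlb_entry tlb[TLB_SETS][TLB_WAYS];

  /* Look up an external address; returns the cached data on a hit, or NULL
   * on a miss (a miss would trigger a DMA fill into one of the I/O banks). */
  static void *tlb_lookup(uint32_t ext_addr)
  {
      uint32_t set = (ext_addr >> 6) & (TLB_SETS - 1);   /* bits 6..11 */
      uint32_t tag =  ext_addr >> 12;                    /* bits above set+offset */

      for (int way = 0; way < TLB_WAYS; way++) {
          tlb_entry *e = &tlb[set][way];
          if (e->valid && e->tag == tag)
              return e->line;                            /* hit */
      }
      return NULL;                                       /* miss */
  }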

4. Heap Management

4.1 System Memory & I/O Heaps
  • To be implemented as paged rather than segmented heaps (see the sketch after this list).
  • Fast
  • Small footprint
  • Minimises garbage collection overhead.
  • Can simulate segmented heap to translated code.
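
The reason a paged heap is fast and keeps garbage collection cheap is that allocation and reclamation become O(1) free-list operations on fixed-size pages. A minimal sketch of that idea (sizes and names are illustrative only, not the final 2200vm structures):

Code:
  #include <stdint.h>
  #include <stddef.h>

  #define PAGE_BYTES 256
  #define HEAP_PAGES 64

  typedef union page {
      union page *next;              /* free-list link while the page is free */
      uint8_t     bytes[PAGE_BYTES];
  } page;

  static page  heap_pool[HEAP_PAGES];
  static page *free_list;

  static void heap_init(void)
  {
      for (int i = 0; i < HEAP_PAGES - 1; i++)
          heap_pool[i].next = &heap_pool[i + 1];
      heap_pool[HEAP_PAGES - 1].next = NULL;
      free_list = &heap_pool[0];
  }

  static void *page_alloc(void)      /* O(1): no search, no compaction */
  {
      page *p = free_list;
      if (p)
          free_list = p->next;
      return p;
  }

  static void page_free(void *mem)   /* O(1) */
  {
      page *p = (page *)mem;
      p->next = free_list;
      free_list = p;
  }

A segmented view can then be presented to translated code as a mapping from (segment, offset) onto these pages.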

Re: 2200vm - An emulation of OS 2200

Postby dms1guy » Sun Sep 13, 2015 9:38 am

Background Reading on Dynamic Binary Translation

For anyone interested in learning about Dynamic Binary Translation, here is a useful set of links to reading material to get you kick-started.

1. About Transmeta
The first link is background on Transmeta, who for a while shook up the whole processor design space by suggesting that:
"Transmeta’s Code Morphing technology changes the entire approach to designing microprocessors. By demonstrating that practical microprocessors can be implemented as hardware-software hybrids, Transmeta has dramatically expanded the design space that microprocessor designers can explore for optimum solutions. Microprocessor development teams may now enlist software experts and expertise, working largely in parallel with hardware engineers to bring products to market faster. Upgrades to the software portion of a microprocessor can be rolled out independently from the chip. Finally, decoupling the hardware design from the system and application software that use it frees hardware designers to evolve and eventually replace their designs without perturbing legacy software."

I personally don't think there is anything wrong with this vision, quite the contrary. Clearly their efforts to realise this vision were defeated by both commercial and technical obstacles. I personally feel the technical obstacles can be overcome, and hopefully we'll see that demonstrated in the 2200vm project in due course, as emulating the 2200's XPA processor is arguably the most difficult DBT challenge of any processor emulation project.

The commercial potential of getting DBT right:
On November 7, 2000 (election day), Transmeta had their initial public offering at the price of $21 a share. The value reached a high of $50.26 before settling down to $46 a share on opening day. This made Transmeta the last of the great high tech IPOs of the dot-com bubble. Their opening day performance would not be surpassed until Google’s IPO in 2004.

The company was once named the most important company in Silicon Valley in an Upside magazine editorial.

Transmeta received a total of $969M in funding during its lifetime.



2. An in-depth look at Transmeta's processor with integrated DBT software

3. Lecture on Dynamic Binary Translation

4. Original technical Paper on Transmeta Code Morphing Software

Re: 2200vm - An emulation of OS 2200

Postby dms1guy » Wed Sep 16, 2015 8:15 am

Project Update - Started Design of Kernel Scheduler Module

  • Traditionally each execution thread is scheduled onto a single core in a cooperative or pre-emptive manner, with threads queued in prioritised ready queues (a minimal sketch of this baseline model follows below).

  • Taking into account the very low inter-eCore latencies, I am exploring the possibility of implementing a kind of 'Scalar Operand Network' (SON) that spans multiple eCores, so that the scheduler can schedule jobs in a 'stream computing'-like manner, rather than just simple threads. This will likely increase the scope for parallel execution.
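
For reference, here is the baseline model from the first bullet as a minimal sketch: per-priority ready queues, with the highest non-empty priority dispatched first. The thread structure and names are placeholders, not the actual 2200vm scheduler code.

Code:
  #include <stddef.h>

  #define NUM_PRIORITIES 8

  typedef struct thread {
      struct thread *next;
      int            priority;         /* 0 = highest */
      /* ... register context, stack pointer, accounting, ... */
  } thread;

  static thread *ready_head[NUM_PRIORITIES];
  static thread *ready_tail[NUM_PRIORITIES];

  static void make_ready(thread *t)    /* enqueue at the tail of its priority */
  {
      t->next = NULL;
      if (ready_tail[t->priority])
          ready_tail[t->priority]->next = t;
      else
          ready_head[t->priority] = t;
      ready_tail[t->priority] = t;
  }

  static thread *pick_next(void)       /* dispatch highest-priority runnable */
  {
      for (int p = 0; p < NUM_PRIORITIES; p++) {
          thread *t = ready_head[p];
          if (t) {
              ready_head[p] = t->next;
              if (!ready_head[p])
                  ready_tail[p] = NULL;
              return t;
          }
      }
      return NULL;                     /* nothing ready: idle */
  }

The SON idea in the second bullet would go beyond this one-thread-per-core model, so the final scheduler is unlikely to look exactly like this.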

Re: 2200vm - An emulation of OS 2200

Postby jar » Wed Sep 16, 2015 1:54 pm

Sounds like an interesting project, and fortunately the Epiphany has a surplus of registers. Have you performed any analysis to determine the size of the translated binary relative to the original binary? I would think that, for the memory constraints of this architecture, it may be more effective to use direct threaded dispatch in some cases rather than DBT. My thoughts are that the minor dispatch overhead would be smaller than additional off-chip memory accesses due to a larger translated binary.
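
By direct threaded dispatch I mean something along these lines (a toy sketch using GCC's computed-goto extension; the three opcodes and the demo program are invented for the example): each decoded instruction is stored as the address of its handler plus an operand, so execution jumps handler-to-handler with no central switch, keeping the in-memory representation compact.

Code:
  #include <stdio.h>
  #include <stdint.h>

  typedef struct { void *handler; intptr_t operand; } cell;

  static intptr_t run(void)
  {
      /* Demo program: push 2, push 3, add, halt (result 5). The handler
       * addresses are GCC label addresses, valid only inside run(). */
      cell prog[] = {
          { &&op_push, 2 }, { &&op_push, 3 }, { &&op_add, 0 }, { &&op_halt, 0 }
      };
      intptr_t stack[16], *sp = stack;
      cell *ip = prog;

  #define DISPATCH() goto *(ip++)->handler
      DISPATCH();

  op_push: *sp++ = ip[-1].operand;   DISPATCH();
  op_add:  --sp; sp[-1] += *sp;      DISPATCH();
  op_halt: return sp[-1];
  #undef DISPATCH
  }

  int main(void)
  {
      printf("%ld\n", (long)run());  /* prints 5 */
      return 0;
  }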

Re: 2200vm - An emulation of OS 2200

Postby dms1guy » Wed Sep 16, 2015 3:27 pm

@jar ... I agree that as a rule, (even at only 600 MHz), it is better to trade off extra processor cycles against off-chip memory accesses.

As you say, if the translation process results in a larger binary (often the case for DBT) then there is merit in exploring interpreting as an approach, with direct-threaded despatch being a good trade-off for performance vs storage costs.

Whilst I haven't done any detailed analysis yet on original vs translated binary, a quick rule of thumb tells me that
  • the original binary has 36-bit instructions that target a register file,
  • whilst the translated binary will have 16-bit instructions accommodating 3-bit immediate values sufficient to index into 32-byte data frames and address r0..r7
My gut tells me that we are not looking at the typical scenario here of DBT putting extra pressure on the on-chip resources.

Also, DBT performs a dependency analysis of the original instructions, generating a data-flow graph (DFG) that exposes any ILP and TLP opportunities, and I am looking to implement an execution architecture that will exploit any such parallel execution opportunities. This will almost certainly mean:
  • (a) Looking for opportunities to do static dual-issue of instructions per eCore
  • (b) Implementing some kind of a Scalar Operand Network (SON) to enable parallel execution of translated instructions across more than one eCore.

I won't know whether DBT or direct-threaded is the way to go until I get a better idea of how I want to handle the dual instruction issue and SON aspects of the design. It may be that direct-threaded yields additional benefits when examined in this context.

Re: 2200vm - An emulation of OS 2200

Postby dms1guy » Tue Sep 22, 2015 6:54 am

Project Update - Kernel Scheduler Module - Executable Loader

Have been trawling the Unisys 2200 user groups for help on how to get hold of:
  • Test 'Unisys 2200 mainframe' executables.
  • Documentation on their structure so that I can write a loader.
Have been guided to a site from where I can download public domain 2200 executables as tape files.

I've also been advised that executables live inside a structure known as an absolute element, which itself lives inside a program file, which for distribution and archiving purposes is written to a physical tape (or, these days, to a tape file).

There are multiple tape file formats, but I have been guided to the technical manuals that document the formats and the data structure in these tape files. So just working through these. Don't foresee any problems based on what I've read so far. Just a bit of a steep learning curve.

--
dms1guy