Massively Parallel Operating Systems

Postby rabii » Mon Jan 05, 2015 4:31 pm

Hi,

I used to work at the Open Software Foundation, where we developed an OS for large multi-node machines. It was called OSF1/AD and its initial target machine was the Intel Paragon. At the time, the resource requirements for running a full OSF1/AD on each node were rather high because of memory and other needs, so a 1000-node cluster had issues. Today, however, we should be able to run such a cluster on a set of Parallella boards. We had all kinds of cool things like:

    Dedicated compute vs. file servers.
    Remote process creation
    Single system image
    A distributed file system
    Very powerful messaging based on the Mach micro-kernel
    Remote paging

We were also able to run "just" the Mach micro-kernel on a node without the need for the rest of the OS. That way you could use Mach remote task creation to put the node to use. I wonder if anyone has thought of applying that technology here. Please post your thoughts and let's have a discussion.
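
To make the "micro-kernel only" node idea a bit more concrete, here is a toy sketch in Python. It is not Mach code and does not claim to show how OSF1/AD did it: threads stand in for compute nodes, queues stand in for message ports, and the names Node, remote_task_create and run_demo are made up for illustration. Each node runs nothing but a message loop, and a task is started on it by sending it a message.

```python
import queue
import threading


class Node:
    """Toy stand-in for a compute node that runs only a message loop."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.inbox = queue.Queue()  # plays the role of the node's message port
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        # Wait for messages; each message asks the node to run one task.
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown request
                break
            func, args, reply = msg
            reply.put((self.node_id, func(*args)))

    def shutdown(self):
        self.inbox.put(None)
        self.thread.join()


def remote_task_create(node, func, *args):
    """Hypothetical helper: ask a node to run func(*args) via a message."""
    reply = queue.Queue()
    node.inbox.put((func, args, reply))
    return reply  # the caller collects the result from this reply queue


def run_demo():
    nodes = [Node(i) for i in range(4)]
    # Ask every node to square its own id.
    replies = [remote_task_create(n, pow, n.node_id, 2) for n in nodes]
    results = sorted(r.get(timeout=5) for r in replies)
    for n in nodes:
        n.shutdown()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only the division of labour: the node itself knows nothing about files or processes in the UNIX sense, it just receives a message and runs what it is told, while the caller decides where work goes.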
rabii
 
Posts: 7
Joined: Sun Jan 04, 2015 2:03 pm
Location: Wayland MA

Re: Massively Parallel Operating Systems

Postby aolofsson » Mon Jan 05, 2015 5:43 pm

Thanks for starting the discussion!

Do you have some links to the material on the OSF1/AD so that we can get up to speed?

An initial search:
http://en.wikipedia.org/wiki/Intel_Paragon
http://en.wikipedia.org/wiki/SUNMOS
http://en.wikipedia.org/wiki/Lightweigh ... ing_System

It would certainly be very interesting if we could run a cut-down version of it on every Epiphany node!

Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA

Re: Massively Parallel Operating Systems

Postby 9600 » Mon Jan 05, 2015 5:52 pm

OSF1/AD sounds extremely cool and I'm sure that many folks would be excited by the idea of running something like it on a Parallella cluster.

I'm vaguely familiar with the OSF, as in my first job we had VAX and later on Alpha systems, and of course the Alpha UNIX, Tru64, was called OSF/1 initially. I also remember reading about OSF DCE with much interest, although it seems that it didn't work out for whatever reason.

Out of interest, what has happened to OSF1/AD? Have the sources simply been lost to time, or do parts of it perhaps live on in commercial products?

Regards,

Andrew
Andrew Back (a.k.a. 9600 / carrierdetect)
9600
 
Posts: 997
Joined: Mon Dec 17, 2012 3:25 am

Re: Massively Parallel Operating Systems

Postby rabii » Mon Jan 05, 2015 7:55 pm

Hi,

I will see if I can dig up some of the papers. I was one of the original development team members at OSF for OSF1/AD. Here is a summary:

OSF1 was basically a UNIX-based OS.
OSF1 was broken into a Mach-based micro-kernel and a personality server which implemented UNIX.
OSF1/AD was a distributed version of OSF1 which used an enhanced version of Mach to implement a distributed single-image UNIX for massively parallel machines. It was meant to work on the Intel Paragon, but we also had it working on clusters of regular HP Vectras and on the Intel iPSC hypercube, and then Hitachi took it and ported it to an MPP of their own design.

I have all the main papers. There were three versions:

OSF1/AD - v1 : this had a proprietary contribution from Locus Computing Co. which provided distributed process management and some other stuff.
OSF1/AD - v2 : this one had OSF-implemented, partially distributed process management, designed and implemented by yours truly and another gentleman, Simon Patience.
OSF1/AD - v3 : this one had fully distributed process management designed and implemented by yours truly and a third person.

I am not sure how to get the code, but if you want it I am sure it can be dug up. However, it was Mach-based, and the distributed micro-kernel had some major issues on two points:

1) XMM - the distributed shared memory.
2) Remote Task Creation

Re: Massively Parallel Operating Systems

Postby rabii » Mon Jan 05, 2015 9:18 pm

The original paper is on Usenix: An OSF/1 Unix for Massively Parallel Multicomputers

I have a pdf version myself along with all the other papers but if you want to download the original paper then this is the right one.

Re: Massively Parallel Operating Systems

Postby jlambrecht » Thu Mar 26, 2015 3:30 pm

Being a microkernel-based distro, would Debian GNU/Hurd be useful as a building block?

http://www.debian.org/ports/hurd/
(eek, no news since 2013)
http://www.gnu.org/software/hurd/

http://www.gnu.org/software/hurd/hurd/r ... strib.html
jlambrecht
 
Posts: 41
Joined: Wed Nov 13, 2013 7:57 pm

Re: Massively Parallel Operating Systems

Postby 9600 » Thu Mar 26, 2015 5:10 pm

I guess Hurd could be interesting if you had tasks for which you could write servers which target the Epiphany. I shouldn't think any of the existing servers would be a good fit, although I could be wrong.

Given that Hurd never really gained traction, development seems to have stalled and there are more current/used approaches for distributing workloads, it's probably not a great bet. Which is not to say that it's not a cool idea :)

Cheers,

Andrew

Re: Massively Parallel Operating Systems

Postby Melkhior » Fri Mar 27, 2015 10:56 am

jlambrecht wrote: Being a microkernel-based distro, would Debian GNU/Hurd be useful as a building block?


There once was MkLinux, which was Linux on top of the Mach microkernel... I ran that on a PowerMac way back. It seems to be lost to the sands of time though.
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Massively Parallel Operating Systems

Postby jlambrecht » Tue Mar 31, 2015 12:45 pm

Well, yeah, cool ideas are cheap :-)

Evolving Hurd could allow for a native Epiphany OS :-) That would be extremely cool, since it could run native Debian packages if a port team were assembled.

Re: Massively Parallel Operating Systems

Postby carmonacr » Fri May 29, 2015 11:35 pm

Hello guys, I'm sorry, but I am a real noob and this is just a data dump. I truly don't understand any of this, but I am hoping it will revive this promising thread, even if it's just for a laugh.

I researched OSF1/AD a bit: "Abstract. On the Paragon, two operating systems are available: OSF/1 AD, and SUNMOS. The chief drawbacks of OSF/1 AD are that..."
I don't know if that article matches the OP's experience, since it is from '94.
http://archive.org/stream/nasa_techdoc_ ... 9_djvu.txt
That ultimately led me to the Kitten lightweight kernel (LWK) with Palacios, or Linux with Palacios.
https://software.sandia.gov/trac/kitten/wiki/WikiStart
Kitten LWK: "Kitten distinguishes itself from these prior LWKs by providing a Linux-compatible user environment, a more modern and extendable codebase, and a virtual machine monitor capability via Palacios that allows full-featured guest operating systems to be loaded on-demand. For a more detailed introduction to Kitten, please see this presentation:"

Highlight Features:

    Open Source (GPL)
    New LWK codebase partially derived from Linux, familiar organization and build process
    Linux user-space ABI support (partial, similar to IBM's CNK)
    Guest OS support (via V3VEE project's Palacios hypervisor)
    Uses standard GNU toolchains and system libraries such as Glibc
    Multiple processes and threads per core, uses standard Glibc NPTL POSIX Threads implementation
    SMARTMAP address-space to address-space mapping support
    Qthreads API support

https://software-login.sandia.gov/~ktpe ... erview.pdf
http://www.sandia.gov/~ktpedre/slides/p ... _ics09.pdf
GitHub: https://github.com/ktpedre/kitten

From the GitHub README:

Supported Host (Build) Platforms
================================
The Kitten kernel and user applications are compiled on a standard
x86_64 (64-bit only, no 32-bit support) Linux host. The following
distributions have been verified to work:

* Fedora Core 15
    Must install glibc-static and syslinux packages
    Known Issue: The "mktemp is dangerous" link warnings can be ignored
* RedHat Enterprise Linux 6 (RHEL 6)
    Must install glibc-static, syslinux, and syslinux-devel packages
    Known Issue: The "mktemp is dangerous" link warnings can be ignored
* Ubuntu 10.10
* RedHat Enterprise Linux 5 (RHEL 5)

Palacios (also from the README):

Instructions for Building with Palacios VMM Support
===================================================
Palacios is a virtual machine monitor (VMM) being developed by the V3VEE project (http://v3vee.org). Palacios is distributed and built separately from Kitten, but can be linked with Kitten as part of the normal Kitten build process. The Kitten+Palacios combination allows full guest operating system images to be launched and managed similarly to native Kitten tasks.

Installation Steps:

1. Download the latest Kitten and Palacios releases:
   http://software.sandia.gov/trac/kitten
   http://www.v3vee.org/download

Supported Target (Execution) Platforms
======================================
The Kitten kernel should boot on any x86_64 PC-compatible system.
By default, console output is to both the VGA device and COM1 serial port.
The following platforms have been verified to work:

Emulators:
* qemu-system-x86_64
* kvm (running on a 64-bit x86 system)
* virtualbox (with a bit of configuration, be sure to enable IO APIC)

Real Hardware:
* HP ProLiant BL460c G6 BladeSystem with dual-socket quad-core
  Intel Xeon X5570, 24 GB RAM, no disk
* HP ProLiant BL465c G7 BladeSystem with dual-socket 12-core
  AMD Opteron 6172, 32 GB RAM, no disk
* Cray XT4 compute nodes with single-socket quad-core AMD Opteron 1354,
  8 GB RAM, SeaStar 2.1 network interface, no disk
....

Kitten/Palacios:
http://prod.sandia.gov/techlib/access-c ... 106232.pdf
http://www.v3vee.org/palacios/palacios-1.3-tr.pdf
Palacios is a virtual machine monitor (VMM) from the V3VEE Project that is available for public use as a community resource. Palacios is highly configurable and designed to be embeddable into different host operating systems, such as Linux and the Kitten lightweight kernel. Palacios is a non-paravirtualized VMM that makes extensive use of the virtualization extensions in modern Intel and AMD x86 processors. Palacios is a compact codebase, consisting of ~96,000 lines of C and assembly of which ~40,000 are in the core VMM, and the remainder are in virtual devices, extensions, an overlay network, and other optional features. Palacios is designed to be easy to understand and readily configurable for different environments. It is unique in being designed to be embeddable into other OSes instead of being implemented in the context of a specific OS.
"Get Involved

We are continuously looking for people to become engaged in this project. There are numerous ways to do so:

    We are looking for graduate students at both Northwestern University and the University of New Mexico.
    We have independent study and paid REU opportunities for undergraduate students at Northwestern University and the University of New Mexico.
    This is an open source community development project and we encourage involvement by the broader community."

Booting Palacios/Kitten and Palacios/Linux Over the Network Using PXE:
http://www.v3vee.org/palacios/pxe-manual.pdf

Sandia Kitten kernel: "Downloading Open-Source Software Computations, Computers and Mathematics Center"
http://www.cs.sandia.gov/web1400/1400_download.html

This last page is a doozy because they have actual open source programs for download that work on Kitten:
ACRO: A Common Repository for Optimizers (LGPL)
App_model: Application simulator (GPL)
CANARY: Water Quality Event Detection Tool (LGPL)
Chaco Graph Partitioning Software (LGPL)
ChISELS feature-scale CVD and plasma etch topography modeler (LGPL)
CIT Cluster Integration Toolkit (LGPL)
CognitiveFoundry Machine Learning and Intelligent Systems Library (BSD)
Coliny Library of COLIN Optimizers (LGPL)
CPA Compute Processor Allocator (LGPL)
ESC Conference Management tool for organizing conferences (GPL)
Facetbool Library for performing facetted booleans (LGPL)
Kitten Lightweight Kernel (GPL)
LOCA Library of Continuation Algorithms (LGPL)
PEBBL Library for Parallel Branch-and-Bound (LGPL)
Mesquite Mesh-Quality Improvement Library (LGPL)
OpenCatamount Lightweight Compute Node Operating System (GPL)
Showmesh utility of nodes in physical x,y,z representation (LGPL)
Sisyphus Toolkit for Informatic Event Log Analysis (LGPL)
SMB: Sandia MPI Micro-Benchmarks (LGPL)
Surfpack Library of Function Approximation Methods (GPL)
UTILIB C++ Utility Library (LGPL)
Zoltan Dynamic Data Management and Load Balancing Library (LGPL)

And then there's this at the bottom:
The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.
Trilinos 12.0 is now available for download. http://trilinos.org/
carmonacr
 
Posts: 2
Joined: Sun May 10, 2015 4:40 am

