Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Mon Sep 25, 2017 2:10 pm

Hi all,

I'd like to get in touch with everybody interested in this topic.

I've built an MPMD set of programs that manages the 16 cores of a single Parallella board one by one: each core has its own input and output data and can be synchronized, started and stopped individually.
I've applied the same MPMD model to a cluster of two Parallellas connected through eLink, using Porcupine boards and flat cables.
I can individually start and stop each of the 32 cores, send it input and read its output.

The difference is:

- With a single Parallella board, the Host program can and must manage the 16 cores directly, since it has direct access to all the RAM, including the memory inside the Epiphany and therefore each core's address range.

- With a two-Parallella cluster (Master-Slave), the program running on one of the local cores of the Master Parallella acts as Master-Core and drives all 32 cores (16 local and 16 on the eLinked board), since it has direct access to the absolute addresses of the cores and RAM of both Epiphany chips (0x908xxxxx and 0x808xxxxx).
The Host program on the Master Parallella cannot access the Epiphany cores of the eLinked Parallella.
Only the loader can reach all 32 cores to load the individual .elf (or .srec) executables, and even then the Epiphany on the Slave must already be set up and running, ready to receive the Master's commands. (I'm using the interrupts_connected example by Parallel Systems Lab, Aviv Burshtein.)
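
To make the 808 and 908 prefixes concrete: an Epiphany global address is 32 bits wide, with the core ID in the upper 12 bits (6 bits of row, 6 bits of column) and the core-local offset in the lower 20 bits. The Python sketch below only models that arithmetic for checking addresses on paper. It is not code from the programs described here, and the helper names are made up for illustration.

```python
ROW_BITS = 6        # bits of mesh row in a core ID
COL_BITS = 6        # bits of mesh column in a core ID
OFFSET_BITS = 20    # bits of core-local offset (1 MiB window per core)


def core_id(row, col):
    """Pack mesh coordinates (row, col) into a 12-bit core ID."""
    if not (0 <= row < (1 << ROW_BITS) and 0 <= col < (1 << COL_BITS)):
        raise ValueError("row and col must each fit in 6 bits")
    return (row << COL_BITS) | col


def global_address(row, col, offset=0):
    """Build the 32-bit global address of a local offset on core (row, col)."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 20 bits")
    return (core_id(row, col) << OFFSET_BITS) | offset


if __name__ == "__main__":
    # (32,8) is the native chip origin, (36,8) the re-strapped SLAVE origin.
    for name, (row, col) in (("master", (32, 8)), ("slave", (36, 8))):
        print(f"{name}: core ID {core_id(row, col):#05x}, "
              f"base address {global_address(row, col):#010x}")
```

With these definitions, core (32,8) has ID 0x808 and base address 0x80800000, and core (36,8) has ID 0x908 and base address 0x90800000.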

I'm happy to share code and examples from both experiments if that is useful.

Thanks for any comments, or for corrections to what I've understood.

My best to all of you
claudio4parallella
 

Re: Two-Parallella Cluster using eLink and Porcupine

Post by jar » Mon Sep 25, 2017 2:50 pm

A few people have struggled through what you have done over the past couple years. It would benefit the community to document the steps you took to configure the boards. My understanding is that you had to use the 2015.1 image?

One feature of COPRTHR-2 is having Epiphany core #0 initiating the work across all other epiphany cores. It actually launches programs in logarithmic time as the configured cores then configure and launch on the remaining cores.

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Mon Sep 25, 2017 3:04 pm

jar wrote:A few people have struggled through what you have done over the past couple years. It would benefit the community to document the steps you took to configure the boards. My understanding is that you had to use the 2015.1 image?

Yes, I'm using the esdk.2015.1 image, headless, on two Parallella Desktop boards with Porcupine.
The example by Parallel Systems Lab has its own hal.c instead of using the library, and I am studying how it works, although at a higher level I've already set up a driver for the 32 cores.
I'm having trouble porting to esdk.2016.1, but I'll get there.

jar wrote:One feature of COPRTHR-2 is having Epiphany core #0 initiating the work across all other epiphany cores. It actually launches programs in logarithmic time as the configured cores then configure and launch on the remaining cores.

Very interesting. I had abandoned my attempts with COPRTHR-2, but your suggestion gives me a new task.
I'll post my code and a how-to.

Re: Two-Parallella Cluster using eLink and Porcupine

Post by jar » Mon Sep 25, 2017 3:41 pm

You can always replicate the logarithmic-time launcher from COPRTHR-2 if you want. It was documented in the paper "Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor": https://arxiv.org/abs/1604.04207

COPRTHR-2 almost certainly won't do it automatically for your dual-Parallella setup. There are hard-coded base addresses as it's configured for the single E3 on a Parallella.
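
The idea of a logarithmic-time launch can be illustrated with a generic recursive-doubling schedule: in every round, each core that is already running starts one core that is still idle. The Python sketch below only simulates such a schedule. It is not COPRTHR-2 code, and the function name and the linear core numbering are assumptions made for illustration.

```python
import math


def launch_rounds(num_cores):
    """Return, per round, the (launcher, launched) pairs of a doubling launch.

    Core 0 is assumed to be started by the host. In each round, every core
    that is already running launches at most one core that is still idle.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    rounds = []
    running = 1  # cores 0 .. running-1 are up
    while running < num_cores:
        pairs = [(src, src + running)
                 for src in range(running)
                 if src + running < num_cores]
        rounds.append(pairs)
        running += len(pairs)
    return rounds


if __name__ == "__main__":
    for n in (16, 32):
        schedule = launch_rounds(n)
        print(f"{n} cores: {len(schedule)} rounds "
              f"(ceil(log2(n)) = {math.ceil(math.log2(n))})")
```

For 16 cores the schedule has 4 rounds and for 32 cores it has 5, compared with 15 and 31 sequential launches from a single core.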

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Tue Sep 26, 2017 10:01 am

Let me share my experience setting up the two-Parallella system with the eLink interconnection.

HARDWARE
1. I ordered 2 "Adapteva P1601-DK02 single board parallella Epiphany III Computer Desktop"
In Europe here (today) : http://www.ebay.it/itm/Adapteva-P1601-DK02-Single-Board-Parallella-Epiphany-III-Desktop-Computer-/152712182517?hash=item238e5afaf5:g:b-0AAOSwyNNZkaYW
In USA here: https://www.digikey.it/product-detail/it/adapteva-inc/P1601-DK02/1554-1001-ND/5018662
2. I ordered 2 "PARALLELLA-16 PORCUPINE BOARD"
In USA only: https://www.digikey.it/product-detail/it/adapteva-inc/ACC1600-01/1554-1003-ND/5048176
3. I ordered 1 "Parallella eLink Cable Pair" (making the cables yourself without the proper tools costs about the same in money and effort)
In Europe: https://groundelectronics.com/products/parallella-elink-cable-pair
4. I've fitted each Parallella with a heat sink (one of them has the support for the "PARALLELLA ENCLOSURE ALUMINUM")

THEORY
1. I studied epiphany-examples and parallella-examples
2. I studied Adapteva Reference HW and System manuals
3. I downloaded and studied this eLink setup by Parallel Systems Lab: https://github.com/Parallel-Systems-Lab/parallella-cluster/tree/master/interrupts_connected

SYSTEM IMAGE
1. I ordered two 32 GB SD cards
2. I downloaded the "pubuntu-14.04-esdk.2015.1-20150130" image from here https://github.com/parallella/parabuntu/releases/tag/pubuntu-14.04-esdk.2015.1-20150130
following the explanations in the document https://github.com/Parallel-Systems-Lab/parallella-cluster/blob/master/interrupts_connected/Set%20Up%20Your%20Epiphany%20Cluster.pdf

STATIC IP
1. I've set up the static IP on both the Parallella boards

FOLLOWING INSTRUCTIONS

-> SLAVE PARALLELLA
1. I've hardwired the core ID by reconfiguring PEC_POWER with a female-female cable, so the SLAVE Epiphany address is (36,8), hex 908, instead of the native (32,8), hex 808
2. I've modified /opt/adapteva/esdk/bsps/parallella_E16G3_1GB/parallella_E16G3_1GB.hdf
as follows:
// Parallella/1GB/E16G3
PLATFORM_VERSION   PARALLELLA1601
ESYS_REGS_BASE         0x70000000

NUM_CHIPS                       1
CHIP                      E16G301
CHIP_ROW                       36
CHIP_COL                        8

NUM_EXT_MEMS                    1
EMEM                     ext-DRAM
EMEM_BASE_ADDRESS      0x3e000000
EMEM_EPI_BASE          0x8e000000
EMEM_SIZE              0x02000000
EMEM_TYPE                    RDWR


-> MASTER PARALLELLA
1. I've modified /opt/adapteva/esdk/bsps/parallella_E16G3_1GB/parallella_E16G3_1GB.hdf
as follows:
// Parallella/1GB/E16G3
PLATFORM_VERSION   PARALLELLA1601
ESYS_REGS_BASE         0x70000000

NUM_CHIPS                       2
CHIP                      E16G301
CHIP_ROW                       32
CHIP_COL                        8
CHIP                      E16G301
CHIP_ROW                       36
CHIP_COL                        8

NUM_EXT_MEMS                    1
EMEM                     ext-DRAM
EMEM_BASE_ADDRESS      0x3e000000
EMEM_EPI_BASE          0x8e000000
EMEM_SIZE              0x02000000
EMEM_TYPE                    RDWR
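
As a cross-check of the two files above, the Python sketch below parses the chip section of the MASTER file and lists the core coordinates it implies, assuming each E16G301 chip is a 4x4 array whose first core sits at (CHIP_ROW, CHIP_COL). The parser is a simplified illustration written for these files, not the eSDK's own HDF reader, and its function names are hypothetical.

```python
MASTER_HDF = """\
// Parallella/1GB/E16G3
PLATFORM_VERSION   PARALLELLA1601
ESYS_REGS_BASE         0x70000000

NUM_CHIPS                       2
CHIP                      E16G301
CHIP_ROW                       32
CHIP_COL                        8
CHIP                      E16G301
CHIP_ROW                       36
CHIP_COL                        8
"""

CHIP_DIM = 4  # assumption: an E16G301 is a 4x4 array of cores


def parse_chips(hdf_text):
    """Return the (row, col) origin of every CHIP entry in the HDF text."""
    declared = None
    chips = []
    for raw in hdf_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "NUM_CHIPS":
            declared = int(value, 0)
        elif key == "CHIP":
            chips.append({})              # start a new chip record
        elif key in ("CHIP_ROW", "CHIP_COL") and chips:
            chips[-1][key] = int(value, 0)
    origins = [(c["CHIP_ROW"], c["CHIP_COL"]) for c in chips]
    if declared is not None and declared != len(origins):
        raise ValueError("NUM_CHIPS does not match the number of CHIP entries")
    return origins


def core_coords(origins):
    """Expand chip origins into the coordinates of all their cores."""
    return [(r0 + r, c0 + c)
            for (r0, c0) in origins
            for r in range(CHIP_DIM)
            for c in range(CHIP_DIM)]


if __name__ == "__main__":
    cores = core_coords(parse_chips(MASTER_HDF))
    print(len(cores), "cores, from", cores[0], "to", cores[-1])
```

Under the 4x4 assumption, the MASTER file describes 32 cores covering rows 32 to 39 and columns 8 to 11.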


SINGLE UNIT TEST
1. I've tested the SLAVE Parallella with /epiphany-examples/apps/hello-world at 908 (36,8) and it worked
2. I've tested the MASTER Parallella with /epiphany-examples/apps/hello-world at 808 (32,8) and it worked

eLINK
1. Using flat cables, I've connected the NORTH TX and RX of the SLAVE Parallella's PORCUPINE to the SOUTH RX and TX of the MASTER Parallella's PORCUPINE

ASSEMBLY
1. I've assembled the two Porcupine boards down-down as shown in the picture
2. I've mounted each Parallella board on its Porcupine board (MASTER Parallella on MASTER Porcupine, SLAVE Parallella on SLAVE Porcupine) with the hardwired connection

POWER ON
1. I've connected each Parallella's "eth0" to my router
2. I've powered on the two 5 V 40x40 mm fans, one on top of each heat sink
3. I've connected each Parallella to power

SYSTEM TEST
1. I've logged in using ssh from my MacBook Pro
2. I've saved the "interrupts_connected" software on each Parallella board from here https://github.com/Parallel-Systems-Lab/parallella-cluster/tree/master/interrupts_connected
3. I've run /interrupts_connected/slave/build.sh to build the SLAVE software on the SLAVE Parallella
4. I've run /interrupts_connected/master/build.sh to build the MASTER software on the MASTER Parallella
5. I've run /interrupts_connected/slave/run.sh on the SLAVE Parallella first
6. I've run /interrupts_connected/master/run.sh on the MASTER Parallella immediately afterwards, while the SLAVE was running

SERIAL INTERCONNECTION
Purpose:
I'd like to use only one network cable (eth0) for the two Parallellas, so I connect only the MASTER Parallella to the network.
Limitation:
With this wiring I cannot use OpenMPI or MPICH for network-based clustering. To run host programs in parallel on both boards I need both network cables connected to the router.
1. I've connected GND-GND, RX-TX and TX-RX between the serial pins of the two Parallellas
2. I've modified the SLAVE device tree to enable the ttyPS0 console at 115200 baud: decompile the device tree on the BOOT partition of the SD card to devicetree.dts, change the entry below, and recompile it
   chosen {
      bootargs = "console=ttyPS0,115200 root=/dev/mmcblk0p2 rw earlyprintk rootfstype=ext4 rootwait";
      linux,stdout-path = "/amba@0/serial@e0001000";
   };


IT WORKED

MY SOFTWARE
1. I've copied /interrupts_connected/slave/ into the development folder on my SLAVE Parallella
2. I've copied /interrupts_connected/master/ into the development folder on my MASTER Parallella
3. I've modified the original software and built a simple SLAVE
4. I've modified the original software and built my MASTER

THE MASTER PROGRAM:
The master program has 1 HOST program (master.c) and 32 DEVICE programs (e-masterxx.c where xx=00, 01, 02, .., 32)
The HOST program master.c does the following:
- initializes and opens the local Epiphany
- loads the 32 device programs onto the 32 cores
- reads and displays the output of the 16 local cores
- reads and displays, from 16 local memory addresses, the output sent by the 16 external (SLAVE) cores
- shows the status of each of the 32 cores, using my own flags stored at a memory address

THE 32 DEVICE PROGRAMS ON 32 CORES:
- The device programs are organized as a SETUP and a LOOP (as in the Arduino model)
- Each device's setup clears its output and sets its status to "on-work", writing these values to the MASTER memory location so the MASTER program can read them
- Each device's loop runs its own task and produces the output. When finished, it puts itself into the idle state and sends its output and status=completed to the MASTER memory location so the MASTER program can read them
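
One way to picture the status and output exchange described above is a table of fixed-size slots in a memory region the MASTER can read, one slot per core. The Python sketch below models that with a bytearray. The slot layout (a 32-bit status word followed by a 32-bit output word), the status values and all function names are assumptions chosen for illustration, not the layout used in the actual programs.

```python
import struct

NUM_CORES = 32
SLOT_FORMAT = "<II"                       # assumed layout: status, output
SLOT_SIZE = struct.calcsize(SLOT_FORMAT)  # 8 bytes per core
STATUS_ON_WORK = 1                        # hypothetical flag values
STATUS_COMPLETED = 2


def slot_offset(core_index):
    """Byte offset of a core's slot inside the shared region."""
    if not 0 <= core_index < NUM_CORES:
        raise IndexError("core_index out of range")
    return core_index * SLOT_SIZE


def device_setup(mem, core_index):
    """SETUP phase: clear the output and mark the core as working."""
    struct.pack_into(SLOT_FORMAT, mem, slot_offset(core_index),
                     STATUS_ON_WORK, 0)


def device_finish(mem, core_index, output):
    """End of LOOP: publish the output and mark the core as completed."""
    struct.pack_into(SLOT_FORMAT, mem, slot_offset(core_index),
                     STATUS_COMPLETED, output)


def host_poll(mem):
    """Host side: read every slot and return a list of (status, output)."""
    return [struct.unpack_from(SLOT_FORMAT, mem, slot_offset(i))
            for i in range(NUM_CORES)]


if __name__ == "__main__":
    shared = bytearray(NUM_CORES * SLOT_SIZE)  # stands in for shared memory
    for i in range(NUM_CORES):
        device_setup(shared, i)
    device_finish(shared, 5, 1000000)
    done = [i for i, (status, _) in enumerate(host_poll(shared))
            if status == STATUS_COMPLETED]
    print("completed cores:", done)
```

Giving every core its own slot means no two cores ever write the same bytes, so the host can poll the whole table without any locking in this model.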

FURTHER TEST:
Starting from core (0,0), each device loop program counts to 1000000 and then stops, starting the next core, (0,1), and so on up to core (7,3), 32 in total, while the MASTER loop program follows the progress of the counts and of the successive core activations.
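
The activation order of this test can be written down explicitly. The Python sketch below enumerates the 32 logical cores row by row from (0,0) to (7,3) and maps each to a physical mesh coordinate, assuming logical rows 0 to 3 live on the MASTER chip at origin (32,8) and rows 4 to 7 on the SLAVE chip at origin (36,8). That mapping and the function names are an interpretation of the setup above, not code taken from the programs.

```python
ROWS, COLS = 8, 4            # logical grid: two 4x4 chips stacked vertically
ROWS_PER_CHIP = 4
MASTER_ORIGIN = (32, 8)      # native chip origin
SLAVE_ORIGIN = (36, 8)       # origin after re-strapping the SLAVE core ID


def activation_order():
    """Logical cores in the order they are started: (0,0), (0,1), ... (7,3)."""
    return [(r, c) for r in range(ROWS) for c in range(COLS)]


def physical_coord(row, col):
    """Map a logical (row, col) to its assumed physical mesh coordinate."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("logical coordinate out of range")
    origin = MASTER_ORIGIN if row < ROWS_PER_CHIP else SLAVE_ORIGIN
    return (origin[0] + row % ROWS_PER_CHIP, origin[1] + col)


def next_core(row, col):
    """Return the core that (row, col) starts when it finishes, or None."""
    order = activation_order()
    i = order.index((row, col))
    return order[i + 1] if i + 1 < len(order) else None


if __name__ == "__main__":
    for r, c in activation_order():
        print(f"core{r}{c} -> physical {physical_coord(r, c)}, "
              f"then starts {next_core(r, c)}")
```

Under this mapping the hand-over from core (3,3) to core (4,0) is the point where the chain crosses the eLink from the MASTER chip to the SLAVE chip.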

NEXT STEPS:
- Optimization, Corrections
- Matmul version of big matrix over all the 32 Cores
- Powering only one Parallella board on and off, using a Parallella-to-Parallella power connection
- Using OpenMPI or MPICH to run both the master and the slave programs from the MASTER Parallella and synchronize them, so that the SLAVE no longer has to be managed individually
- An Aluminium Box with heat sink capability

OPEN ISSUES
- I haven't yet fully understood how interrupts_connected uses interrupts to govern the two Epiphany chips from one host. I'm working on it.

MY SYSTEM
Parallela-eLink.jpeg: the two-Parallella eLink system

Re: Two-Parallella Cluster using eLink and Porcupine

Post by jar » Tue Sep 26, 2017 2:53 pm

This is a high quality writeup. Good job!

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Wed Sep 27, 2017 6:25 pm

UPDATE

- I've closed my open issue about interrupts: the tests now work properly and I can manage them.

- About my two-Parallella eLink cluster, if I have understood correctly: the SLAVE Parallella must be running its own Host program and its device core programs, and without this the MASTER Parallella cannot load tasks onto the SLAVE Epiphany cores (I hope my English is clear). There is no way to upload the device tasks to the SLAVE Epiphany from the MASTER Host.

Thanks for comments

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Thu Sep 28, 2017 7:53 pm

Here are terminal screenshots from the eLinked two-Parallella/Porcupine cluster.

SLAVE: shows, in a loop, the 16 local cores and their registers

MASTER: shows, in a loop, the output of the 32 cores, the total of the outputs and the core-done counters

The 32 cores are activated in sequence: when core00 reaches 10000000 it activates core01, and so on up to core73.
Attachments
Master.png: Master Parallella shows the 32 Cores
Slave.png: Slave Parallella shows only local Cores

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Fri Oct 13, 2017 9:43 am

Proof of concept results of two eLinked Parallellas in cluster (Master-Slave)

Purpose: achieve synchronization between the cores of an eLink cluster, each running its own task

Lessons learned:
- core 0 of the MASTER Parallella must be used to read, write and synchronize all the cores in the Master Epiphany and the Slave Epiphany
- internal.ldf works fine; there is no other choice

Attachments of resulting images:
1. example on a single Parallella: each of the 16 cores individually draws a 1024-step plot of Y=sin(alpha) over [0..2Pi]
2. example on two eLinked Parallellas: each of the 31 cores individually draws a 1024-step plot of Y=sin(2*alpha) over [0..2Pi]
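
As a reference for the curves in these two examples, the Python sketch below samples Y = sin(k*alpha) at 1024 steps over one period, with k = 1 for the single-board case and k = 2 for the two-board case. Whether the end point 2*Pi is included, and how the device programs scale or plot the curve, is not stated here, so those details and the function name are assumptions.

```python
import math

STEPS = 1024


def sine_samples(k=1, steps=STEPS):
    """Return `steps` samples of sin(k*alpha), alpha uniform in [0, 2*pi)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [math.sin(k * 2.0 * math.pi * i / steps) for i in range(steps)]


if __name__ == "__main__":
    single = sine_samples(k=1)   # Y = sin(alpha)
    dual = sine_samples(k=2)     # Y = sin(2*alpha)
    print("sin(alpha)   peak at step", single.index(max(single)))
    print("sin(2*alpha) peak at step", dual.index(max(dual)))
```

With k = 2 the curve completes two full periods over the same 1024 steps, so its first peak falls at one eighth of the range instead of one quarter.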

Next step: 4 eLinked Parallellas and 64 cores
Attachments
core31.png: 2 Parallellas 32 cores
core16.png: One Parallella 16 cores

Re: Two-Parallella Cluster using eLink and Porcupine

Post by claudio4parallella » Fri Nov 03, 2017 9:20 pm

Wow! I found a Zynq 7020 Parallella!
Once the new board arrives, my eLink cluster will become a full 64-core machine.
In the meantime, attached is the 48-core proof of concept, with a single kernel per core.
Attachments
png.png