In broad outline, the regular CPython interpreter will run on the host and launch work kernels on the cores. The kernels will be written in a restricted subset of Python and will target the PyMite VM. It's a similar model to OpenCL, except that the kernels will be written in Python.

Now how should it look in more detail?

It would be helpful to construct some example computational kernels, and I would appreciate input from others on the forum. The idea is to write the algorithms/kernels in some sort of ideal syntax, and then see how well that can be approximated in Python.

Feel free to borrow from multiprocessing, OpenCL, Coarray Fortran, and any other parallel language/library.

As an example, here's an extremely simple case that sums a list:

```python
def kernel_sum(arg):
    s = 0.0
    for a in arg:
        s += a
    return s

arg1 = [1, 2, 3, 4, 5, 6, 7, 8]

# split the array 'arg1' across 4 instances of 'kernel_sum'
ret = launch_kernel(kernel_sum, arg1, n=4, reduction=sum)

# The previous line could be a convenience function for the following sequence:
# k = create_kernel(kernel_sum, n=4, reduction=sum)
# k.run(arg1)
# ret = k.wait()
```
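To pin down the intended semantics, here is a minimal host-only sketch that emulates the proposed `launch_kernel` call with a thread pool. This is purely illustrative: `launch_kernel` and `chunks` are names I'm assuming for this discussion, not an existing API, and a real implementation would dispatch to PyMite kernels on the cores rather than host threads.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(seq, n):
    """Split seq into up to n contiguous, roughly equal pieces."""
    step = (len(seq) + n - 1) // n
    return [seq[i:i + step] for i in range(0, len(seq), step)]

def launch_kernel(kernel, arg, n, reduction):
    """Run n kernel instances, one per chunk, and reduce the partial results."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(kernel, chunks(arg, n)))
    return reduction(partials)

def kernel_sum(arg):
    s = 0.0
    for a in arg:
        s += a
    return s

ret = launch_kernel(kernel_sum, [1, 2, 3, 4, 5, 6, 7, 8], n=4, reduction=sum)
# four instances sum [1,2], [3,4], [5,6], [7,8]; sum() combines the partials
```

The split/reduce structure is the part I'd like feedback on; the threading is just a stand-in for the cores.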

Questions to consider:

1. How should the inputs distinguish between arrays to be split across cores and arrays/scalars that should be identical on each core?

2. What synchronization elements are needed?

3. How should the shared SDRAM segment be used/accessed? (From reading other forum posts, it lives in a separate memory space from the host OS, so a host data structure can't simply be shared.)

4. What notation and data structures should be used for cross-core memory accesses?

5. Other?
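On question 1, one possibility is to wrap arguments in marker types so the launcher knows which to scatter and which to replicate. The names `Split`, `Same`, and `scatter_args` below are hypothetical, chosen just to sketch the idea:

```python
class Split:
    """Argument to be divided into contiguous chunks, one per core."""
    def __init__(self, data):
        self.data = data

class Same:
    """Argument replicated unchanged on every core."""
    def __init__(self, data):
        self.data = data

def scatter_args(args, n):
    """Build the per-core argument tuples for n kernel instances."""
    percore = []
    for i in range(n):
        core_args = []
        for a in args:
            if isinstance(a, Split):
                step = (len(a.data) + n - 1) // n
                core_args.append(a.data[i * step:(i + 1) * step])
            else:
                core_args.append(a.data)
        percore.append(tuple(core_args))
    return percore

# e.g. scatter_args([Split([1, 2, 3, 4]), Same(0.5)], n=2)
# -> [([1, 2], 0.5), ([3, 4], 0.5)]
```

An alternative would be a convention like "first argument is split, the rest are replicated", which avoids the wrappers but is less flexible.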