I'm thinking about writing a neural spike code that distributes neurons/synapses spatially across processor nodes.
I understand this interconnect is both high-throughput and low-latency. Is there any point in collecting remote writes destined for a specific node and processing them in a batch? Or are these just atomic memory writes with so little overhead that one could live with implementing this naively?
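
To make the batching alternative concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `SpikeBatcher`, the `(synapse index, weight)` spike layout, and the `send` callback, which merely stands in for whatever remote-write primitive the interconnect actually provides.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical spike layout: (target synapse index, weight).
Spike = Tuple[int, float]


class SpikeBatcher:
    """Collects spikes per destination node and hands them over in batches."""

    def __init__(self, send: Callable[[int, List[Spike]], None],
                 batch_size: int = 64) -> None:
        # `send` is a placeholder for the real remote-write call (assumption).
        self.send = send
        self.batch_size = batch_size
        self.buffers: Dict[int, List[Spike]] = defaultdict(list)

    def write(self, node: int, spike: Spike) -> None:
        """Queue one spike; flush the node's buffer once it is full."""
        buf = self.buffers[node]
        buf.append(spike)
        if len(buf) >= self.batch_size:
            self.flush(node)

    def flush(self, node: int) -> None:
        """Send everything queued for `node` as a single batch."""
        buf = self.buffers.pop(node, [])
        if buf:
            self.send(node, buf)

    def flush_all(self) -> None:
        """Send all remaining partial batches, e.g. at the end of a time step."""
        for node in list(self.buffers):
            self.flush(node)
```

With `batch_size=1` this degenerates to the naive approach of one remote write per spike, so the same code could be used to compare both variants.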