You can view the paper on the arXiv: http://arxiv.org/abs/1105.3843

Here's the abstract:

The provision of mechanisms for processor allocation in current distributed parallel programming models is very limited. This makes difficult, or even prohibits, the expression of a large class of programs which require a run-time assessment of their required resources. This includes programs whose structure is irregular, composite or unbounded. Efficient allocation of processors requires a process creation mechanism able to initiate and terminate remote computations quickly. This paper presents the design, demonstration and analysis of an explicit mechanism to do this, implemented on the XMOS XS1 architecture, as a foundation for a more dynamic scheme. It shows that process creation can be made efficient so that it incurs only a fractional overhead of the total runtime and that it can be combined naturally with recursion to enable rapid distribution of computations over a system.
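To give a rough feel for why combining process creation with recursion distributes work "rapidly", here is a minimal sequential simulation in Python. It is a sketch under stated assumptions, not the sire implementation or the paper's XS1 mechanism: `distribute` is a hypothetical function, and "creating a remote process" is modelled simply as a recursive call tagged with a round number. At each step the current processor keeps the lower half of its processor range and hands the upper half to a newly created process on another processor, so the number of active processors roughly doubles every round.

```python
def distribute(lo, hi, depth=0, log=None):
    """Simulate recursive distribution over processors [lo, hi).

    Hypothetical model: each call keeps the lower half of the range
    locally and 'creates' a remote process on processor `mid` for the
    upper half. Returns the number of creation rounds needed before
    every processor in the range is running a leaf computation.
    """
    if log is None:
        log = []
    if hi - lo == 1:
        # Leaf: processor `lo` starts its share of the work in round `depth`.
        log.append((lo, depth))
        return depth
    mid = (lo + hi) // 2
    # Upper half: modelled as a remote process created on processor `mid`.
    remote_rounds = distribute(mid, hi, depth + 1, log)
    # Lower half: the current processor carries on with it locally.
    local_rounds = distribute(lo, mid, depth + 1, log)
    return max(remote_rounds, local_rounds)


leaves = []
rounds = distribute(0, 8, log=leaves)
print(rounds)                          # 3 rounds for 8 processors
print(sorted(p for p, _ in leaves))    # every processor 0..7 receives work
```

Under this model, reaching n processors takes ceil(log2 n) rounds of creation rather than n - 1 sequential ones, which is why the cost of each individual creation matters so much for the overall overhead.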
Although I don't really mention it in the paper, all of the results were obtained from my 'sire' implementation, which is an XCore project on GitHub (https://github.com/xcore/tool_sire). I've made a load of changes since then, so it's not currently in a particularly functional state, but you can check out the 'v0.1' tag to get a version that reflects the functionality described in the paper.
Comments, questions or criticisms welcome!