You wouldn't need an extra thread to perform the allocation. With Brinch Hansen's method, the allocate and release operations would be tacked onto the beginning and end of each function definition, replacing (on XS1) the ENTSP and RETSP instructions.

The Per Brinch Hansen approach is interesting and would be great for building virtual threads; however, in this case, using C for example, it would require its own thread for allocation/deallocation.
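To make the "allocate at entry, release at exit" idea concrete, here is a rough C sketch of a function whose frame lives on the heap instead of a contiguous stack. The names `frame_alloc`, `frame_release`, `frame_t` and `sum_to` are made up for illustration, and `malloc`/`free` simply stand in for whatever allocator would really be used; on XS1 a compiler would presumably emit the equivalent code where ENTSP and RETSP sit today, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical frame header: each frame links back to its caller's frame. */
typedef struct frame {
    struct frame *caller;
} frame_t;

size_t live_frames = 0; /* instrumentation only, to show frames are balanced */

/* Stand-in for the code that would replace ENTSP: fetch a frame. */
static void *frame_alloc(frame_t *caller, size_t size) {
    frame_t *f = malloc(size);
    assert(f != NULL);
    f->caller = caller;
    live_frames++;
    return f;
}

/* Stand-in for the code that would replace RETSP: give the frame back. */
static void frame_release(void *f) {
    free(f);
    live_frames--;
}

/* The locals of sum_to live in this heap frame. */
typedef struct {
    frame_t header;   /* must be first so the frame can be linked */
    unsigned n;
    unsigned result;
} sum_frame_t;

unsigned sum_to(frame_t *caller, unsigned n) {
    sum_frame_t *f = frame_alloc(caller, sizeof *f);  /* "prologue" */
    f->n = n;
    f->result = (f->n == 0) ? 0 : f->n + sum_to(&f->header, f->n - 1);
    unsigned result = f->result;  /* copy out before the frame goes away */
    frame_release(f);                                 /* "epilogue" */
    return result;
}
```

Calling `sum_to(NULL, 10)` runs eleven nested activations, each with its own frame, all inside the calling thread; no separate allocator thread is involved.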
It's also interesting what he observes about reuse of same-sized frames - you could envisage a more extreme system where all stack frames were allocated in fixed-size blocks, which would remove almost all of the allocation overhead.
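As a rough illustration of the fixed-size-block idea (my own sketch, not Brinch Hansen's code), a free list over a static pool makes both allocate and release a couple of pointer moves. `BLOCK_SIZE`, `BLOCK_COUNT` and the function names are assumptions picked for the example, and a frame larger than one block would need extra handling that isn't shown.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE  64  /* assumed: big enough for any single frame */
#define BLOCK_COUNT 32  /* assumed pool size */

/* A block is either on the free list (next is valid) or holds a frame.
   Blocks are only pointer-aligned in this sketch. */
typedef union block {
    union block *next;
    unsigned char bytes[BLOCK_SIZE];
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list = NULL;

/* Thread every block onto the free list. */
void pool_init(void) {
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Pop the head of the free list; returns NULL when the pool is exhausted. */
void *block_alloc(void) {
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* Push the block back; the most recently released block is reused first. */
void block_release(void *p) {
    assert(p != NULL);
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

Because every block is the same size there is no searching or splitting, so the cost per call is constant. The trade-off is wasted space in small frames and a hard limit on frame size.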