So the thing about fibers on Windows, that I discovered a few months ago, is that they're broken now.
"Now", in this instance, means since the release of Visual Studio 8.0 (that's Visual Studio 2005 for those of you playing along at home), on at least Windows XP; and in order to describe why, I'm going to have to go sideways a bit.
Activation contexts are... well, let me trim it down quite a bit and say that activation contexts are a solution to a problem I don't really have, at least in this context (pun not intended). The key points to keep in mind are that (a) they must be deactivated in reverse order of activation, the sequence being justly called the activation context stack, and (b) they are (by default) automatically activated on calling a function which is a DLL entry point, and automatically deactivated on that function's return (or exceptional return (I hope)).
This should be enough for some of you to see the potential problems already, so let me actualize that potential by saying (c) activating a context returns a cookie, and you use that cookie to deactivate it, and (d) in Windows XP, the activation context stack is per-thread, and not per-fiber.
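For those who'd rather see (a) through (c) as code, here is a minimal sketch of the rules in portable C++. To be clear about what's assumed: `ActivationStack` and `EntryPointGuard` are names I made up for illustration, this is a toy model and not the Win32 API, and the real thing does not politely return `false` when you get the order wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for the activation context stack.
// Hypothetical names throughout; nothing here calls into Windows.
class ActivationStack {
public:
    using Cookie = std::uintptr_t;

    // (c) Activating pushes a frame and hands back a cookie.
    Cookie activate(int context_id) {
        Cookie cookie = next_cookie_++;
        frames_.push_back(Frame{cookie, context_id});
        return cookie;
    }

    // (a) Only the frame on top of the stack may be deactivated.
    // Returns false (and changes nothing) if the cookie is not on top.
    bool deactivate(Cookie cookie) {
        if (frames_.empty() || frames_.back().cookie != cookie) {
            return false;
        }
        frames_.pop_back();
        return true;
    }

    std::size_t depth() const { return frames_.size(); }

private:
    struct Frame {
        Cookie cookie;
        int context_id;
    };
    std::vector<Frame> frames_;
    Cookie next_cookie_ = 1;
};

// (b) Roughly what automatic activation around a DLL entry point amounts to:
// activate on the way in, deactivate on every way out.
class EntryPointGuard {
public:
    EntryPointGuard(ActivationStack& stack, int context_id)
        : stack_(stack), cookie_(stack.activate(context_id)) {}
    ~EntryPointGuard() { stack_.deactivate(cookie_); }
    EntryPointGuard(const EntryPointGuard&) = delete;
    EntryPointGuard& operator=(const EntryPointGuard&) = delete;

private:
    ActivationStack& stack_;
    ActivationStack::Cookie cookie_;
};
```

As long as every activation is paired with a deactivation by ordinary call-and-return nesting, the cookie handed to `deactivate` is always the one on top. Point (d) is where that stops being true.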
So let's say, "hypothetically", that I have this application that has code in multiple DLLs, some of which are (at least potentially) user-designed plugins. And let's say that it uses a small pool of fibers to let function execution "block" on calls to "modal" tasks, while still allowing the main modal loop to handle Windows events on the main thread. And let's say that this application passes around and invokes callbacks like candy on Halloween.
And let's say I'm up in the application main loop in executable A, and from there it calls DLL B, which calls DLL C, to which executable A has already given pointers to static functions in A, as well as pointers to objects declared and constructed in the A-loaded plugin-DLL Z. And let's say DLL C calls Z, as it should, which calls a function in B which constructs a chooser-dialog and "blocks", yielding control back to a waiting fiber near the main loop. (The fiber will be unblocked and switched to when the user clicks 'OK'; this will cause the function in B to return the chosen value.) For those of you playing along at home, that's A -> B -> C (-> A?) -> Z -> B, so we're now in B's activation context. Well, that's probably okay; all these activation contexts are likely to be identical anyway. A's, B's, and C's certainly are.
So let's say we run through another ten thousand iterations of the main loop, and then the user does something else that causes a chooser-dialog to be constructed. (First one's still up, remember.) So that adds another half-a-dozen activation contexts onto the activation context stack. And let's say that the user clicks OK on the first chooser-dialog.
At this point (possibly immediately, possibly when the OK-click-event-handling-fiber makes it back up to the main loop), control switches over to the suspended fiber, which tries to return... and promptly goes undefined, because it immediately tries to deactivate activation contexts that aren't anywhere near the top of the stack.
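The scenario can be replayed in a toy model, which may make the failure easier to see. This is only a sketch: a "fiber" here is just an index, a "stack" is a vector of cookies, the names `Stacks` and `first_dialog_can_return` are hypothetical, and the activation counts (four and six) are arbitrary stand-ins for the call chains described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; none of this touches the real Win32 API.
struct Stacks {
    bool per_fiber;
    std::vector<std::vector<int>> stacks;  // one shared stack, or one per fiber
    int next_cookie = 1;

    Stacks(bool per_fiber_, std::size_t fiber_count)
        : per_fiber(per_fiber_), stacks(per_fiber_ ? fiber_count : 1) {}

    // Pick the stack a given fiber sees.
    std::vector<int>& for_fiber(std::size_t fiber) {
        return stacks[per_fiber ? fiber : 0];
    }

    int activate(std::size_t fiber) {
        for_fiber(fiber).push_back(next_cookie);
        return next_cookie++;
    }

    // Succeeds only if the cookie is on top of the stack this fiber sees.
    bool deactivate(std::size_t fiber, int cookie) {
        std::vector<int>& s = for_fiber(fiber);
        if (s.empty() || s.back() != cookie) return false;
        s.pop_back();
        return true;
    }
};

// Fiber 0 walks into a chain of DLL calls and blocks on the first dialog.
// Fiber 1 later does the same for the second dialog.
// Then the user clicks OK on the FIRST dialog, and fiber 0 tries to unwind.
bool first_dialog_can_return(bool per_fiber) {
    Stacks model(per_fiber, 2);

    std::vector<int> first;
    for (int i = 0; i < 4; ++i) first.push_back(model.activate(0));

    for (int i = 0; i < 6; ++i) model.activate(1);  // second dialog, still up

    // Fiber 0 returns through its entry points, innermost first.
    for (auto it = first.rbegin(); it != first.rend(); ++it) {
        if (!model.deactivate(0, *it)) return false;
    }
    return true;
}
```

With a shared stack, `first_dialog_can_return(false)` fails on the very first deactivation, because fiber 1's six frames are sitting on top of fiber 0's. With one stack per fiber, `first_dialog_can_return(true)` unwinds cleanly.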
Keep in mind that this code was correct under VS 7.1. And still is, on Windows Server 2003, where the activation context stack is per-fiber.[1]
So what are we to do?
We could turn off automatic activation context creation on DLL boundaries, and create a class that does it for us, but also registers itself with the fiber framework so that the context switch can tell those classes to deactivate and reactivate their relevant contexts as needed. But this means adding instances of this class to hundreds of entry points of a DLL, and silent failures if we miss one; onerous requirements on client plugins; and an inability to use any other external DLLs that take callbacks. Possibly including Windows DLLs.
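To make that first option concrete, here is a rough sketch of the bookkeeping it implies, again as a portable toy model and not real Win32 code. `ThreadStack`, `FiberFramework`, `ContextGuard`, and `replay_with_guards` are all hypothetical names, and since there are no real fibers here, the guards are heap-allocated so the replay can destroy them in the order each "fiber" would.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Shared, strictly LIFO stack standing in for the per-thread stack.
struct ThreadStack {
    std::vector<int> cookies;
    int next_cookie = 1;
    int activate() { cookies.push_back(next_cookie); return next_cookie++; }
    bool deactivate(int cookie) {
        if (cookies.empty() || cookies.back() != cookie) return false;
        cookies.pop_back();
        return true;
    }
};

class ContextGuard;

// Hypothetical fiber framework hook: remembers each fiber's live guards.
struct FiberFramework {
    ThreadStack stack;
    std::vector<std::vector<ContextGuard*>> guards;  // per fiber, oldest first
    std::size_t current = 0;
    bool ok = true;  // cleared if any deactivation was out of order

    explicit FiberFramework(std::size_t fiber_count) : guards(fiber_count) {}
    void switch_to(std::size_t fiber);
};

// Would have to sit at the top of every DLL entry point.
class ContextGuard {
public:
    explicit ContextGuard(FiberFramework& fw)
        : fw_(fw), cookie_(fw.stack.activate()) {
        fw_.guards[fw_.current].push_back(this);
    }
    ~ContextGuard() {
        // Assumes the guard dies on its own fiber, as the innermost guard.
        fw_.ok = fw_.stack.deactivate(cookie_) && fw_.ok;
        fw_.guards[fw_.current].pop_back();
    }
    ContextGuard(const ContextGuard&) = delete;
    ContextGuard& operator=(const ContextGuard&) = delete;

    void suspend() { fw_.ok = fw_.stack.deactivate(cookie_) && fw_.ok; }
    void resume() { cookie_ = fw_.stack.activate(); }  // fresh cookie each time

private:
    FiberFramework& fw_;
    int cookie_;
};

// Unwind the outgoing fiber's contexts, then rebuild the incoming fiber's.
void FiberFramework::switch_to(std::size_t fiber) {
    std::vector<ContextGuard*>& out = guards[current];
    for (auto it = out.rbegin(); it != out.rend(); ++it) (*it)->suspend();
    current = fiber;
    for (ContextGuard* g : guards[current]) g->resume();
}

// Same two-dialog scenario as before, but with guards and switch hooks.
bool replay_with_guards() {
    FiberFramework fw(2);
    std::vector<std::unique_ptr<ContextGuard>> first, second;

    for (int i = 0; i < 4; ++i) first.push_back(std::make_unique<ContextGuard>(fw));
    fw.switch_to(1);  // first dialog "blocks"
    for (int i = 0; i < 6; ++i) second.push_back(std::make_unique<ContextGuard>(fw));
    fw.switch_to(0);  // user clicks OK on the first dialog
    while (!first.empty()) first.pop_back();  // fiber 0 returns, innermost first
    fw.switch_to(1);
    while (!second.empty()) second.pop_back();

    return fw.ok && fw.stack.cookies.empty();
}
```

In the model, `replay_with_guards()` comes out clean, but only because every single activation went through a `ContextGuard`. One entry point without a guard, or one activation made by somebody else's DLL, and the switch hook is unwinding a stack it doesn't fully know about.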
We could detect what OS we're using, and try to mess with the relevant entry in the Thread Environment Block. The key word here doubtless being "try".
Or we could pull the fiber-switch code out, manually make our own code modeless, deprecate the old modeless calls, and (for the next release) degrade those function calls from modelessness to modality.
Sigh.
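For what it's worth, the third option has roughly this shape: rather than a call that suspends its fiber until the user clicks OK, the caller hands over a callback and returns immediately. This is a sketch under my own assumptions, not the application's actual interface; `ChooserDialogs`, `open`, and `click_ok` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical modeless chooser: open() records a callback and returns at
// once, so no function is left suspended halfway through a DLL call chain.
class ChooserDialogs {
public:
    using OnChosen = std::function<void(const std::string&)>;

    // Returns an id for the newly opened dialog.
    int open(OnChosen on_chosen) {
        int id = next_id_++;
        pending_[id] = std::move(on_chosen);
        return id;
    }

    // Called from the ordinary main loop when the user clicks OK on `id`.
    // Returns false if that dialog is not (or no longer) open.
    bool click_ok(int id, const std::string& value) {
        auto it = pending_.find(id);
        if (it == pending_.end()) return false;
        OnChosen callback = std::move(it->second);
        pending_.erase(it);
        callback(value);
        return true;
    }

private:
    std::map<int, OnChosen> pending_;
    int next_id_ = 1;
};
```

Two dialogs can be open at once and be dismissed in either order, because every entry point has already returned (and deactivated its context) by the time the main loop runs again. The cost is that every caller that used to just wait for a return value now has to be turned inside out.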
[1] According to this MSDN blog post, which is the only documentation of this fact that I've ever been able to find. Notably, I have no idea one way or the other concerning Vista or Server 2008.