Some of those thoughts I didn’t get to last night.
First of all, I should say that threading in .NET rocks, in my opinion. It’s fast, easy to program for, and very complete for a managed platform. The inclusion of intra-process mutexen is especially nice. So, here’s some code that uses some of the functionality I showed designs for yesterday.
PrimeSieveContext ctx = new PrimeSieveContext(1000000);
WorkDispatcher<long, PrimeSieveContext> facade = new WorkDispatcher<long, PrimeSieveContext>(3,ctx);
PrimeAdderWorkUnit wu0 = new PrimeAdderWorkUnit(1, 500000);
PrimeAdderWorkUnit wu1 = new PrimeAdderWorkUnit(500001, 750000);
PrimeAdderWorkUnit wu2 = new PrimeAdderWorkUnit(750001, 1000000);
facade.AddWorker(wu0);
facade.AddWorker(wu1);
facade.AddWorker(wu2);
<snip/>
long result = facade.Execute();
So when I said there’s a lot of room for improvement, the first item I’d like to look into is setting it up so that the WorkDispatcher decides for itself how many work units to split the problem domain into. That is somewhat at odds with the first improvement I made. Here’s a bit of the WorkDispatcher Execute method:
if (_maxThreads > 1)
{
    // Start beyond the 1st one, spawn threads, wait until they are all done
    for (int i = 1; i < _maxThreads; ++i)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(_workers[i].ExecuteAsync), _ctx);
    }
    TResultType rslt = _workers[0].Execute(_ctx);
    WorkerSharedContext<TResultType> finalCtx = _ctx as WorkerSharedContext<TResultType>;
    for (int i = 1; i < _maxThreads; ++i)
    {
        _workers[i].CompleteHandle.WaitOne();
        finalCtx.AddResultComponent(_workers[i].Result);
    }
    return finalCtx.GetContextualResult();
}
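The fragment above leans on two members that are not shown: ExecuteAsync and CompleteHandle. Here is a minimal sketch of how such a pair could work, assuming a ManualResetEvent as the handle. SumRangeWorkUnit and CompletionDemo are hypothetical stand-ins made up for illustration, not the real classes from this project.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a work unit: sums the integers in [from, to].
public class SumRangeWorkUnit
{
    private readonly long _from;
    private readonly long _to;
    private readonly ManualResetEvent _completeHandle = new ManualResetEvent(false);
    private long _result;

    public SumRangeWorkUnit(long from, long to)
    {
        _from = from;
        _to = to;
    }

    public WaitHandle CompleteHandle { get { return _completeHandle; } }
    public long Result { get { return _result; } }

    // Synchronous path, for the unit the calling thread runs itself.
    public long Execute(object ctx)
    {
        long sum = 0;
        for (long i = _from; i <= _to; ++i) sum += i;
        _result = sum;
        return sum;
    }

    // Same signature as WaitCallback, so it can be handed to the ThreadPool.
    public void ExecuteAsync(object ctx)
    {
        try { Execute(ctx); }
        finally { _completeHandle.Set(); } // signal completion to any waiter
    }
}

public static class CompletionDemo
{
    public static long Run()
    {
        SumRangeWorkUnit a = new SumRangeWorkUnit(1, 50);
        SumRangeWorkUnit b = new SumRangeWorkUnit(51, 100);

        // b goes to the pool, a runs on the calling thread.
        ThreadPool.QueueUserWorkItem(new WaitCallback(b.ExecuteAsync), null);
        long total = a.Execute(null);

        b.CompleteHandle.WaitOne(); // block until b has signalled
        return total + b.Result;    // 1275 + 3775 = 5050
    }
}
```

The handle only tells a waiter that one particular unit has finished; it does not tell the dispatcher that a thread is free for more work.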
First of all, I love the Generics in .NET 2, but several times I’ve run across a situation where I wish one of the type parameters could also be required to implement an interface or extend a base class, which would avoid the `as` cast above. Generic constraints (the `where` clause) look like the tool for that, so I suppose I need to refactor the design here. At any rate, I noticed that, depending on which chunk of work I placed in the _workers[0] slot, I could keep both my CPUs maxed for longer and finish the work a few seconds sooner. Ideally I should just give the WorkDispatcher hints about this rather than ordering the chunks myself, so I added the following:
/// <summary>
/// Setting this value may cause the WorkDispatcher to re-order its work units. The default value is FIFO.
/// </summary>
public WorkDispatcherScheduleTypes ScheduleType
{
    get { return _scheduleType; }
    set
    {
        _scheduleType = value;
        if (_scheduleType == WorkDispatcherScheduleTypes.OrderSensitive)
        {
            // Ascending by Order
            _workers.Sort(new WorkUnitOrderComparison<TResultType, TSharedContext>());
        }
        else if (_scheduleType == WorkDispatcherScheduleTypes.WeightSensitive)
        {
            // Ascending by Weight, then reversed so the heaviest unit comes first
            _workers.Sort(new WorkUnitWeightComparison<TResultType, TSharedContext>());
            _workers.Reverse();
        }
    }
}
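The two comparison classes used by the setter are not shown here. A rough sketch of what they might look like, assuming each work unit exposes integer Order and Weight values, follows. HintedWorkUnit, OrderComparison, WeightComparison and ScheduleDemo are hypothetical, non-generic stand-ins for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal work unit carrying only the two scheduling hints.
public class HintedWorkUnit
{
    public readonly string Name;
    public readonly int Order;
    public readonly int Weight;

    public HintedWorkUnit(string name, int order, int weight)
    {
        Name = name;
        Order = order;
        Weight = weight;
    }
}

// Ascending by Order.
public class OrderComparison : IComparer<HintedWorkUnit>
{
    public int Compare(HintedWorkUnit x, HintedWorkUnit y)
    {
        return x.Order.CompareTo(y.Order);
    }
}

// Ascending by Weight; the caller reverses the list to get heaviest first.
public class WeightComparison : IComparer<HintedWorkUnit>
{
    public int Compare(HintedWorkUnit x, HintedWorkUnit y)
    {
        return x.Weight.CompareTo(y.Weight);
    }
}

public static class ScheduleDemo
{
    public static string HeaviestFirst()
    {
        List<HintedWorkUnit> workers = new List<HintedWorkUnit>();
        workers.Add(new HintedWorkUnit("low", 0, 1));
        workers.Add(new HintedWorkUnit("high", 1, 3));
        workers.Add(new HintedWorkUnit("mid", 2, 2));

        workers.Sort(new WeightComparison());
        workers.Reverse();

        return workers[0].Name + "," + workers[1].Name + "," + workers[2].Name;
    }
}
```

One caveat: List<T>.Sort is not a stable sort, so units with equal Weight or Order may not keep their original relative order.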
The WorkUnits now have assignable Weight and Order properties, no rocket science here. Looking at the threading code in the Execute fragment above, we can see where I am not yet meeting my stated goal of keeping the CPUs maxed as much as possible so the work completes using all the power available. In this example, with three WorkUnits and two physical processors, the ideal would be to run two in parallel and assign the third to a physical processor as soon as one becomes available. I read recently that this technique is being used with success in optimizing games for the PS3. Neither ThreadPool.QueueUserWorkItem nor the framework I have here so far supports any sort of event for saying “I’m done” that would let the WorkDispatcher put the next WorkUnit on a free CPU, so I’ve got some more code to write.
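One possible shape for that extra code is sketched below. It assumes the “I’m done” signal can be a small wrapper callback that does its bookkeeping under a lock and then starts the next pending job. Job, BoundedRunner and BoundedDemo are hypothetical names, and this is a sketch rather than the eventual WorkDispatcher implementation. It counts jobs handed to the ThreadPool, which is only an approximation of jobs actually running on a CPU.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical job type: some work that produces a partial result.
public delegate long Job();

// Keeps at most maxConcurrent jobs in flight and sums their results.
// No error handling: a job that throws is not recovered from.
public class BoundedRunner
{
    private readonly Queue<Job> _pending = new Queue<Job>();
    private readonly object _gate = new object();
    private readonly ManualResetEvent _allDone = new ManualResetEvent(false);
    private readonly int _maxConcurrent;
    private int _running;
    private int _peakRunning;
    private long _total;

    public BoundedRunner(int maxConcurrent) { _maxConcurrent = maxConcurrent; }

    public int PeakRunning { get { return _peakRunning; } }

    // Call before Execute; not meant to be used concurrently with it.
    public void Add(Job job) { _pending.Enqueue(job); }

    public long Execute()
    {
        lock (_gate)
        {
            if (_pending.Count == 0) return 0;
            StartAsManyAsAllowed();
        }
        _allDone.WaitOne();
        return _total;
    }

    // Caller must hold _gate.
    private void StartAsManyAsAllowed()
    {
        while (_running < _maxConcurrent && _pending.Count > 0)
        {
            Job job = _pending.Dequeue();
            ++_running;
            if (_running > _peakRunning) _peakRunning = _running;
            ThreadPool.QueueUserWorkItem(new WaitCallback(RunOne), job);
        }
    }

    // Wrapper around each job: this is the "I'm done" notification.
    private void RunOne(object state)
    {
        long result = ((Job)state)();
        lock (_gate)
        {
            _total += result;
            --_running;
            StartAsManyAsAllowed(); // a slot just freed up
            if (_running == 0 && _pending.Count == 0) _allDone.Set();
        }
    }
}

public static class BoundedDemo
{
    public static long Run()
    {
        BoundedRunner runner = new BoundedRunner(2); // e.g. two processors
        runner.Add(delegate { return 1L; });
        runner.Add(delegate { return 2L; });
        runner.Add(delegate { return 3L; });
        return runner.Execute(); // 1 + 2 + 3 = 6
    }
}
```

Unlike the Execute fragment earlier, the calling thread here only waits instead of running a unit itself. Environment.ProcessorCount would be a natural value to pass as maxConcurrent.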
Lastly, the PrimeAdderWorkUnit class is currently responsible for properly locking the pieces of the SharedContext it might be working with at any time. It would be nice to create a mechanism for doing this automatically if such a means can be found in .NET. It would be oh so nice to be able to overload the “.” or “->” operator. I’ll revisit this when I can.
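C# does not allow overloading member access, but a wrapper that only hands out the shared object inside a lock gets part of the way there. The sketch below assumes callers are happy to pass a delegate. Guarded<T>, Use<T> and GuardedDemo are hypothetical names for illustration, and the List<long> merely stands in for one piece of the SharedContext.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public delegate void Use<T>(T value);

// Hypothetical wrapper: the wrapped object is reachable only through With(),
// which holds the lock for the duration of the callback.
// The constraint restricts T to reference types, since a value type would
// be copied on every call and the lock would protect nothing useful.
public class Guarded<T> where T : class
{
    private readonly T _value;
    private readonly object _gate = new object();

    public Guarded(T value) { _value = value; }

    public void With(Use<T> action)
    {
        lock (_gate) { action(_value); }
    }
}

public static class GuardedDemo
{
    public static int Run()
    {
        Guarded<List<long>> shared = new Guarded<List<long>>(new List<long>());
        ManualResetEvent done = new ManualResetEvent(false);
        int remaining = 4;

        for (int t = 0; t < 4; ++t)
        {
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                for (int i = 0; i < 1000; ++i)
                {
                    // List<T> is not thread-safe; the wrapper serializes access.
                    shared.With(delegate(List<long> list) { list.Add(i); });
                }
                if (Interlocked.Decrement(ref remaining) == 0) done.Set();
            });
        }

        done.WaitOne();
        int count = 0;
        shared.With(delegate(List<long> list) { count = list.Count; });
        return count; // 4 threads x 1000 adds = 4000
    }
}
```

This does not stop a callback from leaking the inner reference out of With(), so it is a convention enforced by the API shape rather than a guarantee.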