New Office

by Administrator 28. January 2008 16:47

CarSpot has just expanded onto the 5th floor of the building above the Ale House and, behold, I get a new office.  I think this is one of the nicer ones; I suspect I managed to get it because no one else was willing to sit next door to the President.

I feel like I need to put some UML diagrams up in order to make it feel cozy.  We got this space from a creative company that was up here previously.  We threw out all the iMacs but kept the general decor.  There's an Angry Coal Miner's Bowling Night theme going on everywhere, but I kinda like it.

I used to play billiards a lot.  One of my uncles was a pool hustler, but that's a story for another day.  I think I'll be able to get back into practice.

We also have a full kitchen.  I'm still hiring too....


Task Parallel Library

by Administrator 27. January 2008 16:36

Parallel Tasks

A while back (http://www.damonpayne.com/PermaLink,guid,fdb0de99-446c-428b-ba98-14ef8c5dfaf0.aspx) I wrote about my desire to use brute-force concurrency to make up for my poor prime number algorithm, and I started writing some code to facilitate this.

At some point an article on the Task Parallel Library came up in my RSS reader, and I also got caught up on Dot Net Rocks episodes, one of which featured Joe Duffy talking about the TPL.  I went and read the article and was generally pleased to see that the Task class and Parallel.ForEach were heading towards the same type of developer experience I was shooting for with my WorkUnit<TResultType>.  Obviously the TPL is 1000x more complete and detailed than my incredibly brief experiment, but in general it pleases me to discover I was barking up the same tree as people who are far smarter than I am.

The TPL is just a CTP right now and requires .NET Framework 3.5, which put me in a situation where I wanted to use it but couldn’t justify the risk on a production app.  I have a very intensive data import project that takes far too long to run through the test data set.  I want to show some of the thought process behind the code I wrote to get some of the same functionality without the TPL, which should also help one “think parallel” in preparation for many-core systems becoming the norm in coming years.

The Command Pattern

I use the Command pattern quite a bit to chunk bits of work/responsibility into succinct and portable units.  Let’s review the summary for the Command pattern:

Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

Many people use the example of a GUI action invokable from more than one menu, but the Command pattern is all over the place in .NET; SqlCommand, for example.  Breaking units of work into Commands can also help code readability quite a bit.  In the import process I mentioned above, the Main() method was over 700 lines when I inherited it; it’s now down to 50-ish, due in large part to identifying what the units of work were and Commandifying them.
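As a sketch of what “Commandifying” might look like, here is an illustrative unit of import work; the interface and class names are made up for this post, not taken from the actual import code:

```csharp
// Illustrative sketch: one unit of import work encapsulated as a Command.
public interface IImportCommand
{
    string FriendlyName { get; }
    void Execute();
}

public class UnzipFileCommand : IImportCommand
{
    private readonly string _zipPath;

    public UnzipFileCommand(string zipPath)
    {
        _zipPath = zipPath;
    }

    public string FriendlyName
    {
        get { return "Unzip " + _zipPath; }
    }

    public void Execute()
    {
        // The unzip logic lives here instead of inline in Main().
    }
}
```

Main() then shrinks to building a list of these and executing them in order.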

The Command pattern also lends itself to completing large workloads faster by distributing the workload over multiple cores.  Take a look at the Task class from the TPL:

class Task<T> : Task
{
    Task( Func<T> function );
    T Value { get; }            // does an implicit wait
}

Looks a bit Command-y to me.  Unfortunately, as I’ve mentioned, I can’t really use the TPL yet so let’s describe the problem and look at the home-rolled solution.

The Process

Let’s describe a generic data import process that involves data, photos, remote sites, and a data store:

There are two ways we can use concurrency to save time here.  The first involves the entire system being blocked by network I/O during some of these operations.  The second involves some of these tasks not being dependent on each other, meaning we can take advantage of more cores by running several threads.  In this particular case, the I/O blocking deserves special consideration.  Suppose I have 50,000 rows of relational data to process from a file, and each row is processed very quickly.  I also have 50,000 images to download from another file that is logically part of this import, and each image takes three seconds to download.  The compute-bound tasks could still be running while the I/O-bound tasks are taking their sweet time.  Now that we have a problem, let’s look at using commands and parallelism to save me time when I’m testing this import.
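As a rough sketch of that overlap (DownloadImage and ProcessRow are hypothetical helpers, not from the actual import):

```csharp
// Sketch: compute-bound row processing overlaps the I/O-bound downloads.
Thread downloader = new Thread(delegate()
{
    foreach (string url in imageUrls)
    {
        DownloadImage(url); // blocks ~3 seconds on network I/O per image
    }
});
downloader.Start();

foreach (DataRow row in rows)
{
    ProcessRow(row); // CPU-bound work runs while the downloader waits on I/O
}

downloader.Join(); // the slow I/O side finishes in its own time
```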

Solutions

After considering my problem I can clearly see that I want to have a DataImportCommand that I can treat generically by the task scheduling code I’m about to write.  From the Activity Diagram above, I decide on the following population of classes to model my problem:

 
[Class diagram: the initial DataImportCommand hierarchy]

In the Main() method I build up a List<DataImportCommand> in the logical order I would execute them in sequentially.  If I could use the TPL, and if these DataImportCommands had no dependencies, I could use Parallel.ForEach(myCommandList) since my generic list implements IEnumerable<T>.  I can’t, and they do, so we have two more items to design.
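For the record, the no-dependency TPL version might have looked something like this; this is hypothetical, based on my reading of the CTP article, and BuildCommandsInLogicalOrder is a made-up helper:

```csharp
List<DataImportCommand> commands = BuildCommandsInLogicalOrder();

// Hypothetical TPL CTP usage: the library schedules the work across cores.
Parallel.ForEach(commands, delegate(DataImportCommand cmd)
{
    cmd.Execute();
});
```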

Using Threads

One thing the TPL does that I have not home-rolled is a smart work-stealing task scheduler.  You want your work scheduled such that there is one Task (Command) running on every available CPU core at any given time.  At first I thought ThreadPool.QueueUserWorkItem would be the appropriate tool, but reviewing the documentation for QueueUserWorkItem (always a good idea when writing threading code) shows it isn’t.  The managed thread pool should not be used for long-running tasks, nor when you need to change the priority of threads.  Some of our DataImportCommands are surely long-running, and when I deal with the Command dependency issue in a moment I may care about thread priority.  The managed thread pool is really for delegates, short tasks, and such, so I spin up one real thread per DataImportCommand.  This is something you must consider on a case-by-case basis; here I’ll really only have 6-10 Commands for one import.  According to a Dot Net Rocks guest, Outlook spins up 60 threads immediately upon starting, so I’m not too concerned (yet) about having too many threads running.

foreach (DataImportCommand cmd in commands)
{
    log.Info("Queuing " + cmd);
    ThreadStart ts = new ThreadStart(cmd.Execute);
    Thread t = new Thread(ts);
    t.Name = cmd.FriendlyName + "WorkerThread";
    if (cmd.HasDependencies)
    {
        t.Priority = ThreadPriority.Normal;
    }
    else
    {
        t.Priority = ThreadPriority.AboveNormal; // give first tasks priority
    }
    t.Start();
}

Start a Thread, name it (for logging/debugging), and away we go.  Regarding the thread priorities, this is an optional experiment: the idea is that a thread with dependencies gets a lower priority and those without dependencies get a higher one, since other threads may be depending on them.

Command Dependencies

I didn’t immediately see a way to determine dependencies between units of work automatically, so we must resort to hinting, and by hinting I mean “explicitly spelling it out”.  We clearly can’t unzip a file before it’s downloaded, and we can’t begin processing a data text file before it’s unzipped.  We must, therefore, come up with a programming lexicon for:

1. Describing that C depends on B, which in turn depends on A.
2. Blocking the execution of C until the tree of C’s dependencies is complete.  As I write this, I realize I didn’t actually use a Tree structure per se in my solution, but a Tree and a functional programming language may be a better way of restructuring the problem.  I’ll revisit this idea at the end of the article.

We start by adding some functionality to the base class.  We need to know when a Command is complete, we need to know what other Commands any given Command might depend on, we need to block a command from executing even though we’ve already created and started its thread, and we need to signal the Command’s thread that its prerequisites are now complete.  My classes now look like this:

 
[Class diagram: DataImportCommand with dependency support]
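In lieu of the diagram, here is a sketch of the base class shape; the member names are inferred from the snippets that follow, so treat this as an approximation rather than the actual code:

```csharp
public abstract class DataImportCommand
{
    private bool _isComplete;
    private readonly ManualResetEvent _resetEvent = new ManualResetEvent(false);
    private readonly List<DataImportCommand> _dependencies =
        new List<DataImportCommand>();

    public abstract string FriendlyName { get; }

    public ManualResetEvent ResetEvent
    {
        get { return _resetEvent; }
    }

    public IList<DataImportCommand> Dependencies
    {
        get { return _dependencies; }
    }

    public bool HasDependencies
    {
        get { return _dependencies.Count > 0; }
    }

    // The "hint": explicitly record that this command must wait for another.
    public void DependsOn(DataImportCommand prerequisite)
    {
        _dependencies.Add(prerequisite);
    }
}
```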

When building commands we “hint” at what the dependencies are.  We then fire off a thread for each command and let the dependencies work themselves out.  The base class Execute method looks like this:

public virtual void Execute()
{
    WaitOnDependencies();
}

… and we follow the practice of calling the base implementation first in our derived classes…

public override void Execute()
{
    base.Execute();
    // do our real work here…
    Complete = true;
}

The Complete property is very vanilla with one minor exception:

public bool Complete
{
    get
    {
        return _isComplete;
    }
    set
    {
        _isComplete = value;
        _resetEvent.Set();
    }
}

And now we can see what the WaitOnDependencies() method does for us:

protected virtual void WaitOnDependencies()
{
    if (HasDependencies)
    {
        foreach (DataImportCommand dep in Dependencies)
        {
            if (!dep.Complete)
            {
                ILog log = LogManager.GetLogger(GetType());
                log.Warn("Waiting on dependency command " + dep.FriendlyName);
                dep.ResetEvent.WaitOne();
            }
        }
    }
}

So, each command checks its list of dependencies before executing its own Command code.  On a dual-core machine we go from seeing 50% utilization of each core to 90% utilization of each core, and a tremendous decrease in the amount of time needed for the entire import to run.  To revisit the thread priority question above, we could potentially increase the priority of the current thread in this method once we know the dependencies are accomplished.
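That priority tweak might look like this variation on WaitOnDependencies; it's an experiment I haven't run, not what the import currently does:

```csharp
protected virtual void WaitOnDependencies()
{
    if (HasDependencies)
    {
        foreach (DataImportCommand dep in Dependencies)
        {
            if (!dep.Complete)
            {
                dep.ResetEvent.WaitOne();
            }
        }
        // Prerequisites are done; nothing is waiting on this thread's
        // outputs anymore, so promote it to finish its own work sooner.
        Thread.CurrentThread.Priority = ThreadPriority.AboveNormal;
    }
}
```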

Also, note that Consolas is the best code font ever.

A Tree by any other name…

Perhaps a more elegant way of modeling the problem would be to describe our tasks as a Tree of dependent tasks.  At this point I will change the lexicon to say Prerequisite instead of Dependency.  The root node of the tree is the one task or Command we can run without any prerequisites; perhaps this is just Main().  Using the same working example, and looking at which commands are set as dependencies of which, we could model our problem like so:

 
[Diagram: the example commands arranged as a prerequisite tree]

Or, more generally, like so:

[Diagram: a generalized task tree]

The trees we are talking about now are obviously not Binary Trees.  Right now I have two cores on my desktop and four on my server.  Pretty soon I’ll have four on my desktop and eight on my server, then perhaps 16 cores on my server after that.  In light of this consideration, what can we gain by structuring the problem as a Tree?

What we are currently doing is attempting to start each command even if it’s not really ready to run, even though we know at runtime what it would take for a DataImportCommand to be ready.  When we take the shotgun approach and throw them all out there as eligible threads, we waste some resources on the ManualResetEvents and on blocking/waiting for Prerequisites to complete.  Assuming the task tree in the above diagram, we might create 16 threads and all but two of them will block until the “level 2” tasks complete.  Using a recursive structure might be a cleaner way to envision this, which I will explore in a future article.
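A naive sketch of the recursive idea, just to record the shape of it; a real version would fan sibling prerequisites out across threads rather than walking them depth-first, and would handle shared prerequisites, which this ignores:

```csharp
// Only execute a node once everything beneath it in the tree is complete.
void RunTree(DataImportCommand node)
{
    foreach (DataImportCommand prereq in node.Dependencies)
    {
        RunTree(prereq); // naive: runs prerequisites sequentially
    }
    node.Execute(); // safe: the whole subtree below this node is done
}
```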

 

 


Program Control

by Administrator 11. January 2008 17:59

I saw this over at Peter Foot's blog and the first thing I thought was "Maybe they are using exception handling to control program flow?"; by this I mean just trying to use a Proxy without first checking to see if it's valid, and handling the failure in a try/catch block.  I have the utmost respect for the CF team, so maybe this is not what they're doing, or maybe they have a good reason for doing so.

Regardless of what the CF team is doing, this re-opened a barely healed wound.  I recently inherited a codebase which I am modifying to help a friend.  This code is riddled with the age-old no-no of "using exception handling blocks to control program flow".  I think I first saw this called out as a bad practice in Bjarne Stroustrup's "The C++ Programming Language" at least ten years ago.  The code does things like catching NullReferenceException rather than checking for null first, and catching InvalidCastException rather than checking a column's DBNull status.  Logic that should be in the "if" or "else" of a conditional statement is executed in the catch{} block instead.  Not only does this ugly practice make the code hard to read, but considering the frequency of these conditions the code is slowed down quite a bit while the exceptions are handled.
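To illustrate the difference with a contrived sketch (row here is a DataRow):

```csharp
// The anti-pattern: the "else" branch hides in a catch block, and the
// exception machinery runs on every DBNull row.
string name;
try
{
    name = (string)row["Name"]; // throws InvalidCastException on DBNull
}
catch (InvalidCastException)
{
    name = string.Empty;
}

// Checking the condition up front instead: cheaper and easier to read.
string name2 = row.IsNull("Name") ? string.Empty : (string)row["Name"];
```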


Warner goes Blu-Ray exclusive

by Administrator 4. January 2008 21:12


Blog Addiction

by Administrator 3. January 2008 15:20


2008 Goals

by Administrator 1. January 2008 18:59

It would appear that I did not blog my 2007 goals, the meeting of which was a mixed bag.  I do have some solid 2008 goals.

  1. Debt: Jen and I have never very aggressively gone after paying off the student loans and credit cards, but carrying debt has started to bother me a bit more in light of the current state of the economy, and I'd just like to have that $$ to spend and save every month.  An aggressive but attainable 2008 goal is to reduce the debt by 33%.
  2. MVP: I'd like to keep my Microsoft MVP - Solution Architecture status.  I need to seriously get my arse in gear on this one, but since my wife starts on days next week I can get back into speaking engagements and such.
  3. Total home organization: This one is hard to explain without making me sound like an OCD nutjob.  There are various things all over the house that are just sitting on the floor of the closet, etc. because there was no other place to put them.  I spent a rather large amount of time this past fall setting up shelving, getting rid of moving boxes, and buying storage bins but there's honestly not a single room in the house I could call "done" except maybe for my kitchen.  I'm hoping to get this task to 90% before Kid #2 is born.
  4. Save: I save a fair bit every month and I usually drain the accounts 1x per year in order to purchase some kind of toy, usually audio/home theater related.  In 2008 I want to save enough to take advantage of this poor real estate market and buy some land.
  5. Let it go: This one is a bit personal, but I'll vaguely phrase it as "I don't have time for people who don't have time for me", no matter how much history there might be.

I have some other minor goals that are not official Resolutions.  There are some home improvement projects I'd like to get done, I'd like to build a great team at work (unfortunately largely out of my control), I'd like to build my cellar up to about 50 bottles of wine, and exercise a little bit. 


About the author

Damon Payne is a Microsoft MVP specializing in Smart Client solution architecture. 
