Tuesday, May 13, 2008

So, five years ago when my daughter was brand new, I was a runner.  In the time since then I've gotten into the worst shape of my life and keep meaning to do something about that.  By looking at me, you wouldn't necessarily think that I've got a gut or anything, but it's there; I also have various pants that don't fit.  The necessary motivation came recently when everyone at CarSpot decided to have an Al's Run team.  Al's Run 2003 was pretty much the last time I intentionally did exercise of any kind.  I have gone jogging a total of One Times so far this year and two miles was unbelievably painful.  Combined with a newborn who's not sleeping very much, I am not in a good place to start training, but today I bought some actual running shoes so now I feel committed.  When I quit I was able to do an 8k with an average mile time of around 7:45; that was a long time ago and I didn't like wine and steaks as much as I do now.  I'm going to pick a 5k in July as an intermediate goal.

I've long been a fan of Casey Chestnut, who calls his blog Brains-n-Brawn.  Now that I'm running again maybe I can claim "Smarts and Swiftness" ?

 

Tuesday, May 13, 2008 2:33:22 PM (Central Standard Time, UTC-06:00)

Hanselminutes show #112 brings together people from xUnit.Net, NUnit, and MbUnit to discuss unit testing frameworks.  The whole show is worth listening to, but they especially mentioned running tests in parallel, which of course I've done some work on:

http://www.damonpayne.com/2008/05/09/ConcurrentUnitTestingWithXUnitNet0.aspx

http://www.damonpayne.com/2008/05/09/ConcurrentUnitTestingWithXUnitNet1.aspx

The other thing they mention is the other potential intersection of Unit Testing and Concurrency: testing for thread deadlocks, etc.  I have been working for a few weeks on an article and some semantics (no working code this time due to the scope) that deal with exactly this problem space.  I should have it published this week.

Tuesday, May 13, 2008 12:43:41 PM (Central Standard Time, UTC-06:00)
 Monday, May 12, 2008

As I talk to other developers at Launch Events, user groups, and online, I am met with confusion when I talk about my concurrency experiments.  I get the feeling that this is looked at as some sort of Ivory Tower academic exercise, or even worse some ridiculous nonsense concocted to try to look intelligent.  Concurrency is important, and everyone should be thinking about it.

At the risk of sounding like a broken record, I’ll deliver my spiel again in a different way.

The silicon industry has presented us with a terrible Bait ‘n Switch.  For more than 12 years (in my case anyway) I had the option to buy more megahertz and gigahertz at least once per year, oftentimes more.  Office is slow?  No worries, it’ll perform great as computers are upgraded.  Bioshock brings your system to its knees?  Just get a new processor and a new GPU and you’ll be rockin’ 60fps with Big Daddy.  Due to some lame excuse commonly called “The Laws of Physics” we’re not getting faster and faster CPUs anymore.  Moore’s law is not dead yet: we’re getting more transistors all right.  The problem is that these transistors are on two or more cores instead of one core, and a quad core 2.5ghz processor is not remotely the same animal as a 10ghz processor.  I do half expect the silicon industry to pull a Cheap Stereo marketing trick any day now.  Go to Best Buy and look through the home theater in a box section.  You’ll see claims of “525 watt system!!” and so forth.  These are not 525 watts per channel systems (that would be a lot) but 525 watts total, for 7 channels, which is not the same level of performance.  When I next buy a CPU, I expect some branding telling me “This is a 20 GIGAHERTZ POWERHOUSE”.  Lacking a 10 GHz processor (longing sigh), however, the multi-core CPU is the consolation prize offered to us by the silicon industry.

Despite being stuck in the 2-3ghz limbo, software is still increasing in complexity.  People want responsive applications.  It is my opinion that we are currently preparing to exit a time of “Free Concurrency Improvement”.  I wish I had a more compelling name for this sweet plateau, so let me explain.  Modern operating systems happen to be very good at task scheduling.  If I run Visual Studio 2008, SQL Server 2005 Studio, Outlook, MSN Messenger, and Zune player at the same time, my machine may not be very responsive.  Moving from a single-core 2.4ghz processor to a dual-core 2.4ghz processor will make my machine more responsive.  Running JUST the Zune player, for example, is probably not any faster than it used to be.  Our Free Improvement here is that the Unit of Concurrency is the Windows Process, and Windows is good at putting Process A on one core and Process B on the other core where they can both get more horsepower.

We are approaching a time, however, where 2.4ghz will not run a “mostly single threaded” application in an acceptable fashion even if the application gets a 2.4ghz core all to itself.  We need to stop thinking about concurrency as something that will keep Winamp from skipping when I open Visual Studio and start thinking about the next level of concurrency : running my ONE application as fast as possible by doing chunks of work concurrently on many cores.   This requires developers to re-think application design.  I did no threading in my computer science degree. The next generation of Computer Science graduates needs to be comfortable with concurrency before leaving college. 

10 gigahertz processors sure would be nice though.  10 gigahertz Quad Core.  Core 10 Quad, there, I branded it for you.  Intel, AMD, anyone listening?

Monday, May 12, 2008 7:56:27 AM (Central Standard Time, UTC-06:00)
 Thursday, May 08, 2008

 

With a basic understanding of what XUnit is doing, we need to determine where we’re going to try to split things up across multiple cores.  Take a look at the sequence diagram from the last article (here); we have a choice to make.  It is generally better to parallelize outer loops rather than inner loops, since each unit of parallel work is then larger and the coordination overhead is paid less often.  The design decision this helps us make is that the unit of concurrency is the Class.  This means that if I put 100 tests inside a single class, they will run sequentially, just as though we had no fancy concurrency code.  In the rest of this article we’ll look at the modifications needed to use Payneallel.ForEach with the unit tests.
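To make the "Class as the unit of concurrency" idea concrete, here is a minimal sketch that is independent of xUnit and Payneallel.  The TestClass type and the RunByClass method are hypothetical names invented for this illustration, and the sketch simply starts one thread per class instead of using a worker pool sized to the processor count.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ClassLevelSketch
{
    // Hypothetical stand-in for a test class: a name plus its test bodies.
    public sealed class TestClass
    {
        public readonly string Name;
        public readonly List<Action> Tests = new List<Action>();
        public TestClass(string name) { Name = name; }
    }

    // Outer loop (classes) runs concurrently, one thread per class.
    // Inner loop (tests within a class) stays sequential.
    // Returns the total number of tests executed.
    public static int RunByClass(IEnumerable<TestClass> classes)
    {
        int executed = 0;
        List<Thread> threads = new List<Thread>();

        foreach (TestClass testClass in classes)
        {
            TestClass current = testClass; // private copy for the closure
            Thread worker = new Thread(delegate()
            {
                foreach (Action test in current.Tests)
                {
                    test();
                    Interlocked.Increment(ref executed); // shared counter, so update atomically
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Block until every class has finished.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return executed;
    }
}
```

With this shape, 100 tests in a single class still run one after another on a single thread, which is exactly the limitation described above.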

XUnit GUI

We start our modifications in the XUnit GUI, which is refreshingly straightforward.  The first thing to do is make it easy to choose concurrent execution.  The XUnit GUI now looks like this, following the sequential execution of the control group:

The “Run concurrently” checkbox is my addition.  When you click the Run button:

        void OnClick_Run(object sender, EventArgs e)
        {
            _totalCount = 0;
            _testCount = GetTestCount();
            ResetUI(_testCount);
            buttonGo.Enabled = false;
            if (_concurrentChk.Checked)
            {
                ThreadStart ts = new ThreadStart(RunAsync);
                Thread t = new Thread(ts);
                t.Name = "xUnitAsyncThread";
                t.Start();
                textResults.AppendText("Running Async...\r\n");
            }
            else
            {
                wrapper.RunAssembly(TestCallback);
            }

Our xUnit ExecutorWrapper is “wrapper”.  To keep from tying up the GUI thread, we run XUnit on a new thread, which will in turn create many other threads using Payneallel.  By default, Payneallel blocks the calling thread until all operations are done; however, we cannot both block the GUI thread AND allow it to update itself as test results become available.  The RunAsync method is simple:

        void RunAsync()
        {
            wrapper.BeginRunAssembly(TestCallback);
        }

 

ExecutorWrapper

My next modification is to the ExecutorWrapper class.  I tried to make my changes to XUnit additive only, adding functionality by adding methods rather than modifying things that already work for sequential execution.    

        public void BeginRunAssembly(Action<XmlNode> callback)
        {
            XmlNodeCallbackWrapper wrapper = new XmlNodeCallbackWrapper(callback);
            CreateObject("XUnit.Sdk.Executor+RunAssemblyParallel", executor, wrapper);
        }

I see no reason not to keep running the test in a separate AppDomain.  We have added another inner class to Executor, the RunAssemblyParallel class. 

Executor

Through experimentation I found that this would be the appropriate place to introduce parallel execution, at the Class level as I said previously.  This class is almost a copy of the RunAssembly class included with XUnit:

        public class RunAssemblyParallel : MarshalByRefObject
        {
            /// <summary/>
            public RunAssemblyParallel(Executor executor, object _handler)
            {
                DoParallel(executor, _handler);
            }

            protected void DoParallel(Executor executor, object _handler)
            {
                ICallbackEventHandler handler = _handler as ICallbackEventHandler;
                AssemblyResult results = new AssemblyResult(new Uri(executor.assembly.CodeBase).LocalPath);

                // Runs every test in one class; this is the body handed to Payneallel.ForEach
                Action<Type> doOne = delegate(Type type)
                {
                    ITestClassCommand testClassCommand = TestClassCommandFactory.Make(type);

                    if (testClassCommand != null)
                    {
                        ClassResult classResult = TestClassCommandRunner.Execute(testClassCommand,
                                                                                 null,
                                                                                 result => OnTestResult(result, handler));
                        // doOne runs on several threads at once; assuming AssemblyResult.Add
                        // is not thread-safe, serialize access to the shared results object
                        lock (results)
                        {
                            results.Add(classResult);
                        }
                    }
                };

                Type[] exportedTypes = executor.assembly.GetExportedTypes();
                int count = exportedTypes.Length;

                //Parallel Test execution
                Stopwatch sw = new Stopwatch();
                sw.Start();
                Payneallel.ForEach<Type>(exportedTypes, doOne, true);
                sw.Stop();
                Console.WriteLine("Time elapsed: " + sw.Elapsed);
                // Wall-clock time, since summed child times overstate a parallel run
                results.ExecutionTime = sw.Elapsed.TotalSeconds;
                OnTestResult(results, handler);
            }
        }

 }

Like the TPL, Payneallel likes an Action<T> to execute.  In the vanilla XUnit version of this code, there is no StopWatch and there is a regular foreach() block instead of Payneallel.ForEach.  The stopwatch is important because I can no longer trust XUnit to time the execution!  For a long time I ran and re-ran my tests and the Parallel code was always slower than the sequential version.  Then I had a “pwop” moment and found the following line of code:

                ExecutionTime += child.ExecutionTime;

Whoops!  We can’t just add the execution time of the children (from TimedCommand) when some of the commands are running at the same time. 
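The effect is easy to reproduce outside of xUnit.  The following is a small sketch, not xUnit code: TimingSketch and Measure are hypothetical names, and Thread.Sleep stands in for a test body.  It reports both the sum of the per-thread times (what the += line computes) and the wall-clock time measured around the whole run.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Starts `workers` threads that each sleep for `sleepMs` milliseconds.
    // summedMs: the per-thread elapsed times added together.
    // wallMs:   the elapsed time of the whole run as seen by the caller.
    public static void Measure(int workers, int sleepMs, out long summedMs, out long wallMs)
    {
        long total = 0;
        Thread[] threads = new Thread[workers];
        Stopwatch wall = Stopwatch.StartNew();

        for (int i = 0; i < workers; ++i)
        {
            threads[i] = new Thread(delegate()
            {
                Stopwatch own = Stopwatch.StartNew();
                Thread.Sleep(sleepMs); // stand-in for a test body
                own.Stop();
                Interlocked.Add(ref total, own.ElapsedMilliseconds);
            });
            threads[i].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        wall.Stop();

        summedMs = total;
        wallMs = wall.ElapsedMilliseconds;
    }
}
```

Because the sleeps overlap, the summed figure grows with the number of workers while the wall-clock figure stays close to a single sleep, which is why the parallel run looked slower than it really was.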

Results

With the Timing issue solved, I was successfully executing unit tests concurrently and saving a lot of time doing so.  Here is the same set of unit tests run using my new Concurrent xUnit hack.

I’ll take 27 seconds over 51 seconds any day.  I have not done any optimization work yet, nor have I constructed a test case where the tests run nearly 4x faster on a four-processor machine, but I expect to be able to get there.  As I mentioned before, the Class is the unit of concurrency in this experiment, so the amount of time saved will depend heavily on how the test cases are structured.  A better approach would be to first get a list of all of the individual methods marked with [Fact] and use the parallel semantics on that list instead.
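As a rough sketch of that finer-grained approach, the list of individual test methods could be gathered with reflection.  This is not how xUnit discovers tests internally; FactAttribute here is a local stand-in declared so the example compiles on its own, and SampleTests and FactFinder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Local stand-in for the xUnit attribute, declared so the sketch is self-contained.
[AttributeUsage(AttributeTargets.Method)]
public sealed class FactAttribute : Attribute { }

// Hypothetical test class used only to exercise the finder.
public class SampleTests
{
    [Fact] public void First() { }
    [Fact] public void Second() { }
    public void NotATest() { }
}

public static class FactFinder
{
    // Flattens a set of types into one list of public instance methods marked [Fact].
    public static List<MethodInfo> FindFacts(IEnumerable<Type> types)
    {
        List<MethodInfo> facts = new List<MethodInfo>();
        foreach (Type type in types)
        {
            foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Instance))
            {
                if (method.IsDefined(typeof(FactAttribute), false))
                {
                    facts.Add(method);
                }
            }
        }
        return facts;
    }
}
```

The flat list could then be handed to a parallel ForEach in place of the array of types, with each method invoked on a fresh instance of its declaring class so that tests do not share state.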

I have a side project that is woefully under-unit-tested, code that I inherited.  I write unit tests for the code I touch as I refactor it.  The unit tests will involve a lot of database access, calculations, and Presenter mocking.  I can’t disclose what this codebase is just yet, but I am in the process of testing TestDriven → XUnit → NCover.  I depend heavily on NCover and I really can’t imagine manually trying to determine what I’ve got test coverage on anymore.  If this test is successful, I will eventually be able to report on how this concurrent unit testing works on 100,000 lines of code 99% covered by thousands of unit tests.  This should be a sufficient test case to prove this idea is sound.

As the years go by and we still don’t have 5ghz machines, designing frameworks with concurrency in mind will become increasingly important.

Thursday, May 08, 2008 7:13:40 PM (Central Standard Time, UTC-06:00)

Concurrent Unit Testing with xUnit – The answers come in dreams

Coil: the answers come in dreams

Well, not precisely in dreams, but in blog posts.  No sooner had I written (http://www.damonpayne.com/2008/04/17/ConcurrentUnitTesting.aspx) about how the over-design of NUnit was going to make it hard for me to implement concurrent unit testing than I saw Scott Hanselman feature xUnit on his Weekly Source Code (http://www.hanselman.com/blog/TheWeeklySourceCode24ExtensibilityEditionPlugInsProvidersAttributesAddInsAndModulesInNET.aspx).  The words that caught my eye: the source is extremely tidy.  Scott probably meant that the organization of the solution was tidy, but I grabbed it from CodePlex and started investigating.  The design is tidy too; could this be a better platform on which to complete my research?

I am generally liking xUnit.net so far, and I strongly expect I’ll be ditching NUnit in favor of this across the board, assuming the integration with TestDriven and NCover works as I’d expect.  It’s nice to just say “using xUnit” instead of “using NUnit.Framework”, and I like that I don’t have to place a [TestFixture] attribute on the class.  But these are small concerns saving a few keystrokes.  What we’re really concerned about is the original goal I wrote about:

On a sizable project, with a meaningful suite of unit tests, a developer practicing proper due diligence during the development lifecycle will spend a tremendous amount of time running Unit Tests.  This is an unfortunate disincentive for the developer to run said tests.

In general, the “Command” structure of a unit test foreshadows parallel-ability.  Properly designed unit tests should be easy to run in parallel: a unit test should Stand Alone, meaning each test case does not depend on state set up elsewhere.  xUnit does two more things that help us out here.  The first is that by removing the notions of “TestFixtureSetup/Teardown” they’ve made it much harder to shoot yourself in the foot at the Class level by relying on state, though for my example this is merely food for future thought, as we’ll see later.  The second is that it would appear they use a Randomizer to make sure the [Fact] methods in a Class do not run in any dependable order.

I set up a suite of 17 unit tests, implemented in 7 classes.  The tests do incredibly useful things like divide int.MaxValue by things and SpinWait().  Using xUnit is simple:

using XUnit;

namespace DamonPayne.xUnit.Tests
{
    public class FooTester : TestBase
    {
        [Fact]
        public void Fact1()
        {
            Console.WriteLine("FooTester::Fact1");
            System.Threading.Thread.SpinWait(int.MaxValue / 2);
            Assert.False(false);
        }

The last thing to do before jumping into code is to establish a baseline.   My 17 tests take 51.78 seconds to run in the xUnit GUI in all their spin-waiting glory.

Payneallel Revisited

While doing the research for this article, I found a few minor issues with my Payneallel.ForEach code I used with the Tree Concurrency articles (http://www.damonpayne.com/2008/04/03/ManagingConcurrencyWithTrees0.aspx ).  The first issue dealt with the code I used to wait for all concurrent iterations to be done before returning to the calling thread.  If the number of tasks was less than the number of processors on the machine, the “never touched” worker threads would never Finish().  The second dealt with an interesting thread-timing issue related to when a Worker declared itself “Busy”.  Concurrency is fun!  At any rate, here is the revised Payneallel code I used within xUnit:

using System;
using System.Collections.Generic;
using System.Threading;

namespace XUnit.Sdk
{
    /// <summary>
    /// Contains static methods and internal helper classes for executing concurrent operations
    /// </summary>
    public static class Payneallel
    {
        /// <summary>
        /// Concurrently perform the body action on each item in source
        /// </summary>
        /// <typeparam name="TSource"></typeparam>
        /// <param name="source"></param>
        /// <param name="body"></param>
        public static void ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body)
        {
            ForEach<TSource>(source, body, true);
        }

        /// <summary>
        /// Concurrently perform the body action on each item in source,
        /// optionally blocking the caller until every item has been processed
        /// </summary>
        /// <typeparam name="TSource"></typeparam>
        /// <param name="source"></param>
        /// <param name="body"></param>
        /// <param name="waitAll">true to block until all workers are done</param>
        public static void ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body, bool waitAll)
        {
            WorkerPool<TSource> pool = new WorkerPool<TSource>();

            foreach (TSource src in source)
            {
                // GetWorker blocks until a worker is free
                Worker<TSource> worker = pool.GetWorker();
                //Console.WriteLine("Using worker " + worker.Name);
                worker.Arg = src;
                worker.Work = body;
                worker.Go();
            }

            if (waitAll)
            {
                pool.WaitAll();
            }
        }
    }

   

 

    /// <summary>
    /// A fixed set of workers, one per processor, that hands out idle workers on request
    /// </summary>
    /// <typeparam name="T"></typeparam>
    class WorkerPool<T>
    {
        public WorkerPool()
        {
            // Create the event before any worker starts, so a worker that
            // reports Done right away never sees a null event
            _workerDoneEvent = new ManualResetEvent(false);
            _workers = new List<Worker<T>>(Environment.ProcessorCount);

            for (int i = 0; i < Environment.ProcessorCount; ++i)
            {
                Worker<T> worker = new Worker<T>("Payneallel " + i);
                _workers.Add(worker);
                worker.Done = new Action<T>(WorkerDone);
                worker.Go();
            }
        }

        private ManualResetEvent _workerDoneEvent;
        // Per-pool rather than static, so two pools for the same T do not share workers
        private List<Worker<T>> _workers;
        private object _syncRoot = new object();

        /// <summary>
        /// Callback invoked by a worker when it finishes; wakes up GetWorker
        /// </summary>
        public void WorkerDone(T arg)
        {
            lock (_syncRoot)
            {
                _workerDoneEvent.Set();
            }
        }

        /// <summary>
        /// Returns an idle worker, polling every 5 ms until one becomes free
        /// </summary>
        public Worker<T> GetWorker()
        {
            Worker<T> worker = GetFreeWorker();

            while (null == worker)
            {
                _workerDoneEvent.WaitOne(5, true);
                worker = GetFreeWorker();
            }

            _workerDoneEvent.Reset();

            return worker;
        }

        private Worker<T> GetFreeWorker()
        {
            foreach (Worker<T> w in _workers)
            {
                if (!w.Busy)
                {