Wednesday, July 02, 2008

http://arstechnica.com/news.ars/post/20080702-intel-an-expensive-many-core-future-is-ahead-of-us.html

Expensive for whom?  For software developers, of course.  It seems to me that the silicon industry is trying very hard to obscure a basic fact: even if new technologies such as Software Transactional Memory become common, even if you give us 100 cores to play with, even if things like the Parallel Extensions from Microsoft are elegant and easy to use, even if Intel's compiler research turns out to be a huge help, not all problems can benefit from parallelism.  In many cases, programmers can go against what would today be considered good practice and make copies of huge shared data structures in order to reduce data sharing between threads (at least we're not being told that we're never going to have more than 4GB of memory).  However, there are many problems that need shared read-write data.  Throwing massive numbers of cores at these problems can result in performance slower than single-core performance as resources are eaten up acquiring locks.  On the Windows platform, all of our GUI technologies still use an "apartment" model whereby objects are owned by a single very special thread and we are not allowed to touch them except by marshaling onto that thread's message pump.  What good are these 100 core systems going to do my WPF applications?
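
To make the data-sharing trade-off concrete, here is a minimal sketch.  It assumes the Parallel.For overloads that later shipped in .NET 4, which are not necessarily what the 2008 CTP exposed, and the class and method names are made up for illustration.  The first method takes a lock on every iteration; the second gives each worker a private subtotal and only locks once per worker when merging.

```csharp
using System;
using System.Threading.Tasks;

public static class SharingSketch
{
    // Sums 1..n with a lock taken on every iteration: heavy contention.
    public static long SumWithSharedLock(int n)
    {
        long total = 0;
        object gate = new object();
        Parallel.For(1, n + 1, i =>
        {
            lock (gate) { total += i; }
        });
        return total;
    }

    // Sums 1..n with a private subtotal per worker; the lock is taken
    // only when a worker merges its subtotal into the shared total.
    public static long SumWithLocalCopies(int n)
    {
        long total = 0;
        object gate = new object();
        Parallel.For<long>(1, n + 1,
            () => 0L,                                  // per-worker starting value
            (i, state, subtotal) => subtotal + i,      // no shared data touched here
            subtotal => { lock (gate) { total += subtotal; } });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithSharedLock(1000));
        Console.WriteLine(SumWithLocalCopies(1000));
    }
}
```

Both methods compute the same sum; the difference is only in how often threads have to queue up behind the lock.  Problems that genuinely need shared read-write state on every step cannot be restructured this way, which is the point of the complaint above.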

Wednesday, July 02, 2008 12:28:46 PM (Central Standard Time, UTC-06:00)
 Monday, June 23, 2008

Dan has an article showing some nice syntactic sugar for spawning threads.  Dan has been studying the model of the CCR, currently part of Robotics Studio.  The article specifically mentions the Compact Framework, but if you are doing full-framework development I would encourage you to check out the Parallel Extensions library as well.  It was mentioned at TechEd that the CCR might be refactored to use the TPL, so it'd be worth taking a peek at.  Using System.Threading.Tasks you could write code that starts to look less like the traditional ThreadStart code and more like what Dan is doing.  Without spending more than 30 seconds on this idea:

TaskCreationOptions tco = TaskCreationOptions.Detached;
TaskManagerPolicy tmp =
    new TaskManagerPolicy(1, Environment.ProcessorCount, 1, 0, System.Threading.ThreadPriority.AboveNormal);
TaskManager tm = new TaskManager(tmp);
Task t = Task.Create(
    (a) => { Console.WriteLine("doing some work"); },
    tm, tco);

If we start to see some multi-core mobile processors, it might be an interesting exercise to port a subset of PFX to the Compact Framework.

Monday, June 23, 2008 12:05:23 PM (Central Standard Time, UTC-06:00)
 Thursday, June 05, 2008

Ok, so, I found some great Silverlight tips on the afternoon of Day two courtesy of Jeff Prosise (http://www.wintellect.com/TechnicalBioDetail.aspx?Tech=5).  I am definitely going to be taking his advice on some points and blogging about it as I convert my Beta 1 examples to Beta 2 examples.  Jeff had an implementation of The Game of Life in Silverlight that got me thinking about getting some Concurrency peanut butter into the Silverlight chocolate.  Would the LINQ-based ray tracer work without too much modification in Silverlight?  Surely the Task Parallel Library will not work in Silverlight without serious modification, and many things will likely not be available at all.  Still, I smell a side project: to what degree could we at least get Parallel.ForEach<>, Parallel.For, and perhaps some of the basic System.Threading.Tasks.Task things working in Silverlight 2?  Possible uses for this include a Silverlight-based Folding@home style app, and I’m sure there are others.

2008 TechEd MVP Party

I’m writing this on the morning after the 2008 MVP party, hosted in the Voodoo Room at the House of Blues in Orlando.  What a blast!  I showed up and realized that while I knew no one, most other folks seemed to already know other MVPs.  Shortly thereafter Carl Franklin walked in the door.  Since I don’t get to meet Internet Famous people often, I said hi and thanked him for introducing me to Steely Dan with his rendition of Home at Last.  He seemed amused and gave me the big High Five when I said I actually bought Aja because of him.

There were two guys there doing the whole Blues Brothers shtick.  It would seem not as many people came as RSVP’d, so I was able to call The Vanderboom and get him in to the party.  The lower attendance also meant that the free drink tickets flowed in a steady stream for all of us.  There are some pictures circulating of me swing-dancing with a woman who runs a user group in Seattle.  Hopefully no other incriminating evidence exists; everyone remaining when they kicked us out was getting wacky by that time.

Thanks to Microsoft for putting this event on, and all the MVPs who came for a great time.

Thursday, June 05, 2008 10:01:14 AM (Central Standard Time, UTC-06:00)
 Wednesday, June 04, 2008

This morning’s session with Stephen Toub on the Parallel Extensions to the .NET Framework was excellent.  Stephen had a quad-core laptop that was enormous and hotter than the surface of the sun.  Intel’s quad-core mobile chips for laptops should be out this summer.  Anyone want to buy a used Vostro with a Core 2 Duo?  He also had an early prototype of a system based on the Intel Dunnington architecture that appeared to have 20 cores.  I suddenly felt that my Quad Core setup at home was hardly adequate to do real concurrency research.

I owe him several comments and I hope my notes are adequate to remember all of them.  I brought some more questions I didn’t ask yesterday.

Q: Stephen has stated that the goal is for developers to extend System.Threading.Tasks.Task in order to do concurrency at a lower level than Parallel.ForEach<>, yet Task has an internal constructor keeping you from extending it.

A: The PFX team realized this during their internal code reviews and this will be fixed in a future CTP.

Q: In the ray-tracing example, you can see one of the pitfalls of static partitioning of workload.  Compare the top of the image vs. the bottom.  In ray-tracing the objects and reflective surfaces make for much more intensive computations, and so the top of the screen draws very quickly while the bottom of the screen fills in rather slowly.  Is thought going into hinting semantics that would allow users of PFX to indicate that certain units of execution have a greater “weight” relative to others?  Would you handle this by changing priority of some Task threads so Windows gives it extra cycles, or changing the static partitioning algorithms?

A: With the current Task API, you can set the priority of a Task, which ultimately means thread priority.  There is design discussion going on related to how we might be able to give hints to the TPL.

I just got out of a “deep dive” on the Silverlight 2 rendering pipelines.  There were some good debugging hints I didn’t know about, but other than that the dive was not as technical as I’d hoped.  Right now I’m in a social networking session led by Rob Howard, and then on to more Silverlight.

Wednesday, June 04, 2008 2:20:01 PM (Central Standard Time, UTC-06:00)
 Tuesday, June 03, 2008

My flight got in pretty late last night and after fixing a shampoo leak I went down for a few drinks with the Vanderboom.  Despite going to bed at 1:30am and getting up at 6am, I woke up more rested than the average night of Ethan waking up constantly.

The first item of business this morning was the Keynote with Bill Gates, in his last public speaking appearance as a full-time Microsoft employee.  The Keynote was an extended version of the “Bill’s last day” video with several high level presentations by other MSFT folks and a Q&A with Bill at the end.  The highlight of the show was the “Ballmer Bot”, a robot that rolled out onto stage screaming “Developers, developers, developers…”

My first session this morning was Steve Teixeira on the Parallel Computing platform.  He showed several brief PFX demonstrations on Stephen Toub’s quad-core laptop.  Stephen was present in the audience.  I hung around for quite a long time after this session was over talking to Stephen.  I asked if anyone at Microsoft was thinking about tipping over the sacred cow of thread affinity, specifically as it relates to the Holy GUI Thread.  You know, all those WinForms and WPF objects that you cannot touch from any other thread.  This seems like a serious bottleneck in the age of Many-Core computing.  The example I gave was potentially performing the calculations for animations on other threads.  Apparently, WPF almost did not go the single-threaded apartment route, but there was a laundry list of issues with the OS/Shell having some baggage that ultimately drove that team back to the old ways.  A lot of people at Microsoft are thinking about this though, so that’s a good thing.  We also talked briefly about managing dependencies between parallel tasks; the PFX team definitely intends for us to make our units of work inherit from Task.  Stephen thinks we should be able to use the new “ContinueWith” semantics in the June CTP to handle executing a task-tree (http://www.damonpayne.com/2008/04/03/ManagingConcurrencyWithTrees0.aspx).  I will investigate this, though perhaps not until the plane ride home.
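
As a rough idea of what continuation-based dependencies look like, here is a small sketch.  It is written against the Task API that later shipped in .NET 4 (Task.Factory.StartNew, TaskFactory.ContinueWhenAll, Task.ContinueWith), which is an assumption on my part and may differ from the June 2008 CTP; the class name and the numbers are purely illustrative.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // A tiny two-level "task tree": two independent leaves feed one parent step.
    public static int RunTree()
    {
        Task<int> left = Task.Factory.StartNew(() => 20);
        Task<int> right = Task.Factory.StartNew(() => 22);

        // The parent step is scheduled only after both leaves have finished.
        Task<int> parent = Task.Factory.ContinueWhenAll(
            new[] { left, right },
            leaves => leaves[0].Result + leaves[1].Result);

        // A further dependent step chained onto the parent with ContinueWith.
        Task<int> doubled = parent.ContinueWith(p => p.Result * 2);

        return doubled.Result; // blocks until the whole chain completes
    }

    public static void Main()
    {
        Console.WriteLine(RunTree());
    }
}
```

The leaves may run on different cores, while the dependent steps are never started early, so no explicit locking or signaling is needed to express the ordering.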

My next Silverlight session was full, so I’m going to look for another session to round out the day.  The Vanderboom and I are having dinner with MSFT Central Region folks at a nice steak house later.  You’ve gotta love these events.

Tuesday, June 03, 2008 1:50:08 PM (Central Standard Time, UTC-06:00)
 Monday, June 02, 2008
Monday, June 02, 2008 9:16:40 AM (Central Standard Time, UTC-06:00)
 Thursday, May 29, 2008

The PFX team has announced their TechEd sessions.  I'm there.

http://blogs.msdn.com/pfxteam/archive/2008/05/28/8557291.aspx

Thursday, May 29, 2008 9:17:45 AM (Central Standard Time, UTC-06:00)
 Tuesday, May 13, 2008

Hanselminutes show #112 brings together people from xUnit.net, NUnit, and MbUnit to discuss unit testing frameworks.  The whole show is worth listening to, but they especially mentioned running tests in parallel, which of course I've done some work on:

http://www.damonpayne.com/2008/05/09/ConcurrentUnitTestingWithXUnitNet0.aspx

http://www.damonpayne.com/2008/05/09/ConcurrentUnitTestingWithXUnitNet1.aspx

The other thing they mention is the other potential intersection of Unit Testing and Concurrency: testing for thread deadlocks, etc.  I have been working for a few weeks on an article and some semantics (no working code this time due to the scope) that deal with exactly this problem space.  I should have it published this week.

Tuesday, May 13, 2008 12:43:41 PM (Central Standard Time, UTC-06:00)
 Monday, May 12, 2008

As I talk to other developers at Launch Events, user groups, and online, I am met with confusion when I talk about my concurrency experiments.  I get the feeling that this is looked at as some sort of Ivory Tower academic exercise, or even worse some ridiculous nonsense concocted to try to look intelligent.  Concurrency is important, and everyone should be thinking about it.

At the risk of sounding like a broken record, I’ll deliver my spiel again in a different way.

The silicon industry has presented us with a terrible Bait ‘n Switch.  For more than 12 years (in my case anyway) I had the option to buy more megahertz and gigahertz at least once per year, oftentimes more.  Office is slow?  No worries, it’ll perform great as computers are upgraded.  Bioshock brings your system to its knees?  Just get a new processor and a new GPU and you’ll be rockin’ 60fps with Big Daddy.  Due to some lame excuse commonly called “The Laws of Physics” we’re not getting faster and faster CPUs anymore.  Moore’s law is not dead yet: we’re getting more transistors all right.  The problem is that these transistors are on two or more cores instead of one core, and a quad core 2.5GHz processor is not remotely the same animal as a 10GHz processor.  I do half expect the silicon industry to pull a Cheap Stereo marketing trick any day now.  Go to Best Buy and look through the home theater in a box section.  You’ll see claims of “525 watt system!!” and so forth.  These are not 525 watts per channel systems (that would be a lot) but 525 watts total, for 7 channels, which is not the same level of performance.  When I next buy a CPU, I expect some branding telling me “This is a 20 GIGAHERTZ POWERHOUSE”.  Lacking a 10GHz processor (longing sigh), however, the multi-core CPU is the consolation prize offered to us by the silicon industry.

Despite being stuck in the 2-3GHz limbo, software is still increasing in complexity.  People want responsive applications.  It is my opinion that we are currently preparing to exit a time of “Free Concurrency Improvement”.  I wish I had a more compelling name for this sweet plateau, so let me explain.  Modern operating systems happen to be very good at task scheduling.  If I run Visual Studio 2008, SQL Server 2005 Studio, Outlook, MSN Messenger, and Zune player at the same time, my machine may not be very responsive.  Moving from a single-core 2.4GHz processor to a dual-core 2.4GHz processor will make my machine more responsive.  Running JUST the Zune player, for example, is probably not any faster than it used to be.  Our Free Improvement here is that the Unit of Concurrency is the Windows Process, and Windows is good at putting Process A on one core and Process B on the other core where they can both get more horsepower.

We are approaching a time, however, where 2.4GHz will not run a “mostly single threaded” application in an acceptable fashion even if the application gets a 2.4GHz core all to itself.  We need to stop thinking about concurrency as something that will keep Winamp from skipping when I open Visual Studio and start thinking about the next level of concurrency: running my ONE application as fast as possible by doing chunks of work concurrently on many cores.  This requires developers to re-think application design.  I did no threading in my computer science degree.  The next generation of Computer Science graduates needs to be comfortable with concurrency before leaving college.

10 gigahertz processors sure would be nice though.  10 gigahertz Quad Core.  Core 10 Quad, there, I branded it for you.  Intel, AMD, anyone listening?

Monday, May 12, 2008 7:56:27 AM (Central Standard Time, UTC-06:00)
 Thursday, May 08, 2008

 

With a basic understanding of what XUnit is doing, we need to determine where we’re going to try to split things up across multiple cores.  Take a look at the sequence diagram from the last article; we have a choice to make.  It’s generally better to parallelize outer loops than inner loops, because each unit of concurrent work is then larger relative to the overhead of handing it to a thread.  The design decision this helps us make is that the unit of concurrency is the Class.  This means that if I write 100 tests inside a single class, they will run sequentially, just as though we had no fancy concurrency code.  In the rest of this article we’ll look at the modifications needed to use Payneallel.ForEach with the unit tests.
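
The outer-versus-inner choice can be sketched in a few lines.  This is only an illustration, not the xUnit code: it uses Parallel.ForEach from the TPL as it later shipped in .NET 4 rather than Payneallel, and the arrays of integers are hypothetical stand-ins for the tests in each class.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class OuterLoopSketch
{
    // Each inner array stands in for the tests of one class (hypothetical data).
    // Returns how many "tests" were executed in total.
    public static int RunAll(IEnumerable<int[]> classes)
    {
        int completed = 0;

        // Outer loop: one unit of concurrent work per class.
        Parallel.ForEach(classes, testsInClass =>
        {
            // Inner loop: tests within a class still run one after another.
            foreach (int test in testsInClass)
            {
                Interlocked.Increment(ref completed);
            }
        });

        return completed;
    }

    public static void Main()
    {
        var classes = new List<int[]> { new[] { 1, 2, 3 }, new[] { 4 }, new[] { 5, 6 } };
        Console.WriteLine(RunAll(classes));
    }
}
```

With this shape, a class containing 100 tests occupies one worker for its whole duration, which is exactly the limitation described above.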

XUnit GUI

We start our modifications in the XUnit GUI, which is refreshingly straightforward.  The first thing to do is make it easy to choose concurrent execution.  The XUnit GUI now looks like this, following the sequential execution of the control group:

The “Run concurrently” checkbox is my addition.  When you click the Run button:

        void OnClick_Run(object sender, EventArgs e)
        {
            _totalCount = 0;
            _testCount = GetTestCount();
            ResetUI(_testCount);
            buttonGo.Enabled = false;
            if (_concurrentChk.Checked)
            {
                // Run the tests on a separate thread so the GUI thread stays free to update
                ThreadStart ts = new ThreadStart(RunAsync);
                Thread t = new Thread(ts);
                t.Name = "xUnitAsyncThread";
                t.Start();
                textResults.AppendText("Running Async...\r\n");
            }
            else
            {
                // Original sequential path, unchanged
                wrapper.RunAssembly(TestCallback);
            }
        }

Our xUnit ExecutorWrapper is “wrapper”.  In order to keep from screwing around with the GUI thread, we run XUnit on a new thread, which will in turn create many other threads using Payneallel.  By default, Payneallel will block the calling thread until all operations are done; however, we cannot both block the GUI thread AND allow it to update itself as test results become available.  The RunAsync method is simple:

        void RunAsync()
        {
            wrapper.BeginRunAssembly(TestCallback);
        }

 

ExecutorWrapper

My next modification is to the ExecutorWrapper class.  I tried to make my changes to XUnit additive only, adding functionality by adding methods rather than modifying things that already work for sequential execution.    

        public void BeginRunAssembly(Action<XmlNode> callback)
        {
            XmlNodeCallbackWrapper wrapper = new XmlNodeCallbackWrapper(callback);
            CreateObject("XUnit.Sdk.Executor+RunAssemblyParallel", executor, wrapper);
        }

I see no reason not to keep running the test in a separate AppDomain.  We have added another inner class to Executor, the RunAssemblyParallel class. 

Executor

Through experimentation I found that this would be the appropriate place to introduce parallel execution, at the Class level as I said previously.  This class is almost a copy of the RunAssembly class included with XUnit:

        public class RunAssemblyParallel : MarshalByRefObject
        {
            /// <summary/>
            public RunAssemblyParallel(Executor executor, object _handler)
            {
                DoParallel(executor, _handler);
            }

            protected void DoParallel(Executor executor, object _handler)
            {
                ICallbackEventHandler handler = _handler as ICallbackEventHandler;
                AssemblyResult results = new AssemblyResult(new Uri(executor.assembly.CodeBase).LocalPath);

                // The work done for one test class; this is the unit of concurrency
                Action<Type> doOne = delegate(Type type)
                {
                    ITestClassCommand testClassCommand = TestClassCommandFactory.Make(type);

                    if (testClassCommand != null)
                    {
                        ClassResult classResult = TestClassCommandRunner.Execute(testClassCommand,
                                                                                 null,
                                                                                 result => OnTestResult(result, handler));
                        // Add() is called from several worker threads at once; AssemblyResult
                        // is assumed not to be thread-safe, so serialize access to it.
                        lock (results)
                        {
                            results.Add(classResult);
                        }
                    }
                };

                Type[] exportedTypes = executor.assembly.GetExportedTypes();
                int count = exportedTypes.Length;

                //Parallel Test execution
                Stopwatch sw = new Stopwatch();
                sw.Start();
                Payneallel.ForEach<Type>(exportedTypes, doOne, true);
                sw.Stop();
                Console.WriteLine("Time elapsed: " + sw.Elapsed);
                // Report wall-clock time rather than the sum of the child times
                results.ExecutionTime = sw.Elapsed.TotalSeconds;
                OnTestResult(results, handler);
            }
        }
    } // closes the enclosing Executor class

Like the TPL, Payneallel likes an Action<T> to execute.  In the vanilla XUnit version of this code, there is no Stopwatch and there is a regular foreach() block instead of Payneallel.ForEach.  The stopwatch is important because I can no longer trust XUnit to time the execution!  For a long time I ran and re-ran my tests and the Parallel code was always slower than the sequential version.  Then I had a “pwop” moment and found the following line of code:

                ExecutionTime += child.ExecutionTime;

Whoops!  We can’t just add the execution time of the children (from TimedCommand) when some of the commands are running at the same time. 
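
A toy calculation shows why the summed figure misleads.  The durations below are hypothetical, and the "ideal" figure assumes every child gets its own core and starts at the same moment, which real runs will not quite achieve.

```csharp
using System;
using System.Linq;

public static class TimingSketch
{
    // What a sequential runner reports: the sum of every child's duration.
    public static double SummedSeconds(double[] childSeconds)
    {
        return childSeconds.Sum();
    }

    // Idealized wall-clock time if every child ran at once on its own core:
    // the run is only as long as the slowest child.
    public static double IdealParallelSeconds(double[] childSeconds)
    {
        return childSeconds.Max();
    }

    public static void Main()
    {
        double[] children = { 3.0, 3.0, 3.0, 3.0 }; // hypothetical durations
        Console.WriteLine(SummedSeconds(children));
        Console.WriteLine(IdealParallelSeconds(children));
    }
}
```

Four three-second children add up to twelve seconds of work, yet in the ideal case only three seconds pass on the wall clock, so the summed number says nothing about how long I actually waited.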

Results

With the Timing issue solved, I was successfully executing unit tests concurrently and saving a lot of time doing so.  Here is the same set of unit tests run using my new Concurrent xUnit hack.

I’ll take 27 seconds over 51 seconds any day (roughly a 1.9x speedup).  I have not done any optimization work yet, nor constructed a test case where the tests are nearly 4x faster on a four-processor machine, but I expect to be able to get there.  As I mentioned before, the Class is the unit of concurrency in this experiment, so the amount of time saved will depend heavily on how the test cases are structured.  A better method would be to first get a list of all of the individual methods marked with [Fact] and use the parallel semantics on that list instead.
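
Collecting the individual [Fact] methods could look something like the following sketch.  To keep it self-contained it declares its own stand-in FactAttribute and two sample classes; those names, and the FactFinder helper, are hypothetical and are not part of xUnit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in for xUnit's attribute so the sketch compiles on its own.
[AttributeUsage(AttributeTargets.Method)]
public sealed class FactAttribute : Attribute { }

public class SampleTestsA
{
    [Fact] public void First() { }
    [Fact] public void Second() { }
    public void NotATest() { }
}

public class SampleTestsB
{
    [Fact] public void Third() { }
}

public static class FactFinder
{
    // Flattens every [Fact] method across the given types into one list,
    // which could then be handed to a parallel ForEach.
    public static List<MethodInfo> FindFacts(IEnumerable<Type> types)
    {
        return types
            .SelectMany(t => t.GetMethods(BindingFlags.Public | BindingFlags.Instance))
            .Where(m => m.IsDefined(typeof(FactAttribute), false))
            .ToList();
    }

    public static void Main()
    {
        var facts = FindFacts(new[] { typeof(SampleTestsA), typeof(SampleTestsB) });
        Console.WriteLine(facts.Count);
    }
}
```

With a flat list like this, a class holding many slow tests no longer ties up a single worker, at the cost of creating a test class instance per method rather than per class.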

I have a side project, code that I inherited, that is woefully under-tested.  I write unit tests for the code I touch as I refactor it.  The unit tests will involve a lot of database access, calculations, and Presenter mocking.  I can’t disclose what this codebase is just yet, but I am in the process of testing TestDriven → xUnit → NCover.  I depend heavily on NCover and I really can’t imagine manually trying to determine what I’ve got test coverage on anymore.  If this test is successful, I will eventually be able to report on how this concurrent unit testing works on 100,000 lines of code 99% covered by thousands of unit tests.  This should be a sufficient test case to prove this idea is sound.

As the years go by and we still don’t have 5GHz machines, designing frameworks with concurrency in mind will become increasingly important.

Thursday, May 08, 2008 7:13:40 PM (Central Standard Time, UTC-06:00)

Concurrent Unit Testing with xUnit – The answers come in dreams

Coil: the answers come in dreams

Well, not precisely in dreams, but in blog posts.  No sooner had I written (http://www.damonpayne.com/2008/04/17/ConcurrentUnitTesting.aspx) about how the over-design of NUnit was going to make it hard for me to implement concurrent unit testing than I saw Scott Hanselman feature xUnit on his Weekly Source Code (http://www.hanselman.com/blog/TheWeeklySourceCode24ExtensibilityEditionPlugInsProvidersAttributesAddInsAndModulesInNET.aspx).  The words that caught my eye: the source is extremely tidy.  Scott probably meant that the organization of the solution was tidy, but I grabbed it from CodePlex and started investigating.  The design is tidy too.  Could this be a better platform on which to complete my research?

I am generally liking xUnit.net so far, and I strongly expect I’ll be ditching NUnit in favor of this across the board, assuming the integration with TestDriven and NCover works as I’d expect.  It’s nice to just say “using xUnit” instead of “using NUnit.Framework”, and I like that I don’t have to place a [TestFixture] attribute on the class.  But these are small concerns saving a few keystrokes.  What we’re really concerned about is the original goal I wrote about:

On a sizable project, with a meaningful suite of unit tests, a developer practicing proper due diligence during the development lifecycle will spend a tremendous amount of time running Unit Tests.  This is an unfortunate disincentive for the developer to run said tests.

In general, the “Command” structure of a unit test foreshadows parallel-ability.  Properly designed unit tests should be easy to run in parallel: a unit test should Stand Alone, meaning each test case does not depend on state set up elsewhere.  xUnit does two more things that help us out here.  First, by removing the notions of “TestFixtureSetup/Teardown” they’ve made it much harder to shoot yourself in the foot at the Class level by relying on state, though for my example this is merely food for future thought, as we’ll see later.  Second, it would appear they use a Randomizer to make sure the [Fact] methods in a Class do not run in any dependable order.

I set up a suite of 17 unit tests, implemented in 7 classes.  The tests do incredibly useful things like divide int.MaxValue by things and SpinWait().  Using xUnit is simple:

using System;
using XUnit;

namespace DamonPayne.xUnit.Tests
{
    public class FooTester : TestBase
    {
        [Fact]
        public void Fact1()
        {
            Console.WriteLine("FooTester::Fact1");
            // Burn CPU for a while so the test takes measurable time
            System.Threading.Thread.SpinWait(int.MaxValue / 2);
            Assert.False(false);
        }
    }
}

The last thing to do before jumping into code is to establish a baseline.   My 17 tests take 51.78 seconds to run in the xUnit GUI in all their spin-waiting glory.

Payneallel Revisited

While doing the research for this article, I found a few minor issues with my Payneallel.ForEach code I used with the Tree Concurrency articles (http://www.damonpayne.com/2008/04/03/ManagingConcurrencyWithTrees0.aspx ).  The first issue dealt with the code I used to wait for all concurrent iterations to be done before returning to the calling thread.  If the number of tasks was less than the number of processors on the machine, the “never touched” worker threads would never Finish().  The second dealt with an interesting thread-timing issue related to when a Worker declared itself “Busy”.  Concurrency is fun!  At any rate, here is the revised Payneallel code I used within xUnit:

using System;
using System.Collections.Generic;
using System.Threading;

namespace XUnit.Sdk
{
    /// <summary>
    /// Contains static methods and internal helper classes for executing concurrent operations
    /// </summary>
    public static class Payneallel
    {
        /// <summary>
        /// Concurrently perform the body action on each item in source,
        /// blocking the calling thread until all items are done
        /// </summary>
        /// <typeparam name="TSource"></typeparam>
        /// <param name="source"></param>
        /// <param name="body"></param>
        public static void ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body)
        {
            ForEach<TSource>(source, body, true);
        }

        /// <summary>
        /// Concurrently perform the body action on each item in source
        /// </summary>
        /// <typeparam name="TSource"></typeparam>
        /// <param name="source"></param>
        /// <param name="body"></param>
        /// <param name="waitAll">true to block the calling thread until all iterations are done</param>
        public static void ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body, bool waitAll)
        {
            WorkerPool<TSource> pool = new WorkerPool<TSource>();

            foreach (TSource src in source)
            {
                Worker<TSource> worker = pool.GetWorker();
                //Console.WriteLine("Using worker " + worker.Name);