Parallel Computing


Boring history prologue

I started playing with ‘computers’ back in the days when they came with 3K of memory (Vic 20 anyone?). And thank goodness I was too young to have experienced the punch card era… They quickly scaled up so that by the time I was at uni, 16MB of RAM was becoming standard. Fast forward to today and we can buy consumer notebooks with 16GB of RAM. Never in our wildest uni dreams did we think we’d have Gigabytes of memory to play with.

Today’s equivalent

Perhaps the equivalent of my uni experience for today’s student is the number of cores in a machine. In the future they’ll look back nostalgically at the days of having only 2 cores in their machines… ahhh those were the days – how did we ever survive with so few?

The reality of course is that we are already well down the path of multi-core CPUs. 4 cores is normal, 8, 16 and 32 are almost in the consumer space, and it won’t be long before talking about having thousands of cores is normal. Yes, thousands of cores.

That’s progress right? So, why is it of interest?

Why is this interesting?

The reason parallelism is interesting is because it solves the problem of ‘the next biggest bottleneck’. Performance gains are simply a process of finding the biggest bottleneck, reducing it, and moving on to the next biggest bottleneck. It’s a continuous cycle.

As CPUs have increased in performance, computational and graphics bottlenecks have been drastically reduced. But CPU speed has started to reach a limit. Tasks are queuing up waiting for CPU cycles. How do we service those tasks if the CPU is running at its maximum speed? Answer: we start adding the ability to provide more cycles. How do we do that? Simple – we add more cores. Problem solved.

Well, not quite. Because if tasks are queuing up, more cores just allows us to service more tasks in the same time. What we really want is to get through a single task quicker. This is where parallelism comes in. We want to be able to break a single task down into chunks that can be processed by multiple cores. But how on earth do we do that, with all the breaking apart, management and re-connecting once each chunk is processed?
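To make that a little more concrete, here is a minimal sketch (my own, using the Parallel.For shape that the Parallel Extensions / .NET 4.0 work exposes, with a made-up ExpensiveTransform method standing in for real work) of handing the chunking and re-connecting over to the runtime instead of doing it by hand:

    using System;
    using System.Threading.Tasks;

    class ChunkingSketch
    {
        static double ExpensiveTransform(double x)
        {
            // Stand-in for a CPU-heavy calculation on one element.
            return Math.Sqrt(x) * Math.Sin(x);
        }

        static void Main()
        {
            var input = new double[1000000];
            var results = new double[input.Length];
            for (int i = 0; i < input.Length; i++) input[i] = i;

            // Sequential: one core walks the whole array.
            for (int i = 0; i < input.Length; i++)
                results[i] = ExpensiveTransform(input[i]);

            // Parallel: the runtime breaks the index range into chunks,
            // schedules them across the available cores, and re-joins
            // them when the call returns. Each iteration only touches
            // its own slot, so the chunks are independent.
            Parallel.For(0, input.Length, i =>
            {
                results[i] = ExpensiveTransform(input[i]);
            });

            Console.WriteLine("Done: {0} elements", results.Length);
        }
    }

The point isn't the syntax so much as the division of labour: we declare that the iterations are independent, and the breaking apart and re-connecting becomes the runtime's problem rather than ours.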

And that’s why this is such an interesting problem.

Resources

Before I go any further let me recommend three excellent resources (which I’ve pulled most of the thoughts in this post from).

First up, have a read of this MSDN Magazine article on Parallelism in Visual Studio 2010. This article briefly describes the problem and highlights code samples showing how it is approached. It finishes with some details about the debugging and profiling tools for parallelism in Visual Studio 2010.

Next, have a listen to .NET Rocks episode 375 with Steve Teixeira. I listened to this entire episode twice: first to get an introduction, and second to make sense of the context they were talking about. This is one of the better episodes in the last few months in my opinion. Steve is extremely eloquent at expressing the problem space, and Microsoft's vision, in simple terms.

Last, take a look at this webcast of the Parallel Computing Platform team chatting about parallelism and how they approach it in Visual Studio. Steve features in this, and Daniel Moth demonstrates some of the new VS tooling (mentioned in the MSDN article above).

Oh, and you’ll probably also want to subscribe to the Parallel Programming blog.

Parallelism in context

Parallelism is not a new subject. It’s been around for years and is an important component in High Performance Computing (HPC).

The reason it is now a mainstream topic (ie not just lost in the dark halls of academia) is that it now affects even the most basic consumer. It has moved from the server room to the home user's desktop. You'd actually have trouble trying to find a computer with a single core these days.

Is parallelism my problem?

This is a fair question. After all, the hardware vendors have been tackling this issue for a while. And they’ve started moving up the chain, with companies like Intel actively engaging in the software side of parallelism running on their chipsets.

Surely the problem of dealing with parallelism needs to lie at the lowest level, where the framework, operating system and drivers closest to the cores need to do the hard work of determining what gets processed where.

This has traditionally been my view. As a programmer I’ve long held that I should be able to write my code focussing on the business problem, and the framework (and OS) should take care of the technical issues (like maximising performance). And in many regards that is fine.

Sure, the framework can take some of the burden, but what about when something goes wrong? How do you debug code that gets parallelised, if you don’t even understand what parallelism is?

And that’s why it has finally dawned on me that parallelism is every programmer’s problem. We need to understand the basics of parallelism, in the same way we need to understand multi-threading, web state, and unit testing. We need to be able to understand when and where it is applicable, the appropriate patterns to follow and the methodologies in our companies to best apply it.

Parallel computing is a significant mind shift for programmers. It’s not taught (much) at universities, and it certainly isn’t marketed as a ‘sexy’ side to development. The toolset to date has been almost non-existent, and it’s no surprise there are hardly any applications written with parallelism in mind.

Also, consider it from a commercial perspective. The developers and companies who ignore parallelism will quickly find themselves at a distinct competitive disadvantage to those who embrace and design their offerings with parallelism in mind.

Visual Studio 2010 and .NET 4.0

Over at the MSDN Parallel Computing Developer Center the Parallel Extensions to the .NET Framework 3.5 are available as a CTP. This was first released back in November 2007, and the latest release is the June 2008 CTP. Installing it allows you to add parallelism calls to your .NET code. Parallel LINQ (PLINQ) is one manifestation, whereby you can add .AsParallel() to your LINQ queries to parallelize them.

[As an aside, all the help sample code is in C#, so parallelism is obviously not intended to be of interest to any VB devs out there :-) ]
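To give a feel for it, here is a rough sketch of what a PLINQ query looks like (my own example with a made-up IsPrime method, written against the API shape described in the resources above rather than copied from the CTP samples):

    using System;
    using System.Linq;

    class PlinqSketch
    {
        static bool IsPrime(int n)
        {
            if (n < 2) return false;
            for (int i = 2; i * i <= n; i++)
                if (n % i == 0) return false;
            return true;
        }

        static void Main()
        {
            var numbers = Enumerable.Range(2, 1000000);

            // Ordinary LINQ to Objects: evaluated on a single core.
            int sequentialCount = numbers.Where(IsPrime).Count();

            // PLINQ: AsParallel() asks the query engine to partition the
            // source and run the Where filter across the available cores.
            int parallelCount = numbers.AsParallel().Where(IsPrime).Count();

            Console.WriteLine("{0} / {1}", sequentialCount, parallelCount);
        }
    }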

The exciting news is that all this parallelism stuff is being greatly enhanced and baked into .NET 4.0, and Visual Studio 2010 is adding significant tooling to allow it to be coded, profiled and debugged easily. The work also includes clarifying the terminology (eg understanding the difference between threads and tasks).
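By way of illustration (again my own sketch, assuming the Thread and Task types as they appear in .NET 4.0), the shift is roughly this: a thread is an operating-system worker you create and manage yourself, while a task is a description of work you hand to the runtime's scheduler, which decides which pooled thread (and ultimately which core) runs it:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class ThreadsVersusTasks
    {
        static void DoWork(string label)
        {
            Console.WriteLine("{0} ran on managed thread {1}",
                label, Thread.CurrentThread.ManagedThreadId);
        }

        static void Main()
        {
            // A thread: we explicitly create (and pay for) an OS thread
            // dedicated to this piece of work.
            var thread = new Thread(() => DoWork("Thread"));
            thread.Start();
            thread.Join();

            // A task: we describe the work; the scheduler maps it onto
            // a thread-pool thread and, via the pool, onto the cores.
            Task task = Task.Factory.StartNew(() => DoWork("Task"));
            task.Wait();
        }
    }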

Much of the goodness will be shown at PDC this week.

Parallelism at PDC

I’ll be interested to see how the topic of parallelism gets reported from PDC. There are at least 9 sessions on the agenda, so Microsoft is certainly giving it some attention. But what will the punters think?


Applications

Obviously parallelism applies to computational tasks (ie it doesn’t solve the problems of disk I/O and network latency for example).

You might think the obvious place for parallelism is in tackling time consuming tasks. And you may be right. Rendering video would be a good example. Typically this is a long task, and having a thousand cores render out your animation (or special effects or whatever) would be a big boost.

But this is only the start. Big gains are also available in small sizes. Consider intensive in-memory data manipulation (eg LINQ to Objects). It may only be a 2 second task, but if you could break that into parts and have 1000 cores perform the analysis, you may find it completed in a millisecond or two. Not much difference to the user experience on its own, but when combined with numerous other activities it very quickly starts to add up. Imagine if every single background task on your machine was simply broken down and dealt with by a thousand cores. Imagine a day when the only waiting time is the human interaction.

The next bottleneck

Let’s jump ahead a few years and assume parallel computing is understood and practiced by all. What’s the next bottleneck? Assuming we can process data in parallel, the bottleneck is likely to be in physically moving the data around. The inherent latency in moving a terabyte from my machine in Sydney to yours in New York is going to be an interesting one to solve.

5 comments

  • Nice overview Craig, though I remain skeptical. Currently in Windows Vista I have 65 processes running, with 863 threads, and all I’m running is Firefox, and the stats in XP aren’t much different.

    The 1000 cores of the future could be quite utilized just running Outlook and all the Windows guff :)

    (Also consider each of those 863 threads takes 1mb of memory at least, and I suddenly understand where memory in Vista goes)


By Craig Bailey
