Thursday, June 19, 2008

The False Hope Of Apple's Snow Leopard

The problem began several years ago. The processor community realized that despite the fact that they could continue to make chips with smaller transistors, they could no longer make chips with substantially faster clock speeds. There were two separate but related problems.

First, chips with higher clock speeds were beginning to run too hot. Second, this heat was a reflection of energy consumption, which meant that new processors would cost too much to operate. So the industry shifted direction and decided to put more processor cores on the same chip without increasing the clock speed. Thus began an urgent race to figure out how to leverage multiple processors in a mainstream computing environment.

Last week, Apple announced that its next operating system, Snow Leopard, is going to revolutionize computing by taking much better advantage of these multi-core processors. And perhaps in relative terms, as compared to Leopard, XP or Vista, this is true. Apple’s multi-core handling technology is called Grand Central, and I am sure it will bring important speed improvements. But from everything I can tell, there is nothing here that is going to bring back the kind of performance-doubling speed increases across all applications that we used to see.

The problem is that most algorithms and program logic cannot be made to run better across many processors. This is not a swipe at Apple, because the problem is indeed industry wide. It’s just a recognition of basic logic principles, and an admonition to not get your hopes up as it relates to the real long-term impact of the industry’s efforts in this area.

The problem of multi-core computing is really very simple. As most of us have experienced, *not* every problem can be solved better or faster with more people. Some problems can be solved faster by adding a few people, but most cannot. In truth, most problems can best, or only, be solved by one person at a time. And so it is with computing. The vast majority of problems can only be solved by one logic thread at a time. The reason is obvious. For most process-oriented work, step B is based on the results of step A, step C is based on the results of step B, and so on.
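The step A, step B, step C dependency can be sketched in a few lines of Python. This is only an illustration: `step`, `dependent_chain`, and `independent_batch` are hypothetical names, and the arithmetic inside `step` is an arbitrary stand-in for real work. Note also that threads in standard CPython do not run CPU-bound code in parallel, so the thread pool here shows the *shape* of independent work rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def step(x: int) -> int:
    """Hypothetical unit of work; the arithmetic is an arbitrary placeholder."""
    return x * 2 + 1


def dependent_chain(seed: int, n: int) -> int:
    """Run n steps where each step needs the previous step's result.

    Step B cannot start until step A has finished, so extra cores
    have nothing to do here: the loop is inherently sequential.
    """
    value = seed
    for _ in range(n):
        value = step(value)
    return value


def independent_batch(inputs: list[int], workers: int = 4) -> list[int]:
    """Apply the same step to inputs that do not depend on each other.

    No result feeds into another, so the items can be handed out to
    several workers at once and collected in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(step, inputs))


if __name__ == "__main__":
    print(dependent_chain(1, 3))
    print(independent_batch([1, 2, 3]))
```

Only the second function gives a scheduler anything to spread across cores. The first is the "one person at a time" kind of problem.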

Of course there are definitely problems, and important problems, that *can* be solved by multiple processors. In fact there are problems that can leverage every single processor you can throw at them. Graphics is one such problem. Similarly, most every conception of how we model human or near-human intelligence can infinitely leverage parallel computing. This includes old-school AI techniques like neural networks, and new conceptions of how to model the brain’s neocortex like the promising work at Numenta, a company founded by creators of the original Palm Pilot. Additionally, parallel computing will solve many other far more mundane problems. So I am not saying we will not continue to see significant benefits from shrinking transistors.

But the problem is with that core thread, the main “thinker” inside the computer. You might think of it as the ringmaster. That guy is just not getting any faster. Though he may learn to leverage a couple of processors to some degree, he will top out very quickly. This core thread is at the heart of PC performance today, and its days of rapid speed gains are finished. For now, all we will really see are impressive, domain-specific performance increases. Some of these will indeed be important. But the era of wholesale speed improvements tied to new processor generations is gone, probably forever.
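One common way to put numbers on "it will top out very quickly" is Amdahl's law, which the post itself does not name. The sketch below assumes a program can be split cleanly into a serial fraction and a perfectly parallel fraction, which real programs rarely do. The function name `amdahl_speedup` and the 50% figure in the demo are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be spread across cores
                       (0.0 = fully serial, 1.0 = fully parallel).
    cores:             number of cores available.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part shrinks as cores are added.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed, illustrative workload: half of the work can run in parallel.
    for cores in (1, 2, 4, 8, 64, 128):
        print(f"{cores:>3} cores: {amdahl_speedup(0.5, cores):.2f}x")
```

Under these assumptions, a program whose work is half serial can never run twice as fast, however many cores it is given. The serial half, the ringmaster, sets the ceiling.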


  1. This sounds likely true, at least for a period of time, but arguably it is platform-centric in its app focus. Near-term innovation in packaging, energy consumption, computation, and graphics will center on highly mobile platforms like cell phones. Evaluation criteria are somewhat different.

    Another way of putting the question is: "How do we define performance?" What do we want to enable in terms of software services and integration through our hardware choices?

    In such guise, the impact of the trend you note is important for certain existing (and critical) markets, but might still be subject to disruption from new vectors.

  2. Certainly there will be continued benefit to multiple cores. But I do think for many applications we will rapidly reach diminishing returns. In truth there is no difference between real-life human organizational theory and organizing processes inside a CPU. It's the same reason companies can't just get things done faster by hiring more people.

  3. You're looking at it wrong. While Grand Central and "block" computing add highly variable increases on an application-by-application basis, consider that even while running a single user-facing application there are plenty of processes running. And this isn't the typical use case.

    Across multiple applications running, the cores are quite under-utilized. Grand Central isn't *just* about speeding up a given application. Think of it as load-balancing on the threads level, as well as getting prioritization for free.

    The boost in parallelizable tasks is obvious, and those kinds of applications are appearing more and more as things become more and more media-centric.

    For those that aren't so easily optimized, serial tasks are STILL often able to have sections of code running independently of others. Grand Central also offers the ability to spawn a bunch of tasks which *are* serial, and which are automatically coalesced in the end.

    Additionally, from a *developer* perspective, this is a huge win, which means more efficient execution and less system slowdown (manually managing pthread lifecycles is difficult work).

    There's no false hope about Grand Central: it does what it's supposed to do: make more efficient use of system resources. And that's done by keeping cores from waiting around doing nothing.

    Way back when, object-oriented programming was difficult for everyone to "get". Now it's easy.

    Give it all time. We developers will absorb the new paradigm, find new ways to do old things, and you'll end up with a more stable and more responsive environment.

    It's a huge deal. Trust me.

  4. God,

    "Grand Central also offers the ability to spawn a bunch of tasks which *are* serial, and which are automatically coalesced in the end."

    Unless Apple has figured out some way to travel across the space/time continuum, this is structurally inaccurate. Still, I am not saying there is no value to Grand Central. I am simply saying that it will not allow us to map more cores directly to more performance as clock speed did.

  5. This article is correct, many problems will not benefit from adding more cores, but conversely many will.

    I am surprised no one has mentioned "The Mythical Man-Month" by Frederick Brooks. It would seem equally applicable to this.


  6. Multicore is definitely over-promised. I agree with what you wrote and recommend what Donald Knuth had to say on this topic:

    "I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks!"

  7. So far, I haven't found anything detailed about Grand Central, only the generic description of the technology in Apple's press releases, which to me sounds just like a system API for a Master/Worker pattern. Being at system level, it can probably do something better than an application-level implementation, but I can't find anything exceptional in that. Does somebody have more information (that is not under NDA)?

  8. "Unless Apple has figured out some way to travel across the space/time continuum, this is structurally inaccurate. Still, I am not saying there is no value to Grand Central. I am simply saying that it will not allow us to map more cores directly to more performance as clock speed did."

    Clock speed is important, but so is chip architecture: Instruction pre-fetch and decoding, look-ahead buffers and caches, speculative execution, where both sides of a branch are executed before the test; all these contributed greatly to processor performance, beyond raw clock speed. The question is whether innovations of this type can be promoted to a cross-processor architecture. Who knows? There may be innovative memes floating around out there waiting for some designer's brilliant insights.

  9. Anonymous,

    Indeed "who knows" is always a valid answer, though not something to depend on. This is particularly true when one of the greatest computer scientists and algorithm experts, Donald Knuth (see comment above), doesn't think parallelism is much good for most algorithmic problems.

  10. Well, Knuth wrote the books; I've owned them for thirty years and now they're going cheaply on eBay.

    Processors don't know a damned thing about algorithms. We started out by writing branch prediction into the compilers, later, after lengthy heuristics, put it on-chip, and finally started throwing newly found processing power at the computation itself, ignoring the edge conditions until necessary. The gains through parallelization more than made up for the latency of flushing and refilling the pipeline and invalidating the cache.

    Moreover, our view of "power" is much more task-oriented today; threading, load-balancing and asynchronicity all make for a "faster" computing experience.

    Grand Central might not be the philosopher's stone, but it is a responsible and ethically correct approach to today's windfall in processor power.

    Not everything sucks!

  11. Anonymous,

    I am not sure how one could characterize any piece of software as being ethically correct or not. It doesn't seem to me ethics has anything to do with this.

    But again, you are off the point, which has nothing to do with whether it is good or not to do Grand Central. Of course it is. The only question is what the resulting value will be. My point is that when we go, for example, to chips that have 64 or 128 cores, most computer programs will *not* run anywhere near 64 or 128 times faster, and so we should not overestimate what Grand Central will facilitate.

  12. Welcome back, Hank. :)

    I totally agree with you. I had this debate with someone a while back. Dual core/quad core processors are simply not worth it at the moment unless you only use the computer for the specific domains in which they actually benefit you (graphics, games, etc). I'd say that until Microsoft and Apple (and all the other big software companies) learn to leverage these multi-core processors effectively, they are just a waste of money (again, unless you specifically need them). "Dual core" is just another buzzword in my opinion.

  13. Indeed dual and quad-core are not a waste of money IMHO, since with this small number of cores the operating system scales well at the process level. I mean, it's highly likely that you're running two different applications doing something at the same time. Of course this starts to be false from 8 cores up.

    Going back to the general question, I feel more optimistic: my point is that most applications needing power are usually more easily parallelizable; most applications that aren't easily parallelizable often don't need to be so fast. I mean, it's not a law, of course, it's just an observation.

    Consider a word processor, for instance, which I believe is a typical application that is not easy to parallelize. I don't believe that the basic tasks of a word processor are really CPU intensive: after all, you could write a document hundreds of pages long even in the '90s with a 680x0 microprocessor. You don't type any faster now than you did back then. What has changed is the complexity of the underlying windowing system (font accuracy, pixel count, effects, transparencies), but all of those things are managed by the operating system and parallelize well.

    Conversely, one could benefit from having more speed in a spreadsheet, or when compressing a video or manipulating photos, and those tasks are all relatively easy to parallelize. Am I too optimistic?

  14. How can you possibly say that multiple cores won't rock with Grand Central???? I just about have an aneurysm every time I have to wait for Illustrator to finish a process on my quad machine, while I look at only one core full!!!

    I don't care how you slice it, when you can do more and more things faster and all at the same time, thanks to Grand Central, you can count me in!

    As far as I'm concerned, that is doubling my computing power, hell, that's more like quadrupling processing!

  15. Good discussion but do you really need GoB in here? He contaminates any discussion by spilling Kool-Aid over everybody.

  16. Having written a lot of multi-threaded applications, it's clear to me that there is no free lunch when it comes to taking advantage of parallelization opportunities such as those presented by multi-core chips. You need to use your brain as a programmer to think about what can run concurrently, and when you are done, your head hurts (at least mine did). Problems as simple as the need to serialize access to shared resources (certainly not an uncommon problem) can bring multicore to its knees. And I can't imagine a compiler smart enough to figure out how to parallelize an application without a number of 'hints' from the programmer (who, again, is forced to think).

    In my opinion, Snow Leopard will take a different direction. There will be improvements to the core OS for sure. But, I predict that Apple will use multiple cores to improve areas such as UI and device support. For example, if they put a core against the UI, imagine what new kinds of animations could be done. If they put a core against DVD playback, there might finally be a reasonably priced machine that can 'boot' a DVD quickly, and not get choppy when something else is happening on the machine.

    And, true to form, the scenarios enabled would have the 'WOW' factor with customers -- and not just be improvements that no-one can see, or deliver (say) a modest improvement in speed.

    Some of the smartest CS minds in the world have worked on this problem for years and I just don't see a magical breakthrough just because it's Apple. (Do you really think Microsoft's Windows kernel doesn't already know how to do a lot (most?) of what Apple is likely to deliver?)

  17. No comments? :-)

  18. Sorry Mr Multicore.

    Since I totally agree with everything you said, and you seemed to mirror/agree with much of what I said, I didn't think to comment. You are absolutely right, and you have, I think, added substantively to the record with your comment.

    Thank you very much.

  19. Thanks Hank... Glad I make sense to someone periodically. Multi-core/parallelism has been a big personal area of interest for a long time. And I haven't found any silver bullets. I'd be interested in hearing from anyone who thinks we are evolving to a point where advances are getting closer.

  20. in b4 somebody reverse-engineers Grand Central and compiles it into the Linux kernel.

  21. Interesting thoughts here. Perhaps we need to look at things from a different point of view. Take cars, for example. At what point does one have 'enough' horsepower and torque? For a couple of years, I drove a RAM SRT/10 that had over 500hp and 500 ft./lbs. of torque. It was fun to drive, but there were rarely occasions where I could open it up. The gas mileage was awfully low, and the speeding tickets cost me some bucks. Of course you can tell that I have long since traded that one in.

    Now let's apply the same thoughts to computing. Lots of folks are buying inexpensive netbooks for web browsing and email. I guess that I'm rambling, but there will be different fits for different applications.

    I think that the real revolution will come around when we all learn to look at things in a new way to keep innovation alive.


  22. I am no expert in OS programming, but I think OS makers have been making a big mistake for a long time. By relying on the rapid development of newer chips with ever-faster clock speeds, they have wasted a lot of opportunities to make their programs more efficient.

    There are a lot of unused instructions in the x86 instruction set that act like a tail and make the whole structure heavier than it should be.

    Am I right about this? I still remember my old high school days, when we compared Intel x86-based chips with RISC-based ones.

    Please excuse my poor English.


  23. Good comment by Pete. What is all this obsession with the latest and greatest processors? There are only a handful of consumer related apps that actually need more CPU - gaming, video and audio (DAW). It's like people have been brainwashed by a combination of Intel and Microsoft. I'm not mentioning Apple because they really don't seem too bothered about anything more than minor incremental CPU upgrades in their consumer machines now.

    Let's face it - any Core2Duo type chip from the past few years offers plenty of power for the majority of people - and I bet disk performance is the main bottleneck when people moan their computer is slow. On a Windows machine it's probably because your system is fragmented and loading a ton of cr*p on startup (anti-virus etc).

    I think Apple are absolutely doing the right thing with SL in trimming it down and optimising it, as it will run better on any computer, however many cores it has. Reducing the size of the executables and compressing them means everything will load quicker - and the computer will feel much snappier. Things like Grand Central are great, but loads of cores just aren't necessary for most people - they will be wasted!

    Even Microsoft have finally realised people are questioning why they should have to keep on shelling out for new PCs when all they want to do is browse the web.

  24. I find it odd that at this late point in Snow Leopard development, with builds in the hands of so many developers, there is not a single leak or benchmark related to Grand Central.

  25. What a dweeb. Everything sucks because you suck. Everything rocks because I rock.

  26. Even if a 486 with DOS can keep up with my typing a document, I find comfort in the fact that once my serial processing is done, I could render texts, audio, images, video on different cores concurrently.

    And send it off online, while rendering a different angle on a 3D scene I worked on earlier, watching TV on the same machine, while-I-wait :)

  27. you are so so so wrong my friend. let me tell you about this little bit of technology called pure c, it will change the game. it utilizes hyper nano threading on not just every core, but on the perimeter as well as across the total chip. It stacks and restacks each thread within itself, effectively quadrupling each thread's load capacity, giving searing results. so next time do your homework. pure c remember it. its totally oil

  28. God you guys are dorks. It's a computer, who cares. If it works and looks pretty, that's all most people care about. I could give a flip if multicores are being utilized efficiently.

  29. Has anyone looked at the process list on their Mac lately?
    Just having the ability to run all background processes on core A, and opening Photoshop on core B, wouldn't that make a speed increase that I would see every time I run an app?

