Too many cores, not enough brains

Several comments on my last post suggested I work on multi-core parallelism. There are a number of reasons why I am steering clear of that topic.

First of all, it seems everyone is doing it, and I hate crowds. Secondly, I was heavily involved in the early days of multithreading and transactions back in the 70’s and 80’s. Been there, done that, got the Lucite trophy.

More importantly, I believe the whole movement is misguided. Remember that we already know how to exploit multicore processors: with now-standard multithreading techniques. Multithreaded programming is notoriously difficult and error-prone, so the challenge is to invent techniques that will make it easier. But I just don’t see vast hordes of programmers needing to do multithreaded programming, and I don’t see large application domains where it is needed. Internet server apps are architected to scale across a CPU farm far beyond the limits of multicore. Likewise CGI rendering farms. Desktop apps don’t really need more CPU cycles: they just absorb them in lieu of performance tuning. It is mostly specialized performance-intensive domains that are truly in need of multithreading: like OS kernels and database engines and video codecs. Such code will continue to be written in C no matter what.

There are only two groups that want multicore. One is the computer hardware industry, which has become addicted to rapid obsolescence, and whose sales will fall if computers start lasting more than a few years. The other group is researchers looking for a topic. Multicore programming is an irresistible bait for researchers: it is a hard puzzle, with quantifiable results, and a shot at changing the world. Unfortunately it is turning out that there is little quantifiable difference between the plethora of alternatives, leaving it as a matter of subjective judgment, which they hate. I already see signs that the multicore research frenzy has peaked.

That said, Subtext does suggest a transactional model of computation related to some of the recent research. But parallelism is just a possible side-benefit. The point is to allow the programmer to express their intent without having to mentally compile a linear execution schedule. Performance is not the problem. Write that on the board 100 times.
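
To give a concrete flavor of what “transactional” means here, the sketch below uses Haskell’s STM library, one strand of that recent research. It is only an illustration of the general idea, not Subtext’s model, and the bank-account example is made up:

    import Control.Concurrent.STM

    type Account = TVar Int

    -- The transaction states what must happen as a unit; the runtime, not the
    -- programmer, works out a legal schedule against other transactions,
    -- retrying this one if the balance is too low.
    transfer :: Account -> Account -> Int -> STM ()
    transfer from to amount = do
      balance <- readTVar from
      check (balance >= amount)
      writeTVar from (balance - amount)
      toBalance <- readTVar to
      writeTVar to (toBalance + amount)

    main :: IO ()
    main = do
      a <- newTVarIO 100
      b <- newTVarIO 0
      atomically (transfer a b 30)
      readTVarIO a >>= print   -- prints 70
      readTVarIO b >>= print   -- prints 30

The atomically block is the statement of intent; whatever parallelism happens is the runtime’s business, not the programmer’s.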


Update: It looks like I really stepped in it this time, so let me clarify my position.

Giving a multicore CPU to a programmer is like giving a drink to an alcoholic. Saint Knuth taught us that premature optimization is the root of all evil. I’m here to tell you that virtually all of programming is premature optimization. Once, just once, try to write a program without ever thinking about performance. I know I can’t. We have this fatal attraction to performance problems: they are just so shiny and clear-cut: a chance to be clever with a well-defined problem. The other issues we grapple with in programming are so vague and ambiguous and ultimately unknowable. But those are the essential issues, and thinking about performance is a distraction from them.

Inventing new programming languages and techniques to get better performance on multicore chips goes completely in the wrong direction. We should be figuring out how we could improve programming given unbounded performance. Then figure out how to optimize it. I have come to believe that our programming languages and design patterns are all a giant case of premature optimization. So maybe we need multicore, and maybe we need to complicate programming as a result, but it is just digging ourselves deeper into the hole we’re in, and should not be celebrated.


Update: see rebuttal in next post.

31 Replies to “Too many cores, not enough brains”

  1. I agree performance is not a problem. People read about what Fortune x00 companies or websites with millions of daily visitors do to scale, and think they need to copy it. You’re going to get a lot of apathy or confusion for this reason.

    However, performance wasn’t my (intended) point. Set-based feature dispatch is fundamentally superior to element-based feature dispatch.

    Database engines aren’t likely to be taken over by Subtext, for sure. Every major DB vendor has by now pretty much figured out how to scale at least to 4 processors with perfect parallelization. As Stonebraker points out in his VLDB’07 paper on H-Store, memory is the wall, not programming languages. (Although Stonebraker did mention we might see a shift in the design of languages that act as the human interface to the query planner. I disagree here, and it’s obvious from the lack of attention CJ Date’s and Hugh Darwen’s work has gotten that new languages are not welcome; SQL is the lingua franca, and niche expression languages like SPARQL recrystallize around it (SPASQL).) Even here, you are missing the point if you think I’d want to use Subtext to write a db engine. However, could Subtext be used to help model the heuristics for determining when to hash join, loop join or merge join? In other words, planning algorithms. These were studied a lot in the 80s, of course, but I don’t know of anyone who reasons about such heuristics using visual languages. (For a rough sketch of what one of these heuristics looks like in ordinary code, see the note after the last comment.)

    OS kernels could use visual modeling, but probably not a wholesale new language where everything is written in it. This is in spite of the fact that C is not always the right language for the problem, merely good enough with no suitable alternative. Helping programmers better see dependencies is important. I doubt I’d want to use a tool like Subtext to reason about tiny performance gains, though.

  2. (A) Mobile devices are the future
    (B) Mobile devices need low power
    (C) Battery technology continues to improve at a relatively slow rate
    (D) Power consumption scales approx. exponentially with clock-speed
    (E) Some mobile applications have legitimate high performance needs

    –> a mobile device with multiple cores and power management can meet user requirements for battery life and performance that a single-core device cannot

    [You are right. Mobile is a throwback to the bad old days of memory and CPU constraints. I did my time in that prison. They won’t take me back alive. – Jonathan]

  3. Actually, performance is a problem. It depends on what field you put your computer to work in. For trivial tasks such as running Office, performance might not be a problem, but try multiplying 4000×4000 matrices in Matlab repeatedly in the same script, doing some transforms, and then passing data through some neural networks, and you’ll soon be very grateful you have multithreading capabilities.

  4. I suggested GPU programming, not multi-core programming. The two are quite different. Personally I think that CSP-style message passing is the best approach for multi-core programming but GPUs require a quite different approach.

    I think that assuming that performance is not a problem is misguided, especially if you are going to concentrate on web programming. Writing a web application is VERY VERY EASY, as long as you ignore scalability issues. The difficulty of writing web applications comes from making them perform well under high load, dealing with huge numbers of concurrent requests, and still being reliable and appearing to be consistent.

  5. Klaus Haundjhauben | 8/25/2008 at 1:54 am
    (A) Mobile devices are the future
    (B) Mobile devices need low power

    The above are hefty assumptions. They require (A) to be true, and they assume a lack of progress in battery technology.

  6. I beg to differ.

    The only reason that “Internet server apps are architected to scale across a CPU farm” is that we have no other choice. It winds up meaning that we either have to buy a stupid arse large server to be the database, or we don’t have any form of real time communication between the people hitting the site.

    No, we really do need concurrency that normal programmers can work with. Thankfully we have both Haskell with STM and Erlang with Actors. Otherwise we’d be in serious trouble right about now.

  7. *applauds*

    We need more people posting more comments like these. Keep at it, and don’t be defensive: your points are clear and judicious.

  8. You forget another reason why farms are necessary. Your computer and hard drives only have so much I/O. CPU power is great, but you can still only read and write disks at about 100MB/s streaming. That is slow. Add some random access in there and you’re down below 10MB/s. I/O is currently the great bottleneck. Even RAM is slow; anything outside of your 2-12MB of cache is slow. The memory hierarchy is getting deeper.

    So, to repeat, CPU speed is inconsequential when you cannot feed the CPU enough data.

  9. There is a niche of high performance applications that already use scalable multithreading for performance, for instance games. Those will take advantage of the multicore to the max.

    Most desktop apps, at least at present, don’t do tasks that are easily parallelizable. They still require multithreading, but for a different reason–latency.

    But the real applications of multicores are the ones we don’t even anticipate. Build a massive multicore and they will come!

  10. Here are the points, in all their silliness.

    1. First of all, it seems everyone is doing it, and I hate crowds.

    2. I believe the whole movement is misguided.

    3. There are only two groups that want multicore [neither of them include me].

    4. Performance is not the problem [in my programs].

    5. I’m here to tell you that virtually all of programming is premature optimization.

    6. Inventing new programming languages and techniques to get better performance on multicore chips goes completely in the wrong direction. We should be figuring out how we could improve programming given unbounded performance.

    Notwithstanding the fact that most of this is opinion, how is 6 anything but contradictory? You don’t want a language to give you unbounded performance (scaled linearly by hardware), but you want to figure out how to get unbounded performance. Riiight.

  11. The biggest problem with multi-core _CPUs_, IMHO, is that they don’t scale across application load. As the entry notes, there are some well-defined end-user applications that can make good use of multi-core machines, but in general, users simply don’t run applications that scale well. Once the machine’s basic housekeeping load no longer fills a core, it’s unlikely that adding more than another core or two will have any effect on user-perceived performance at all.

    Worse, the only problems that multi-core solves at the other end of the scale are those problems that can be solved with multiple cores. That sounds redundant, but it’s not. It’s simply the fact that we concentrate on those problems because that’s a hardware direction we can move in.

    But if your goal, like most people’s, is to make a single application run faster, cores don’t help at all. Threading something like Excel or Access is _extremely_ difficult, yet these are the programs that most people want to run faster. Even in the specialist fields, there are a huge number of real-world problems that simply don’t scale that way, and what we really need is a single very much faster core.

    That’s why I found this so interesting:

    http://en.wikipedia.org/wiki/Explicit_Data_Graph_Execution

  12. Yeah, the bulk of the gaming industry, so desperate for every clock cycle that isn’t devoted to graphics (AI (pathfinding), real-time physics, more AI (speech and gestural recognition), and a bunch more AI (intelligent avatars)), that’s just a niche.

    [Well, yes. Game engines, as opposed to game content, are a very specialized niche. – Jonathan]

  13. @Klaus Haundjhauben

    Although a research result demonstrating even a 10% improvement in energy efficiency would absolutely make it into the Journal of Power Sources, you aren’t listing the applications or functionality in mobile devices that drain battery life prematurely. I’ll help you out: the BIG ones are encryption (wireless) and encoding/decoding (media). For encryption to be fast on an embedded device, hand-coded assembly by a human optimization expert is ideal. Also, Subtext probably wouldn’t help better understand things like secure hash functions, where the goals (such as taking one bit as input and distributing it randomly across n output bits) aren’t mathematically well-understood. And media partly comes down to needing better codecs, and not using a small-resolution display to push a high-resolution movie to the screen.

    The best way to push mobile devices forward is through capitalism. R&D seeks the wallets of capitalistic entrepreneurs.

    @Darrell Wright

    Correct. These are still von Neumann machines; they’re just lots-in-boxes.

    @Jonathan Edwards
    “Added some clarifying remarks.”

    You can’t clarify for people who are apathetic or confused. In the words of Phil Greenspun, “What most people need is someone to tell them what to do. It is cheaper and better to adopt a false religion than to remain a skeptical atheist, seeking after truth oneself. Of course, it is better and cheaper still to adopt a reasonable religion.” (For those of you wondering about the source: http://philip.greenspun.com/wtr/vignette.html )

  14. Bartosz Milewski: “Most desktop apps, at least at present, don’t do tasks that are easily parallelizable. They still require multithreading, but for a different reason–latency.”

    I think that in the future the only desktop apps in existence will be those that do need a lot of local performance. Data storage, editing and processing is moving off the desktop and into networked services. What’s left? Media processing and games, which require parallel number crunching.

  15. I think Darrell Wright nailed it — the real performance barrier isn’t the CPU, it’s…everything else. It used to be that we optimized code to avoid I/O, because it was so slow. Now we optimize everything so it fits in cache, because the cache:memory performance gap is as large as the memory:disk gap used to be. It’s also the reason why we scale across machines — once all your cores are stuck in I/O wait, you need another machine (with another disk subsystem) to take the input. Sorry, no opportunities for Subtext, unless you can come up with a language that somehow encourages multiplexing I/O…

  16. I would agree with you, if I were a programmer who was going to spend the rest of his life making user interfaces and never doing anything with any kind of performance bound.

  17. The idea that disk is the limit, that CPUs are free, and that most applications do not need to be parallelized is certainly true in the “niche” of server and desktop computing.

    However, move into the embedded world, especially the line-rate processing of data streams that forms the basis of mobile phone systems, the core network, the internet, and other inconsequential parts of our lives, and the picture is very, very different. The kind of processing performance you need for 4G/LTE mobiles is staggering.

    Also, robotics control and automotive control systems are hitting the wall of single-thread performance in many cases and have to be parallelized in some way.

    At the hard power ceilings of 2W, 9W, and 30W, the only way to get more available instruction cycles is to go massively multicore to keep clock speeds down. It is the Cray problem of harnessing 1024 chickens rather than 2 oxen…

    That CPU and memory are constrained is not “the bad old days”; it is reality today. And hopefully it will be in desktops and servers tomorrow as well, as that offers a great chance to reduce the aggregate power consumption of our computing infrastructure. More efficient code in the end saves natural resources and helps avert global warming… to take the argument to its extreme.
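
A note on the join-heuristic question in the first comment: below is a rough, hypothetical sketch, written in plain Haskell purely for illustration, of the kind of cost-based rule a query planner applies when choosing between nested-loop, hash, and merge joins. The statistics record and the thresholds are invented; real planners use far richer cost models.

    -- Pick a physical join operator from crude estimates of input size and
    -- ordering. Everything here (names, fields, thresholds) is illustrative.
    data JoinAlgo = NestedLoopJoin | HashJoin | MergeJoin
      deriving Show

    data RelStats = RelStats
      { rowCount    :: Int   -- estimated number of rows
      , sortedOnKey :: Bool  -- input already sorted on the join key?
      }

    chooseJoin :: RelStats -> RelStats -> JoinAlgo
    chooseJoin outer inner
      | sortedOnKey outer && sortedOnKey inner = MergeJoin      -- both inputs presorted
      | rowCount inner <= 1000                 = NestedLoopJoin -- tiny inner relation
      | otherwise                              = HashJoin       -- large, unsorted inputs

    -- Example: a big unsorted fact table joined against a small dimension table.
    main :: IO ()
    main = print (chooseJoin (RelStats 5000000 False) (RelStats 200 False))

Whether a visual notation like Subtext would make rules of this kind easier to inspect and reason about than the textual form is exactly the open question that comment raises.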

Comments are closed.