http://www.gotw.ca/publications/concurrency-ddj.htm
Your free lunch will soon be over. What can you do about it? What are you doing about it?
The major processor manufacturers and architectures, from Intel and AMD to Sparc and PowerPC, have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are instead turning en masse to hyperthreading and multicore architectures. Both of these features are already available on chips today; in particular, multicore is available on current PowerPC and Sparc IV processors, and is coming in 2005 from Intel and AMD. Indeed, the big theme of the 2004 In-Stat/MDR Fall Processor Forum was multicore devices, as many companies showed new or updated multicore processors. Looking back, it’s not much of a stretch to call 2004 the year of multicore.
And that puts us at a fundamental turning point in software development, at least for the next few years and for applications targeting general-purpose desktop computers and low-end servers (which happens to account for the vast bulk of the dollar value of software sold today). In this article, I’ll describe the changing face of hardware, why it suddenly does matter to software, and how specifically the concurrency revolution matters to you and is going to change the way you will likely be writing software in the future.
Arguably, the free lunch has already been over for a year or two, only we’re just now noticing.
The Free Performance Lunch
There’s an interesting phenomenon that’s known as “Andy giveth, and Bill taketh away.” No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently). Most classes of applications have enjoyed free and regular performance gains for several decades, even without releasing new versions or doing anything special, because the CPU manufacturers (primarily) and memory and disk manufacturers (secondarily) have reliably enabled ever-newer and ever-faster mainstream systems. Clock speed isn’t the only measure of performance, or even necessarily a good one, but it’s an instructive one: We’re used to seeing 500MHz CPUs give way to 1GHz CPUs give way to 2GHz CPUs, and so on. Today we’re in the 3GHz range on mainstream computers.
The key question is: When will it end? After all, Moore’s Law predicts exponential growth, and clearly exponential growth can’t continue forever before we reach hard physical limits; light isn’t getting any faster. The growth must eventually slow down and even end. (Caveat: Yes, Moore’s Law applies principally to transistor densities, but the same kind of exponential growth has occurred in related areas such as clock speeds. There’s even faster growth in other spaces, most notably the data storage explosion, but that important trend belongs in a different article.)
If you’re a software developer, chances are that you have already been riding the “free lunch” wave of desktop computer performance. Is your application’s performance borderline for some local operations? “Not to worry,” the conventional (if suspect) wisdom goes; “tomorrow’s processors will have even more throughput, and anyway today’s applications are increasingly throttled by factors other than CPU throughput and memory speed (e.g., they’re often I/O-bound, network-bound, database-bound).” Right?
Right enough, in the past. But dead wrong for the foreseeable future.
The good news is that processors are going to continue to become more powerful. The bad news is that, at least in the short term, the growth will come mostly in directions that do not take most current applications along for their customary free ride.
Over the past 30 years, CPU designers have achieved performance gains in three main areas, the first two of which focus on straight-line execution flow:
clock speed
execution optimization
cache
Increasing clock speed is about getting more cycles. Running the CPU faster more or less directly means doing the same work faster.
Optimizing execution flow is about doing more work per cycle. Today’s CPUs sport some more powerful instructions, and they perform optimizations that range from the pedestrian to the exotic, including pipelining, branch prediction, executing multiple instructions in the same clock cycle(s), and even reordering the instruction stream for out-of-order execution. These techniques are all designed to make the instructions flow better and/or execute faster, and to squeeze the most work out of each clock cycle by reducing latency.
Chip designers are under so much pressure to deliver ever-faster CPUs that they’ll risk changing the meaning of your program, and possibly break it, in order to make it run faster
Brief aside on instruction reordering and memory models: Note that some of what I just called “optimizations” are actually far more than optimizations, in that they can change the meaning of programs and cause visible effects that can break reasonable programmer expectations. This is significant. CPU designers are generally sane and well-adjusted folks who normally wouldn’t hurt a fly, and wouldn’t think of hurting your code… normally. But in recent years they have been willing to pursue aggressive optimizations just to wring yet more speed out of each cycle, even knowing full well that these aggressive rearrangements could endanger the semantics of your code. Is this Mr. Hyde making an appearance? Not at all. That willingness is simply a clear indicator of the extreme pressure the chip designers face to deliver ever-faster CPUs; they’re under so much pressure that they’ll risk changing the meaning of your program, and possibly break it, in order to make it run faster. Two noteworthy examples in this respect are write reordering and read reordering: Allowing a processor to reorder write operations has consequences that are so surprising, and break so many programmer expectations, that the feature generally has to be turned off because it’s too difficult for programmers to reason correctly about the meaning of their programs in the presence of arbitrary write reordering. Reordering read operations can also yield surprising visible effects, but that is more commonly left enabled anyway because it isn’t quite as hard on programmers. Even so, the demands for performance cause designers of operating systems and operating environments to compromise, choosing memory models that place a greater burden on programmers because that is viewed as a lesser evil than giving up the optimization opportunities.
Finally, increasing the size of on-chip cache is about staying away from RAM. Main memory continues to be so much slower than the CPU that it makes sense to put the data closer to the processor—and you can’t get much closer than being right on the die. On-die cache sizes have soared, and today most major chip vendors will sell you CPUs that have 2MB and more of on-board L2 cache. (Of these three major historical approaches to boosting CPU performance, increasing cache is the only one that will continue in the near term. I’ll talk a little more about the importance of cache later on.)
Okay. So what does this mean?
A fundamentally important thing to recognize about this list is that all of these areas are concurrency-agnostic. Speedups in any of these areas will directly lead to speedups in sequential (nonparallel, single-threaded, single-process) applications, as well as applications that do make use of concurrency. That’s important, because the vast majority of today’s applications are single-threaded, for good reasons that I’ll get into further below.
Of course, compilers have had to keep up; sometimes you need to recompile your application, and target a specific minimum level of CPU, in order to benefit from new instructions (e.g., MMX, SSE) and some new CPU features and characteristics. But, by and large, even old applications have always run significantly faster—even without being recompiled to take advantage of all the new instructions and features offered by the latest CPUs.
That world was a nice place to be. Unfortunately, it has already disappeared.
--> Read the rest at the link: http://www.gotw.ca/publications/concurrency-ddj.htm
And start reading up on parallel programming. (There is plenty to be had on the GPU side too -- see the article rapadder posted.)
Parallelism, Moore's law, and the end of innocence
- AmmarkoV
A very interesting little article on the subject:
Quietly but surely, we are heading into a new computing era that will bring one of the most dramatic changes the IT industry has seen. Acceleration technologies will inject lots of horsepower into the CPU, increasing the performance capability of the microprocessor not just by 10 or 20%, but in some cases by up to 100x. These new technologies, which are expected to become widely available through heterogeneous multi-core processors, create challenges for software developers – but Intel claims to have found a way to make the transition easy.

- AnINffected
The first article is very interesting.
I'll have a look at the second one soon...

Does this mean that, to test a multi-threaded program properly, we need multiprocessor or multi-core machines?

“I’ve come across many teams whose application worked fine even under heavy and extended stress testing, and ran perfectly at many customer sites, until the day that a customer actually had a real multiprocessor machine and then deeply mysterious races and corruptions started to manifest intermittently”

The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform (...)
Ada Lovelace
I want to play D&D too, I'm telling you!!!



- AnINffected


Then I'm out of luck!
Never mind, I'll learn the basics for now on my poor single-core machine, and save the serious applications (so to speak) for later.


