Introduction
Right now we are in the middle of a paradigm shift in software development, a change of the same magnitude as the move to object-oriented programming in the 1990s. Even though the classes and objects of the 1990s seemed a bit strange when we first heard of them, they are like a clear summer sky compared to the cloud of confusion that surrounds the programming models of concurrent computing.
Concurrent (or parallel) programming has been around for many decades in areas such as high-performance and cluster computing. It is not new, but it is more relevant than ever these days. The reason is multi-core processors. A modern PC has two or four cores. Intel recently presented a research prototype with 80 cores! Multi-core is the path processor manufacturers (e.g. Intel and AMD) have chosen for years to come.
Back in the day, everything just got faster with each successive processor generation. Those were the days of the "free lunch". In 2003 this development abruptly ceased. To harvest the performance potential of modern multi-core processors, a programmer has to develop parallel programs: programs that express units of work that can execute in parallel.
Background
During the last thirty years of CPU development better performance has been achieved in three main areas:
- clock speed
- execution optimization
- cache
Increasing the clock speed is about getting more cycles, i.e. doing the same work faster.
Execution optimization is about doing more work per cycle. Different technologies have been developed for this, e.g. pipelining, branch prediction and out-of-order execution. All these techniques try to minimize internal idle states and delays in the processor.
Using caches on a CPU is about staying away from RAM. Write and read operations on RAM are slow - using a cache reduces the average time to access memory. The CPU cache is a smaller and faster memory which stores copies of the data from the most frequently used main memory locations.
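The effect of a cache on the average access time can be made concrete with the standard textbook relation: average access time = hit time + miss rate x miss penalty. The following is a minimal sketch in Python. The function name and all the figures (1 ns hit time, 100 ns RAM penalty, 5 % miss rate) are assumptions chosen for illustration only, not measurements of any real processor.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical figures, chosen only for illustration.
CACHE_HIT_NS = 1.0      # time to serve a request from the cache
RAM_PENALTY_NS = 100.0  # extra time when the request has to go to RAM

# With a cache that misses on 5 % of all accesses.
with_cache = average_access_time(CACHE_HIT_NS, 0.05, RAM_PENALTY_NS)

# Without a cache every access pays the full RAM penalty.
without_cache = average_access_time(0.0, 1.0, RAM_PENALTY_NS)

print(f"with cache:    {with_cache:.1f} ns")
print(f"without cache: {without_cache:.1f} ns")
```

Under these assumed numbers the average access becomes many times cheaper, even though one access in twenty still goes all the way to RAM.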
These traditional areas of performance boosting have more or less come to an end. In 2003 the exponential increase in clock speeds suddenly halted. The reasons are physical, chiefly heat problems and excessive power consumption. When it comes to execution optimization, chip designers today deploy so many nasty tricks to get more speed out of each cycle that they almost risk changing the semantics of your code.
The clock race is definitely over.
Although we can expect some performance boosting from a continuing growth of on-die cache sizes, the main development is going to be more and more cores.
Problems with parallel programming
Today there are some major problems with the parallel programming model (or models as we are about to see):
- Too many flavours.
- Concurrency is hard!
- Lack of good tools.
Although parallel programming has been around for decades, there is still no silver bullet. There are literally hundreds of parallel programming languages and APIs out there: Pthreads, Windows threads, Intel Threading Building Blocks (TBB), OpenMP, Ct, GPGPU, Unified Parallel C and Erlang, to name a few.
Concurrency really is hard. First there is the problem of finding the parallelism at all. When you read texts about parallel programming you see examples like plotting a Mandelbrot set, problems that are very easy to chunk up into a large number of parallel tasks. In real life, with large and complex software, it is not that easy.
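To show why the Mandelbrot example is so popular in such texts, here is a minimal sketch in Python using the standard concurrent.futures module. Each image row depends on no other row, so the rows can be handed to a pool of workers without any coordination. The image size, the iteration limit and all function names are assumptions made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately small image so the example runs quickly.
WIDTH, HEIGHT, MAX_ITER = 60, 30, 50


def escape_count(c, max_iter=MAX_ITER):
    """Iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_row(y):
    """One unit of work: a single image row, independent of all other rows."""
    im = -1.0 + 2.0 * y / (HEIGHT - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / (WIDTH - 1), im))
        for x in range(WIDTH)
    ]


def render_sequential():
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers=4):
    # map() distributes the rows over the pool and returns them in row order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))
```

The sketch illustrates the decomposition rather than a speed-up: in the standard CPython interpreter, threads do not execute Python code on several cores at once, so a process pool or a native-code API such as those listed above would be needed for real gains. The point is that no task ever needs data from another task, which is rarely true of large real-world programs.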
There is also a new set of problems and potential bugs that emerges with the parallel programming model: race conditions, deadlocks and livelocks, to name a few. These bugs can be very nasty and hard to find.
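A race condition can be illustrated with a shared counter. The sketch below, in Python with the standard threading module, contrasts an unprotected read-modify-write with one guarded by a lock. The class and function names, the thread count and the iteration count are all assumptions for this illustration.

```python
import threading


class Counter:
    """A shared counter with an unsafe and a lock-protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read and write are separate steps: another thread may run in
        # between, and one of the two updates is then silently lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self):
        # The lock turns the read-modify-write into a critical section.
        with self._lock:
            self.value += 1


def run(method_name, threads=4, per_thread=10_000):
    """Let several threads call the chosen increment method on one counter."""
    counter = Counter()
    step = getattr(counter, method_name)

    def work():
        for _ in range(per_thread):
            step()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

With these assumed parameters, run("increment_safe") always returns 40000. run("increment_unsafe") may return less, but whether it does depends on the interpreter version, the machine and the timing of that particular run. That unpredictability is exactly what makes such bugs hard to find: a test can pass a thousand times and still hide the defect.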
Suggestions for master's thesis tasks
There is no detailed problem statement or specific task at this point. This is something that can be discussed together with the student. ÅF has an interest in gaining knowledge about parallel programming, existing programming models and tools.
The thesis will probably take the form of an investigation of the current situation in this field. It can be a general approach or a more specific task. Some examples or suggestions:
- Compare the most common parallel programming APIs (e.g. Threads (GNU or Windows), OpenMP and TBB). What are the differences? What are the pros and cons of the different APIs?
- Performance comparison of different parallel computing techniques (e.g. using GPGPU with an NVIDIA or ATI GPU compared to a quad-core Intel processor using a standard parallel programming API). Is GPGPU a practical approach to parallel programming or is it just an interesting theory?
- Investigate how multi-core processors and concurrency are used today among developers. Is there a demand among customers? Are many applications in need of the possible performance boost from multi-core processors or is it a demand invented by semiconductor and processor manufacturers?
- Why is parallel programming hard? What can be done and what is currently done in order to simplify it? What are the big processor manufacturers doing in order to spread the concept of multi-core among developers?