Unit X: Multicore Computers - Computer Architecture - BCA Notes (Pokhara University)

Saturday, June 20, 2020

Introduction to Multicore Computers:

A multicore computer combines two or more processors on a single computer chip. The multicore architecture poses challenges to software developers to exploit the capability for multi-threading across multiple cores. The main variables in a multicore organization are the number of processors on the chip, the number of levels of cache memory, and the extent to which cache memory is shared.

A multicore computer, also known as a chip multiprocessor, combines two or more processors (called cores) on a single piece of silicon (called a die). Each core consists of all the components of an independent processor, such as registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition to the multiple cores, contemporary multicore chips also include L2 cache and, in some cases, L3 cache.

Hardware Performance Issues:

Microprocessor systems have experienced a steady, exponential increase in execution performance. This increase is due to refinements in the organization of the processor on the chip, and the increase in the clock frequency.

Increase in Parallelism:

The organizational changes in processor design have primarily been focused on increasing instruction-level parallelism, so that more work could be done in each clock cycle.

1. Pipelining:

Individual instructions are executed through a pipeline of stages so that when one instruction is executing in one stage of the pipeline, another instruction is executing in another stage of the pipeline.

In the case of pipelining, simple three-stage pipelines were replaced by five-stage pipelines, and then by much deeper designs, with some implementations having over a dozen stages. There is a practical limit to how far this trend can be taken, because more stages require more logic (hardware), more interconnections, and more control signals.
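As a rough numeric sketch (illustrative, not from the notes), the classic pipeline timing model shows why pipelining helps: with a k-stage pipeline and no stalls, n instructions finish in k + (n − 1) cycles instead of n × k.

```python
def pipelined_cycles(n_instructions, stages):
    # Filling the pipeline takes `stages` cycles for the first instruction;
    # after that, one instruction completes per cycle (assuming no stalls).
    return stages + (n_instructions - 1)

def unpipelined_cycles(n_instructions, stages):
    # Without pipelining, each instruction occupies all stages in turn.
    return n_instructions * stages

print(pipelined_cycles(100, 5))    # 104 cycles
print(unpipelined_cycles(100, 5))  # 500 cycles
```

In practice hazards and pipeline flushes add stall cycles, which is exactly why ever-deeper pipelines stop paying off.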

2. Superscalar:

Multiple pipelines are constructed by replicating execution resources. This enables parallel execution of instructions in parallel pipelines, so long as hazards are avoided.

With the superscalar organization, performance can be increased by adding more parallel pipelines. However, there are limitations as the number of pipelines increases: more logic is required to manage hazards and to stage instruction resources.

A single thread of execution eventually reaches the point where hazards and resource dependencies prevent the full use of the multiple pipelines available. Also, the complexity of managing multiple threads over a set of pipelines limits the number of threads and the number of pipelines that can be effectively utilized.
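To make the hazard limitation concrete, here is a toy model (an illustrative sketch, not from the notes) of greedy in-order issue: up to `width` independent instructions issue per cycle, but an instruction whose producer has not yet completed stalls the whole group.

```python
def superscalar_cycles(deps, width):
    """deps[i] = index of the instruction that produces i's input
    (or None). Each instruction completes in the cycle it issues."""
    done = {}          # instruction index -> cycle it completed
    cycle, i = 0, 0
    while i < len(deps):
        issued = 0
        while i < len(deps) and issued < width:
            producer = deps[i]
            if producer is not None and done.get(producer, -1) >= cycle:
                break  # data hazard: producer not finished before this cycle
            done[i] = cycle
            issued += 1
            i += 1
        cycle += 1
    return cycle

# Four independent instructions, dual issue: 2 cycles.
print(superscalar_cycles([None, None, None, None], 2))  # 2
# A dependency chain defeats the extra pipeline: 3 cycles.
print(superscalar_cycles([None, 0, 1, None], 2))        # 3
```

The dependent chain gets no benefit from the second pipeline, which is the limitation the paragraph above describes.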

3. Simultaneous Multithreading (SMT):

Register banks are replicated so that multiple threads can share the use of pipeline resources. The increase in complexity to deal with all of the logical issues related to very long pipelines, multiple superscalar pipelines, and multiple SMT register banks means that increasing amounts of the chip area is occupied with coordinating and signal transfer logic. This increases the difficulty of designing, fabricating, and debugging the chips. Power issues are the other big challenges.

Alternative Chip Organizations:

Power Consumption:

To maintain higher performance, the number of transistors per chip and the clock frequency have continued to rise. Unfortunately, power requirements have grown exponentially as chip density and clock frequency have risen. One way to control power density is to use more of the chip area for cache memory. Memory transistors are smaller and have a power density an order of magnitude lower than that of logic.

[Figure: power consumption trend]

The figure above shows where the power consumption trend is leading. Assuming about 50% - 60% of the chip area is devoted to memory, the chip will support cache memory of about 100 MB and leave over 1 billion transistors available for logic.

In recent decades, Pollack's rule has been observed to hold: performance increase is roughly proportional to the square root of the increase in complexity. If we double the logic in a processor core, it delivers only about 40% more performance. The use of multiple cores, in contrast, has the potential to provide near-linear performance improvement with the increase in the number of cores.
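The arithmetic behind that 40% figure is just a square root (a quick sketch, not from the notes):

```python
import math

def pollack_performance(complexity_ratio):
    # Pollack's rule: single-core performance grows roughly with the
    # square root of the increase in core complexity (logic area).
    return math.sqrt(complexity_ratio)

# Doubling one core's logic yields only ~1.41x, i.e. about 40% more:
print(round(pollack_performance(2.0), 2))  # 1.41
# Spending the same transistors on a second core can approach 2x instead.
```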

Power considerations provide another motive for moving toward a multicore organization. Because the chip carries such a huge amount of cache memory, it becomes unlikely that any one thread of execution can effectively use all of it. With multiple cores and SMT, a number of relatively independent threads or processes have a greater opportunity to take full advantage of the cache memory.

Software Performance Issues:

Software in Multicore:

The potential performance benefits of a multicore organization depend on the ability to effectively exploit the parallel resources available to the application. Amdahl's law states that:

Speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors) = 1 / ((1 - f) + f/N)

The law assumes a program in which a fraction (1 - f) of the execution time involves code that is inherently serial, and a fraction f that is infinitely parallelizable with no scheduling overhead. A number of classes of applications benefit directly from the ability to scale throughput with the number of cores.
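A quick numeric sketch of the law (illustrative, not from the notes) shows how the serial fraction dominates:

```python
def amdahl_speedup(f, n):
    # Speedup predicted by Amdahl's law for a fraction f of
    # parallelizable code running on n cores.
    return 1.0 / ((1.0 - f) + f / n)

# Even with 90% parallelizable code, 8 cores deliver well under 8x:
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
```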

1. Multithreaded Applications:

Multithreaded applications are characterized by having a small number of highly threaded processes. Examples of threaded applications include Lotus Domino or Siebel CRM (Customer Relationship Manager).

2. Multiprocessor Applications:

Multiprocessor applications are characterized by the presence of many single-threaded processes. Examples of multiprocessor applications include Oracle Database, SAP, and PeopleSoft.

3. Java Applications:

Java language greatly facilitates multithreaded applications. Java Virtual Machine is a multithreaded process that provides scheduling and memory management for Java applications. Java applications that can benefit directly from multicore resources include application servers such as Sun’s Java Application Server, BEA’s Weblogic, IBM’s Websphere, etc. All applications that use a Java 2 platform, Enterprise Edition (J2EE platform) application server can immediately benefit from multicore technology.

4. Multi-instance Applications:

Even if an individual application does not scale to take advantage of a large number of threads, it is still possible to gain advantage from multicore architecture by running multiple instances of the application in parallel. If multiple application instances require some degree of isolation, then virtualization technology can be used to provide each of them with its own separate and secure environment. 

Multicore Organization:

The main variables in a multicore organization are as follows:

  1. The number of core processors on the chip.

  2. The number of levels of cache memory.

  3. The amount of cache memory that is shared.

The above figure shows an organization found in some of the earlier multicore computer chips and still found in embedded chips. Here, the only on-chip cache is L1, with each core having its own dedicated L1 cache, split into instruction and data caches. An example of this organization is the ARM11 MPCore.

The above figure shows an organization in which there is also no on-chip cache sharing, but there is enough area available on the chip to give each core its own dedicated L2 cache. An example of this organization is the AMD Opteron.

The above figure shows a similar allocation of chip space to memory, but with the use of a shared L2 cache. The Intel Core Duo has this organization.

The above figure shows that as the amount of cache memory available on the chip continues to grow, it becomes practical to have a shared L3 cache, with dedicated L1 and L2 caches for each core. The Intel Core i7 is an example of this organization.

An advantage of having only dedicated L2 caches on the chip is that each core enjoys more rapid access to its private L2 cache. As both the amount of memory available and the number of cores grow, the use of a shared L3 cache combined with either a shared L2 cache or dedicated per-core L2 caches provides better performance than simply a massive shared L2 cache.
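The hierarchy trade-off can be estimated with a simple expected-latency model. This is an illustrative sketch; the hit rates and latencies below are made-up figures, not from the notes, and each level's latency is taken as the total time to get data from that level.

```python
def avg_access_time(levels):
    """levels: list of (hit_rate, total_latency_cycles), nearest level
    first. The last level should have hit_rate 1.0 in this toy model."""
    expected, p_reach = 0.0, 1.0
    for hit_rate, latency in levels:
        expected += p_reach * hit_rate * latency   # hit at this level
        p_reach *= (1.0 - hit_rate)                # miss: fall through
    return expected

# Dedicated L1 (4 cycles) + private L2 (12) + shared L3 (40):
print(round(avg_access_time([(0.95, 4), (0.80, 12), (1.0, 40)]), 2))  # 4.68
```

Plugging in different latencies for a private versus shared L2 shows why a fast private L2 backed by a large shared L3 is an attractive compromise.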

Another organizational design decision in a multicore system is whether the individual core will be Superscalar or will implement Simultaneous Multithreading (SMT). For example, the Intel Core Duo uses Superscalar cores, whereas the Intel Core i7 uses SMT cores. SMT scales up the number of hardware-level threads that the multicore system supports. As software is developed to fully exploit parallel resources, an SMT approach appears to be more attractive than a Superscalar approach.

Dual-Core and Quad-Core Processor:

1. Dual-Core Processor:

A dual-core processor is a CPU with two processors or "execution cores" in the same integrated circuit. Each processor has its own cache and controller, which enables it to function as efficiently as a single processor. However, because the two processors are linked together, they can perform operations up to twice as fast as a single processor can.

The Intel Core Duo, the AMD X2, and the dual-core PowerPC G5 are all examples of CPUs that use dual-core technologies. These CPUs each combine two processor cores on a single silicon chip. This is different from a "dual processor" configuration, in which two physically separate CPUs work together. However, some high-end machines, such as the PowerPC G5 Quad, use two separate dual-core processors together, providing up to four times the performance of a single processor.

While a dual-core system has twice the processing power of a single-processor machine, it does not always perform twice as fast. This is because the software running on the machine may not be able to take full advantage of both processors. Some operating systems and programs are optimized for multiprocessing, while others are not. Though programs that have been optimized for multiple processors will run especially fast on dual-core systems, most programs will see at least some benefit from multiple processors as well.

2. Quad-Core Processor:

A quad-core CPU has four processing cores in a single chip. It is similar to a dual-core CPU but has four separate processors (rather than two), which can process instructions at the same time.

Quad-core CPUs have become more popular in recent years as the clock speeds of processors have plateaued. By including multiple cores in a single CPU, chip manufacturers can generate higher performance without boosting the clock speed. However, the performance gain can only be realized if the computer's software supports multiprocessing. This allows the software to split the processing load between multiple processors (or "cores") instead of only using one processor at a time. Fortunately, most modern operating systems and many programs provide support for multiprocessing.
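A minimal sketch of the multiprocessing the paragraph describes (illustrative, not from the notes): a CPU-bound task is split into chunks handed to worker processes, which the operating system can schedule on separate cores.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def count_primes(limit):
    # CPU-bound work: naive count of primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [20_000] * 4  # four independent pieces of work
    # Each worker process can run on its own core if one is available.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(count_primes, chunks))
    print(sum(results))
```

On a quad-core machine the four chunks can run concurrently; on a single core they simply time-share, which is why the software-side support matters.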

Some examples of quad-core CPUs include the Intel Core 2 Quad, Intel Nehalem, and AMD Phenom X4 processors. The Intel processors are used in Mac, Windows, and Linux systems, while the AMD processors are only used in Windows and Linux systems. While four cores may seem impressive, some high-end computers have two quad-core CPUs, giving them a total of eight processing cores.

Difference:

| Dual-core | Quad-core |
| --- | --- |
| A dual-core processor, as the name suggests, is a CPU with two processor cores. | A quad-core processor is a multi-core CPU with four processor cores. |
| Not as fast in terms of speed. | Considered to perform quicker. |
| Consumes less power/energy, since it has only two cores. | Consumes more power/energy, since it has four cores. |
| Cannot perform as many tasks simultaneously. | Can perform more tasks simultaneously. |
| Generates less heat in the device. | Generates considerably more heat in the device. |
| Weaker graphics performance. | Superior, high-quality graphics performance. |

Power Efficient Processors:

Power management has become a major issue in the design of multicore chips. Increasing power consumption has many negative effects, such as unstable thermal properties of the die, which degrade system performance; this can make power consumption more important than speed. An important observation is that threads running on different cores do not need the same power all the time to execute at high performance: there are waiting times, for example due to memory read/write operations, during which processing power can be saved. So, to achieve a good balance between scalar performance, throughput performance, and power, it is essential to dynamically vary the amount of power used for processing according to a temporal analysis of what the code needs.

Power management techniques can be classified into two main categories: reactive and predictive.

In reactive techniques, the system reacts to performance changes in the workload. In other words, a workload may have phases that need high performance and other phases of I/O waits and low performance. When the phase of the workload changes, the technique reacts to that change accordingly. However, there may be some lag between workload phase changes and power adaptation changes, which can lead to periods of either inefficient energy consumption or performance degradation.

Predictive techniques, on the other hand, overcome this issue. They predict phase changes in the workload before they happen, and hence act immediately before a program phase changes, leading to near-optimal energy-saving and performance results. However, no workload can be fully predicted, so reactive techniques are used for the portions that cannot be predicted (usually more than 60% of the entire workload). Reactive techniques are therefore unavoidable, and this study concentrates on them.
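The reaction lag described above can be seen in a toy reactive governor (an illustrative sketch, not a real DVFS implementation): frequency steps up when observed utilization is high and down when it is low, always one sample behind the workload.

```python
def reactive_governor(utilization_trace, freqs=(1.0, 2.0, 3.0),
                      up=0.8, down=0.3):
    # Toy reactive DVFS policy: step the frequency level up when the
    # last observed utilization is high, down when it is low. The
    # chosen frequency always lags the workload by one sample.
    level, chosen = 0, []
    for u in utilization_trace:
        if u > up and level < len(freqs) - 1:
            level += 1
        elif u < down and level > 0:
            level -= 1
        chosen.append(freqs[level])
    return chosen

# A burst of high utilization followed by idle time:
print(reactive_governor([0.9, 0.9, 0.9, 0.1, 0.1]))
# [2.0, 3.0, 3.0, 2.0, 1.0]
```

Note the first busy sample runs at 2.0 instead of 3.0 and the first idle sample still burns 2.0: that lag is the inefficiency a predictive technique tries to remove.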
