
SoC drawer: The Cell Broadband Engine chip: High-speed offload for the masses
Source: IBM developerWorks Worldwide
Posted by yangyi on 2007-05-14 21:57:04
Level: Introductory
Sam Siewert (Sam.Siewert@Colorado.edu), Adjunct Professor, University of Colorado
17 Apr 2007
Cell Broadband Engine™ (Cell/B.E.) chips are leading the broadband revolution in computing and provide the core silicon DNA for supercomputing, medical image processing, and many emergent applications, as worldwide connectivity and bandwidth continue to change the world we live in. This article explores the performance of application code on the Sony® PLAYSTATION 3®'s Cell Broadband Engine system running Yellow Dog Linux®. A simple program demonstrates how multithreaded applications that use the Synergistic Processing Elements to offload work can enjoy tremendous speedup.
This article provides an overview of installing and using Yellow Dog Linux on the Sony PLAYSTATION 3 (PS3) to explore the capabilities of the Cell/B.E. processor. The PS3 provides an amazing low-cost platform for multithreaded data- and compute-intensive applications. It's very accessible and easy to program, and fun, too.
In this article, I'll use the Cell/B.E. SDK to build a benchmarking application using POSIX threads that are mapped onto the processor's Synergistic Processing Elements (SPEs) over the Element Interconnect Bus (EIB). (See the Resources section below for a link to the SDK.)
The Cell/B.E. processor is an advanced SoC design that provides about 205 GFLOPS of peak single-precision performance running at 3.2GHz, with a simultaneous multithreaded (SMT) Power Processing Element (PPE) and up to eight SPEs that can be used for offloading work. On the PS3, one SPE is dedicated to the built-in Sony GameOS, also known as the hypervisor, and one SPE is disabled to increase PS3 yield and lower cost. So, on the Yellow Dog Linux PS3 platform used in this article, the PPE has six SPEs to which it can offload over the EIB. I'll walk you through some Pthreads-based code that you can download and use to compare Cell/B.E. processor performance to most other multicore or simultaneous multithreaded architectures. I think you'll find that, compared to most other options, the Cell/B.E. processor provides truly amazing offload performance at low cost and power -- especially on the PS3.
Finally, I'll discuss the capabilities that the Cell/B.E. processor brings to systems and its potential for use in embedded or large, scalable clustered systems. The amazing capability of the Cell/B.E. processor will undoubtedly revolutionize many emergent applications in broadband, graphics, and high-performance computing (HPC), and is leading the way in SoCs joining mainstream computing.
A few notes on getting your PS3 system going
It is easy to get Yellow Dog Linux running on the PS3; however, I would like to point you to the resources I used and note a few insights that helped make my install painless and fun. Also, please note that the developerWorks Power Architecture technology zone includes many great PS3 Linux articles that helped me as well. (See the Resources section for links.)
Jumping into SPE offloading enthusiastically
The whole reason I was interested in getting a PS3 to run Linux was to see just how well the SPE offload in a Cell/B.E. system worked. Okay, I wanted to justify HD-quality Madden Football, too, but I'll guess that the SPE offload is what you're most interested in. Playing a game or two on the PS3, it's clear that there's some real compute power under the hood, but writing your own code is truly believing. The C code found in Listing 1 includes three basic benchmarks, using iterations of a 64-bit Fibonacci sequence:
Find the full source for this code in the Download section of this article. The code and makefile found in src1.zip are the first version of the code, written using POSIX threads only; src2.zip contains a revised version of the code that includes SPE offload. The exact code run, including the makefile, is provided. To build the PS3 SPE offload version, use
A note on Pthreads scheduling and compiler optimization
Listing 1. Pthread used to evaluate sequential, PPE threaded, and SPE threaded performance
The test code used to benchmark the speedup provided by the six SPEs available on the PS3 uses the Cell SDK to embed code and to pass data to the SPEs through the EIB.
The Following this procedure is simple, but, on the downside, the parameters passed in with the code must be 16-byte aligned; they are also not type checked or otherwise checked to ensure consistency between the two
Listing 2. The Fibonacci workload program for the SPEs
The code in Listing 2 is downloaded to each SPE and
I first completed testing using the Pthreaded code just to take a look at the speedup provided by the PPE simultaneous multithreading (SMT) to the Fibonacci worker threads. If you download and run this code, the acceleration provided by the PPE SMT becomes apparent when the Fibonacci sequence is run for thousands of iterations or more. Basically, there is a point at which the overhead of thread creation and management is overcome by the speedup gained from threads being executed with SMT acceleration.
I further tested this simple Pthread code on the PPE to see how it scales with an increasing number of threads. In general, the PPE SMT provides a constant speedup that is significant. Looking at PPE SMT-based speedup as a ratio of sequential time taken divided by Pthread time taken for each thread set will show you how speedup varies with number of threads. Speedup is fairly constant as threads are scaled.
The PPE itself provides significant thread scaling, but it is intended to provide control and workload management for the SPEs, which provide much greater speedup for threads. You can best achieve huge performance advantages on Cell/B.E. systems by downloading thread code onto SPEs. You might expect that six SPEs would provide a speed improvement of about a factor of six (minus the overhead of code download and message passing), but I was pleasantly surprised to find an even greater speedup on my system. I suggest that you download the code and give it a try on the PS3 or any other Cell/B.E. system to measure speedup. I only tested the code on YDL on the PS3, but I would expect it to work on just about any Cell/B.E. platform that runs Linux. POSIX threads have been designed to be portable and the SPE offload uses the Cell/B.E. SDK.
When the SMT of the PPE and pipelining are employed using the SPEs, I found that the speedup was greater than I thought it might be on my system. The EIB allows the PPE to start downloads on multiple SPEs and to overlap their starts and stops very efficiently so that the entire process is fully pipelined. Acceleration is also provided on the SPEs by vector processing features not found on the PPE, including 128-bit wide vector processing. So, not only can the PPE efficiently start overlapping execution on all SPEs, but the code would be expected to execute faster thanks to vector processing as well.
The test code provided in this article will reveal the true power of Cell/B.E. processing for threaded applications, and to some extent, the vector processing capabilities of the SPEs. Cell/B.E. processors also include AltiVec accelerating operations for graphics, image processing, and digital signal processing that will further improve performance for applications that can employ these instructions. The code provided in this article is limited to integer operations. In future articles, I plan to take a closer look at floating point workloads.
The original and true promise of broadband to our planet
For me, using the PS3 to run Linux brings back memories of the early days of computing, when the world was filled with the promise of silicon-based revolution. Maybe it's the programming on a TV that reminds me of the early computers. Better yet, Cell/B.E. technology has been designed to help realize the full potential of Web-based networked computing with high-end graphics and video. The World Wide Web in its early days was seen as revolutionary in that it would surely lead to work at home, less travel, less commuting, less pollution, more global communication, and a flat world with fair e-commerce, and would ultimately serve to provide more efficiency and fairness in the world as a whole -- well, at least a few idealists like myself thought this.
Cell/B.E. technology is exciting because the PS3 may be the lowest-cost, highest-performance computer ever provided to the general public. Sure, the PS3 is a costly game platform, but it can do far more than play games. The idea of Cell/B.E. chips as the DNA of the fastest computer available to the masses and the core of the fastest computer period (Blue Gene/L) is truly promising. With broadband transport becoming widely available to most of the world, a multicore processor to make good use of it has now finally also been introduced.
As the world faces issues like global warming and political rifts, the emergence of new technology that can help us work more effectively at home, minimize grueling business travel, communicate better, have more fun, and get excited about computing again is a welcome sight. I have to wonder: Do current estimates of potential reductions in greenhouse gasses take into account the extent to which broadband might reduce commuting and global travel for business? Emergent new applications like telemedicine, effective high-definition video conferencing, and true virtual presence can help change the shape of things to come for the better. Either way, Cell/B.E. technology sure does help my threads run faster.
The PS3 is a great and relatively low-cost way to explore and evaluate the capability of Cell/B.E. technology -- and it's fun, too. It can also serve as a development platform for work on Cell-based HPC or embedded software and has the ability to serve as a great Linux platform at home. While it is short two SPEs compared to HPC/embedded Cell/B.E. platforms, it can host the same SDK and be used quite readily to develop SPE offloading code. It will be interesting to see how many PS3s wind up running Linux -- I suspect all will also be used as the game platforms they were intended to be, but the idea of a game system designed to host Linux from the beginning was in my opinion an excellent decision for both PS3 and Cell/B.E. technology. While Cell/B.E. chips and the PS3 may not solve global warming, they will keep a few people off the road and at home nights and weekends.
Learn
Get products and technologies
Original link: http://www-128.ibm.com/developerworks/power/library/pa-soc12/?S_TACT=105AGX54&S_CMP=NLLX
