
Assembly language for Power Architecture, Part 1: Programming concepts and beginning PowerPC instructions
Source: IBM developerWorks Worldwide
Posted by yangyi on 2007-04-28 18:36:41
Level: Intermediate
Jonathan Bartlett (johnnyb@eskimo.com), Director of Technology, New Medio
03 Oct 2006
The POWER5™ processor is a 64-bit workhorse used in a variety of settings. Starting with an introduction to assembly language concepts and the PowerPC instruction set, this series of articles introduces assembly language in general and assembly language programming for the POWER5 in particular.
The POWER5 and other PowerPC processor families
The POWER5™ processor is the latest in a line of high-powered processors supporting the PowerPC® instruction set. The first 64-bit processor in this line was the POWER3. The Macintosh G5 processor was an extension of the POWER4 processor with an additional vector processing unit. The POWER5 has both dual-core and simultaneous multithreading capabilities, which allows a single chip to process four threads simultaneously. In addition, each thread can execute a group of up to five instructions every clock cycle.
The PowerPC instruction set is used on a wide variety of chips from IBM and other vendors, not just the POWER line. It is used in server, workstation, and high-end embedded scenarios (think digital video recorders and routers, not cell phones). The Gekko chip is used in Nintendo's GameCube, and the Xenon is used in the Microsoft Xbox 360. The Cell Broadband Engine is an up-and-coming architecture that couples the PowerPC instruction set with eight vector processors. The Sony PlayStation 3 will use the Cell, and numerous other vendors are considering it for a wide variety of multimedia applications.
As you can see, the PowerPC instruction set is useful far beyond the POWER processor line. The instruction set itself can operate in either a 64-bit mode or a reduced 32-bit mode. The POWER5 processor supports both, and Linux distributions on POWER5 support applications compiled for both the 32-bit and 64-bit PowerPC instruction sets.
Getting access to a POWER5 processor
All current IBM iSeries and pSeries servers use POWER5 processors and can run Linux. In addition, open source developers can request access to POWER5 machines for porting applications through IBM's OpenPower Project (see Resources for a link). Running a PowerPC distribution on a G5 Power Macintosh will give you access to a slightly modified POWER4 processor, which is also 64-bit. G4s and earlier are only 32-bit.
Debian, Red Hat, SUSE, and Gentoo all have one or more distributions supporting the POWER5 processor. Red Hat Enterprise Linux AS, SUSE Linux Enterprise Server, and OpenSUSE are the only ones supporting the IBM iSeries servers; the rest support the IBM pSeries servers.
High level versus low level programming
Most programming languages are fairly processor-independent. While they may have specific features that rely on certain processor abilities, they are more likely to be operating-system-specific than processor-specific.
These high-level programming languages are built for the express purpose of providing distance between the programmer and the hardware architecture. This is for several reasons. While portability is one of them, probably more important is the ability to provide a friendlier model that is geared more towards how programmers think than how the chip is wired.
In assembly language programming, however, you work directly with the processor's instruction set. This means that you have essentially the same view of the system that the hardware does. This can make assembly language programming more difficult, because the programming model is geared towards making the hardware work rather than closely mirroring the problem domain. The benefits are that you can do system-level work more easily and perform optimizations that are very processor-specific. The drawbacks are that you actually have to think on that level, you are tied to a specific processor line, and you often have to do a lot of extra work to model the problem domain accurately.
One nice thing about assembly language that most people don't think about is that it is very concrete. In high-level languages, there is a lot going on with every expression, and you sometimes have to wonder just what is occurring under the hood. In assembly language programming, you can have a full grasp of exactly what the hardware is doing, and you can step through the hardware-level changes one instruction at a time.
Fundamentals of assembly language
Before getting into the instruction set itself, there are two keys to understanding assembly language: the memory model and the fetch-execute cycle.
The memory model is very simple. Memory stores only one kind of thing: fixed-range numbers called bytes (on most computers, a byte is a number between 0 and 255). Each memory location is identified by a sequential address. Think of a giant roomful of post-office boxes.
Each box is numbered, and each box is the same size. This is the only thing that computers can store. Therefore, everything must ultimately be structured in terms of fixed-range numbers. Thankfully, most processors can combine multiple bytes into one unit to handle larger numbers, as well as numbers with different ranges (such as floating-point numbers). However, no matter how a specific instruction interprets a region of memory, every memory location is stored in exactly the same way.
In addition to the memory lying in sequential addresses, processors also maintain a set of registers, which are temporary locations for holding data being manipulated or configuration switches.
The fundamental process that controls processors is the fetch-execute cycle. Processors have a register known as the program counter, which holds the address of the next instruction to execute. The fetch-execute cycle works in the following way:
The reality of how this occurs is actually much more complicated, especially since the POWER5 processor can execute up to five instructions simultaneously. However, this suffices for a mental model.
The PowerPC architecture is characterized as a load/store architecture. This means that all calculations are performed on registers, not main memory. Memory access is simply for loading data into registers and storing data from registers into memory. This is different from, say, the x86 architecture, in which nearly every instruction can operate on memory, registers, or both.
Load/store architectures typically have many general-purpose registers. The PowerPC has 32 general-purpose registers and 32 floating-point registers, which are each numbered (as opposed to the x86, where the registers are named rather than numbered). The operating system's ABI (application binary interface) will likely make special use of the first. The PowerPC also has a few special-purpose registers for holding status information and return addresses. There are other special-purpose registers available to supervisor-level applications, but these are beyond the scope of this article. The general-purpose registers are 32 bits on 32-bit architectures and 64 bits on 64-bit architectures. This article series focuses on the 64-bit architectures.
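To make the load/store idea and the role of the program counter more concrete, here is a minimal sketch of a toy machine in Python. It is not a PowerPC emulator: the tuple-based instruction format, the `run` function, and the `exit` pseudo-instruction are all invented for illustration. What it does mirror from the text is that memory holds only bytes, that loads and stores are the only instructions touching memory, and that arithmetic works purely on numbered registers.

```python
import struct

NUM_REGS = 32  # the PowerPC has 32 general-purpose registers, referred to by number


def run(program, memory):
    """Run a toy load/store program until it reaches an 'exit' instruction.

    program: list of tuples such as ("ld", rt, address)   (hypothetical format)
    memory:  bytearray standing in for byte-addressed main memory
    Returns the value of the register named by the 'exit' instruction.
    """
    regs = [0] * NUM_REGS
    pc = 0  # program counter: index of the next instruction to execute
    while True:
        op, *args = program[pc]  # fetch the instruction the program counter points at
        pc += 1                  # advance the program counter to the next instruction
        if op == "ld":           # load: 8 bytes of memory -> register (big-endian)
            rt, addr = args
            regs[rt] = struct.unpack_from(">Q", memory, addr)[0]
        elif op == "std":        # store: register -> 8 bytes of memory
            rs, addr = args
            struct.pack_into(">Q", memory, addr, regs[rs])
        elif op == "add":        # arithmetic operates on registers only
            rt, ra, rb = args
            regs[rt] = (regs[ra] + regs[rb]) & 0xFFFFFFFFFFFFFFFF
        elif op == "exit":       # stand-in for leaving the program
            (rs,) = args
            return regs[rs]
        else:
            raise ValueError(f"unknown instruction: {op}")


# Memory is just bytes; two 64-bit values are laid out at addresses 0 and 8.
memory = bytearray(24)
struct.pack_into(">Q", memory, 0, 1)
struct.pack_into(">Q", memory, 8, 2)

program = [
    ("ld", 4, 0),      # register 4 <- memory[0..7]
    ("ld", 5, 8),      # register 5 <- memory[8..15]
    ("add", 6, 4, 5),  # register 6 <- register 4 + register 5
    ("std", 6, 16),    # memory[16..23] <- register 6
    ("exit", 6),
]
result = run(program, memory)
print(result)  # 3
```

Note that adding two values held in memory takes four instructions here (two loads, an add, and a store), which is the practical consequence of a load/store design.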
Instructions in assembly language are very low-level -- they can only perform one (or sometimes a few) operations at a time. For example, while in C I can write
So, without further ado, let's dive into the PowerPC instruction set. Here are some PowerPC instructions that are useful to beginners:
Notice that all instructions that compute a value use the first operand as the destination register. In all of these instructions the registers are specified only by their number. For example, the instruction for loading the number 12 into register 5 is
Each PowerPC instruction is 32 bits long. The first six bits determine the instruction and the remaining portions have different functions depending on the instruction. The fact that they are fixed-length allows the processor to process them more efficiently. However, the limitation to 32 bits can cause a few headaches, which we will encounter. The solutions to most of these headaches will be discussed in part 2.
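As a rough illustration of how a fixed 32-bit instruction is carved into fields, the Python sketch below packs a load-immediate into a single word. It assumes the usual assembler spelling `li 5, 12`, which is an extended mnemonic for `addi 5, 0, 12`, and the standard layout for this instruction form: a six-bit primary opcode (14 for `addi`), two five-bit register fields, and a 16-bit signed immediate. The function names `encode_addi` and `encode_li` are made up for this example.

```python
def encode_addi(rt, ra, si):
    """Pack the fields of an addi instruction into one 32-bit word."""
    if not (0 <= rt < 32 and 0 <= ra < 32):
        raise ValueError("register numbers must be in 0..31")
    if not (-0x8000 <= si <= 0x7FFF):
        raise ValueError("immediate must fit in 16 signed bits")
    addi_opcode = 14  # primary opcode, stored in the first (most significant) six bits
    return (addi_opcode << 26) | (rt << 21) | (ra << 16) | (si & 0xFFFF)


def encode_li(rt, value):
    """li is an extended mnemonic: it assembles as addi rt, 0, value."""
    return encode_addi(rt, 0, value)


word = encode_li(5, 12)
print(f"{word:08x}")  # 38a0000c
```

Only 16 of the 32 bits are left over for the constant, which is the source of the "headaches" mentioned above: a full 64-bit constant cannot fit into a single instruction.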
Many of the instructions above make use of the PowerPC extended mnemonics. This means that they are actually specializations of a more general instruction. For example, all of the conditional branches mentioned above are actually specializations of the
Now let's get into some actual code. The first program we write won't do anything at all except load two values, add them together, and exit with the result as a status code. Type the following into a file named sum.s:
Listing 1. Your first POWER5 program
Before discussing the program itself, let's build it and run it. The first step in building this program is to assemble it:
This produces a file called sum.o containing the object code, which is the machine-language version of your assembly code, plus additional information for the linker. The "-m64" switch tells the assembler that you are using the 64-bit ABI as well as 64-bit instructions. Although the object code is in machine-language form, it cannot be run directly as-is. It needs to be linked so that it is ready for the operating system to load and run it. To link, do the following:
This will produce the executable sum. To run the program, do:
This will print out "3", which is the final result. Now let's look at how the code actually works.
Since assembly language code works very close to the level of the operating system itself, it is organized very much like the object and executable files it will produce. So, to understand the code, we first need to understand object files.
Object and executable files are divided up into "sections". Each section is loaded into a different place in the address space when the program is executed, and each has its own protections and purpose. The main sections we will concern ourselves with are:
The first thing our program does is switch to the .data section, and set alignment to an 8-byte boundary (.align 3 advances the assembler's internal address counter until it is a multiple of 2^3).
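The alignment arithmetic can be sketched in a few lines of Python. This is only a model of the rounding described above, not of the assembler itself, and the helper name `align_up` is invented for the example.

```python
def align_up(address, power):
    """Round address up to the next multiple of 2**power (unchanged if already aligned)."""
    boundary = 1 << power
    return (address + boundary - 1) & ~(boundary - 1)


# With power = 3 the boundary is 2**3 = 8 bytes.
for address in (0, 1, 8, 13, 16):
    print(address, "->", align_up(address, 3))
```

An address that is already a multiple of 8 stays where it is; anything else moves forward to the next multiple of 8, so 13 becomes 16.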
The line that says
The next directive,
After this, we have a similar set of directives defining the address
The
Now we can switch to actual program code. The
The first set of instructions loads in the address of the first value (not the value itself). Because PowerPC instructions are only 32 bits long, there are only 16 bits available within the instruction for loading constant values (remember, the address of
The first instruction used stands for "load immediate shifted". This loads the value on the far right side (bits 48-63 of
That's quite a lot of work just to load a single 64-bit value. That's why most operations on PowerPC chips operate through registers instead of immediate values -- register operations can use all 64 bits at once, rather than being limited by the instruction length. In the next article we will cover the addressing modes that make this easier.
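To see why a 64-bit address has to be assembled piece by piece, the Python sketch below models one common idiom: load the top 16 bits with a load-immediate-shifted, OR in the next 16 bits, shift the register left by 32, then OR in the remaining two 16-bit pieces. The exact sequence of steps is an assumption made for illustration, and the helper names (`split16`, `lis`, `build_address`) and the sample address are invented.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def split16(value):
    """Split a 64-bit value into four 16-bit pieces, most significant first."""
    return (
        (value >> 48) & 0xFFFF,
        (value >> 32) & 0xFFFF,
        (value >> 16) & 0xFFFF,
        value & 0xFFFF,
    )


def lis(imm16):
    """Model of load immediate shifted: the 16-bit immediate lands in the
    upper half of the low 32 bits, and the result is sign-extended to 64 bits."""
    value = (imm16 & 0xFFFF) << 16
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def build_address(address):
    """Rebuild a 64-bit address using only 16-bit immediates."""
    highest, higher, high, low = split16(address)
    reg = lis(highest)          # bits for the top quarter, temporarily in the low word
    reg |= higher               # OR in the second quarter
    reg = (reg << 32) & MASK64  # move both quarters into the upper 32 bits
    reg |= high << 16           # OR in the third quarter, shifted
    reg |= low                  # OR in the last quarter
    return reg


address = 0x0000000010010028   # hypothetical address, for illustration only
print(hex(build_address(address)))  # 0x10010028
```

Five register-sized steps are needed because each instruction can carry only one 16-bit constant. The sign extension done by the first step is harmless here, since the later shift pushes those extra bits out of the register.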
Now, remember, this only loads in the address of the value we want to load. Now we want to load the value itself into a register. To do this, we will use register 7 to tell the processor what address we want to load the value from. This will be indicated by putting "7" in parentheses. The instruction
A similar process is used to load the second value into register 5.
After the registers are loaded, we can now add our numbers. The instruction
Now that we have computed the value we want, the next thing we want to do is use this value as the return/exit value for the program. The way that you exit a program in assembly language is by issuing a system call to do so (exiting is done using the
On PowerPC machines, this is done by adding. The
The exit system call takes one parameter -- the exit value. This is stored in register 3. Therefore, we need to move our answer from register 6 to register 3. The "register move" instruction
The instruction to call the operating system is simply
Just to point out, a lot of these instructions are redundant, but used for teaching purposes. For example, since
Listing 2. A short version of the first program
For our next program, we will write something a little more functional -- we will find the maximum of a set of values and exit with the result. Here is the program; enter it as max.s:
Listing 3. Finding a maximum value
To assemble, link, and run the program, do:
Hopefully now that you have experience with one PowerPC program and know a few instructions, you can follow the code a little bit. The data section is the same as before, except that we have several values after our
In the program itself, one thing to notice is that we documented what we were using each register for. This practice will help you immensely to keep track of your code. Register 3 is the one we are storing the current maximum value in, which we initially set to 0. Register 4 contains the address of the next value to load. It starts out at value_list and advances forward by 8 each iteration. Register 5 contains the address immediately following the data in value_list. This allows a simple comparison between register 4 and register 5 to know when we are at the end of the list, and need to branch to
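The register roles described above can be mirrored in a short Python sketch. It is a model of the loop's logic rather than the assembly itself: the variable names follow the register numbers from the text, register 6 as the holder of the freshly loaded value is an assumption, the values are treated as unsigned 64-bit numbers, and the sample data is made up.

```python
import struct


def find_max(memory, value_list, end_address):
    """Scan 8-byte values from value_list up to (not including) end_address."""
    r3 = 0            # register 3: current maximum, initially 0
    r4 = value_list   # register 4: address of the next value to load
    r5 = end_address  # register 5: address immediately following the data
    while r4 < r5:    # compare registers 4 and 5 to detect the end of the list
        r6 = struct.unpack_from(">Q", memory, r4)[0]  # load the next value (register 6 assumed)
        if r6 > r3:   # a new maximum replaces the old one
            r3 = r6
        r4 += 8       # advance to the next 8-byte value
    return r3


# Hypothetical sample data laid out as consecutive 64-bit big-endian values.
values = [23, 50, 95, 96, 37, 85]
memory = struct.pack(f">{len(values)}Q", *values)

print(find_max(memory, 0, len(memory)))  # 96
```

Keeping an "end" address in its own register means the loop needs no separate counter: the address being walked doubles as the loop variable.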
Note that we marked each branch point with its own symbolic label, which allowed us to use those labels as the targets for branch instructions. For example,
Another instruction to note is
Hopefully, you now have a basic feel for assembly language programming on the PowerPC. The instructions may look a little weird at first, but they will become second nature with practice. In the next article, I'll cover the various addressing modes of the PowerPC processor and how they can be used to do 64-bit programming more effectively. In the third article, we will cover the ABI more fully, discussing which registers are used for what purpose, how to call and return from functions, and other interesting aspects of the ABI.
Original link: http://www-128.ibm.com/developerworks/linux/library/l-powasm1.html?ca=dgr-lnxw09PowPC-ASM1