
Assembly language for Power Architecture, Part 2: The art of loading and storing on PowerPC
Source: IBM developerWorks Worldwide
Submitted by yangyi on 2007-04-28 18:36:45
Level: Intermediate

Jonathan Bartlett (johnnyb@eskimo.com), Director of Technology, New Medio

29 Nov 2006

The previous article in this series introduced assembly language programming using the 64-bit PowerPC® instruction set on POWER5 and other processors that use these instructions. This article drills down and discusses the specifics of 64-bit PowerPC assembly language programming on Linux® and UNIX®-like operating systems, focusing on data access methods and position-independent code.

Addressing modes and why they are important

Before getting into addressing modes, let's review computer memory concepts. You may recognize these facts about memory and programming, but as modern programming languages attempt to de-emphasize the physical aspects of the computer, a refresher can be helpful:
New assembly language programmers are sometimes surprised by how many different ways you can access memory. These different ways are called addressing modes. Some modes are logically equivalent but differ in their purpose. They are considered different addressing modes because they may be implemented differently, depending on the processor.

Two addressing modes don't access memory at all. In immediate mode, the data to be used is part of the instruction (for example, the li instruction, "load immediate," loads a value that is encoded in the instruction itself into a register). In register mode, the operand is the contents of a register rather than a location in main memory.

The most obvious addressing mode for accessing main memory is called direct addressing mode. In this mode, the instruction itself contains the address from which to load the data. This mode is often used for global variable access, branching, and subroutine calls.

A similar mode is relative addressing mode, which calculates the address based on the current program counter. This is often used for short-range branches where the destination is near the current location, so specifying an offset rather than an absolute address makes more sense. It is similar to direct addressing mode in that the final address is known at either assemble or link time.

The indexed addressing mode makes the most sense as a way to access array elements for global variables. It has two parts: a memory address and an index register. The index register is added to the specified address, and the result is used as the address for the memory access. Some platforms (not PowerPC) allow programmers to specify a multiplier for the index register. Therefore, if each array element is 8 bytes long, you can use 8 as a multiplier. This allows the index register to be used exactly like an array index. Otherwise, the index register has to be increased or decreased in increments of the data size.

The register indirect addressing mode uses a register to specify the whole address for the memory access. This is used in numerous situations, pointer dereferencing being the most familiar.
Base-pointer addressing mode acts just like indexed addressing mode (the specified number and the register are added together for the final address), except that the roles of the two components are switched. In base-pointer addressing mode, the register has the base address and the literal number has the offset. This is very useful for accessing members of a struct. The register can hold the address of the whole struct, and the numeric portion can be modified depending on the structure member to be accessed.

For instance, let's say you have a struct that has three fields: the first is 8 bytes, the second is 4 bytes, and the last is 8 bytes. Then, let's say that the address of the struct itself is in a register called register X. If you want to access the second member of the structure, you'll need to add 8 to the value in the register. So, using base-pointer addressing, you would specify register X as the base pointer and 8 as the offset. To access the third field, you would specify register X as the base pointer and 12 as the offset. To access the first field, you can actually use indirect addressing instead of base-pointer addressing, since there is no offset (this is why on many platforms the first structure member is the fastest to access; you can use a simpler addressing mode -- in PowerPC it does not matter).

Finally, in indexed register indirect addressing mode, both the base and the index are stored in registers. The memory address used is determined by adding the two registers together.
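As a rough illustration, the modes described above map onto PowerPC instructions roughly as follows. This is only a fragment in GNU assembler syntax; the register numbers, the label done, and the assumption that register 3 holds the address of the three-field struct from the text are chosen purely for the example.

```asm
# Fragment only. Assumes register 3 already holds the address of the struct
# (8 bytes at offset 0, 4 bytes at offset 8, 8 bytes at offset 12) and
# register 9 holds a byte offset computed at run time.

        li    4, 100         # immediate: the value 100 is encoded in the instruction
        mr    5, 4           # register: the operand is register 4, no memory access
        ld    6, 0(3)        # register indirect: the address is the value in r3 (first field)
        lwz   7, 8(3)        # base-pointer: r3 plus offset 8 (second field, 4 bytes)
        ld    8, 12(3)       # base-pointer: r3 plus offset 12 (third field)
        ldx   10, 3, 9       # indexed register indirect: address is r3 + r9
        b     done           # relative: target is encoded as an offset from this instruction
done:
```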
The importance of instruction formats

To learn how addressing modes work for load and store instructions on PowerPC processors, you must first understand a little bit about the PowerPC instruction format. The PowerPC uses a load/store (also called RISC) instruction set, which means that the only time it accesses main memory is to load a value into a register or to store a register to memory. All of the actual processing takes place between registers (or between registers and immediate-mode operands). The other main type of processor architecture, CISC (the x86 being a popular CISC instruction set), allows for memory access in nearly every instruction. The reason for the load/store architecture is that it allows the rest of the processor to be more efficient. In fact, most modern CISC processors translate their instructions into an internal RISC-like format for efficiency.

Each instruction on the PowerPC is exactly 32 bits long, with the instruction's opcode (the code telling the processor which instruction it is) taking the first six bits. This 32-bit length includes all immediate-mode values, register references, explicit addresses, and instruction options. This makes for a tight squeeze. In fact, the largest field available for a memory address in any instruction format is only 24 bits! Used directly as a byte address, this would give you, at most, only 16MB of addressable space. Don't worry -- there are lots of ways around this. This is just to point out why instruction format matters on the PowerPC processor -- you need to know how much space you have to work with!

You don't need to memorize all of the instruction formats to make use of them. However, knowing some of the basic ones will help you read PowerPC documentation and understand some of the general strategies and nuances in the PowerPC instruction set. The PowerPC has 15 different instruction formats, many with several subformats. However, you only need to be concerned with a few of them.
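One way around the 32-bit instruction limit is to build a large value 16 bits at a time. The fragment below is a minimal sketch of the common five-instruction sequence for placing an arbitrary 64-bit constant in a register; the constant 0x123456789abcdef0 and the use of register 7 are arbitrary choices for illustration.

```asm
# Fragment only: builds the 64-bit constant 0x123456789abcdef0 in register 7,
# 16 bits per instruction, because no single 32-bit instruction can hold it.

        lis    7, 0x1234       # r7 = 0x0000000012340000 (load immediate, shifted left 16)
        ori    7, 7, 0x5678    # r7 = 0x0000000012345678
        rldicr 7, 7, 32, 31    # shift left by 32 bits: r7 = 0x1234567800000000
        oris   7, 7, 0x9abc    # r7 = 0x123456789abc0000 (OR immediate, shifted left 16)
        ori    7, 7, 0xdef0    # r7 = 0x123456789abcdef0
```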
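Addressing memory using the D-Form and DS-Form instruction formats

The D-Form instruction is one of the primary memory-access instruction forms. It looks like this:

Bits 0-5: Opcode
Bits 6-10: Source/destination register
Bits 11-15: Address/index register or operand
Bits 16-31: Numeric address, offset, or immediate-mode value

This form is used to perform loads, stores, and immediate-mode calculations. It can be used for the following addressing modes: immediate mode, direct addressing (by specifying 0 in place of the register), indexed addressing, register indirect addressing (by specifying 0 as the numeric offset), and base-pointer addressing.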
As you can see, the D-Form instruction is very flexible and is used for any register-plus-address memory access form. However, its usability for direct addressing and indexed addressing is extremely limited, because it only has a 16-bit address field to work with! This gives a maximum range of only 64K. Therefore, the direct and indexed addressing modes are only rarely used to fetch and store memory. Instead, this form is much more often used for immediate, indirect, and base-pointer addressing modes, because in these modes the 64K limit is far less of a problem: the base register can hold the full 64-bit range.

The DS-Form is only used in 64-bit instructions. It is just like the D-Form, except that it uses the last two bits of the address field for an extended opcode, leaving 14 bits for the value. The processor pads the value on the right with two zeros. This gives it the same range as D-Form instructions (64K), but limits the offset to multiples of 4 bytes. For the assembler, the value is specified normally -- it is simply condensed by the assembler. For example, if you wanted an offset of 8, you would still enter 8; the assembler would just store the 14-bit representation 0b00000000000010 instead of the 16-bit representation 0b0000000000001000. If you entered a value that was not a multiple of 4, the assembler would give an error.

Note that in D-Form and DS-Form loads and stores (and in addi and addis), if the address register field is 0, the processor does not read register 0; it uses the value 0 in place of the register.

Let's now look at instructions built from D-Forms and DS-Forms. Immediate-mode instructions are specified in assembler as an opcode followed by a destination register, a source register, and a constant value.
Listing 1. Immediate-mode instructions
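A few immediate-mode D-Form instructions, as a minimal sketch in GNU assembler syntax. The register numbers and constants are arbitrary and serve only to show the operand order.

```asm
# Fragment only: each instruction carries its constant inside the
# instruction word, so no memory is accessed.

        li    3, 5           # r3 = 5 (assembler shorthand for addi 3, 0, 5)
        addi  4, 3, 10       # r4 = r3 + 10
        lis   5, 1           # r5 = 1 shifted left 16 bits (shorthand for addis 5, 0, 1)
        ori   6, 5, 0x100    # r6 = r5 OR 0x100
        andi. 7, 4, 0x0f     # r7 = r4 AND 0x0f; the trailing dot also sets condition field 0
```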
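In the non-immediate-mode uses of the D-Form, the second register is added to the value to give the final address of the memory to load from or store to. These instructions have the general form opcode dst, d(a). In this form, the address to load from or store to is specified as d(a), where d is the numeric offset and a is the number of the register whose contents are added to it.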
Listing 2. Load/store instruction examples using the D-Form and DS-Form
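The following fragment sketches typical D-Form and DS-Form loads and stores in GNU assembler syntax. It assumes register 3 holds the address of some readable and writable data; the register numbers and offsets are illustrative only.

```asm
# Fragment only. Assumes register 3 holds a valid data address.

        lbz   4, 0(3)        # l + b + z: load byte, zero-extended, from address r3
        lhz   5, 2(3)        # load halfword (2 bytes), zero-extended, from r3 + 2
        lha   6, 2(3)        # load halfword algebraic: sign-extended instead
        lwz   7, 4(3)        # load word (4 bytes), zero-extended, from r3 + 4
        ld    8, 8(3)        # load doubleword (8 bytes) from r3 + 8 (DS-Form)
        stb   4, 16(3)       # store the low byte of r4 at r3 + 16
        stw   7, 20(3)       # store the low word of r7 at r3 + 20
        std   8, 24(3)       # store all 8 bytes of r8 at r3 + 24 (DS-Form)
```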
If you look carefully, you can see that there is sort of a "base opcode"
specified at the beginning of the instruction, with several modifiers following.
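Addressing memory using the X-Form instruction format

The X-Form is used for indexed register indirect addressing, where the values of two registers are added together to determine the address for loading/storing. The X-Form has the following format:

Bits 0-5: Opcode
Bits 6-10: Source/destination register
Bits 11-15: First address register
Bits 16-20: Second address register
Bits 21-30: Extended opcode
Bit 31: Unused by ordinary loads and stores

In assembler, these instructions are written as the opcode followed by three register operands: the destination (or, for stores, source) register, and then the two registers whose values are added to form the address.

Here are some example instructions using the X-Form:

Listing 3. Examples using X-Form addressing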
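A minimal sketch of X-Form loads and stores in GNU assembler syntax. It assumes register 3 holds a base address and register 9 holds an element index; both choices are illustrative. Because PowerPC has no index multiplier, the index is scaled to a byte offset explicitly.

```asm
# Fragment only. Assumes r3 = base address of an array of 8-byte elements
# and r9 = element index.

        sldi  4, 9, 3        # r4 = r9 * 8: scale the index to a byte offset
        ldx   5, 3, 4        # r5 = doubleword at address r3 + r4
        lwzx  6, 3, 4        # r6 = word at r3 + r4, zero-extended
        lbzx  7, 3, 4        # r7 = byte at r3 + r4, zero-extended
        stdx  5, 3, 4        # store the doubleword in r5 at address r3 + r4
        stwx  6, 3, 4        # store the low word of r6 at address r3 + r4
```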
The advantage of X-Form, besides being very flexible, is that you have a significantly extended address range. In the D-Form, only one value -- the register -- could specify a full range. In the X-Form, since you have two registers, both components can specify as large a range as necessary. Therefore, in situations where base-pointer addressing or indexed addressing would be used, but the 16-bit range of the constant part of the D-Form is too small, the value can be stored in a register and the X-Form can be used.
Writing position-independent code

Position-independent code is code that works no matter what part of memory it is loaded into. Why do you need position-independent code? Position-independent code allows libraries to be loaded into arbitrary locations in the address space. This is what allows libraries to be arbitrarily combined -- since none of them have specific locations they are bound to, they can be loaded with any other library without worrying about address space conflicts. The linker takes care of making sure libraries are each loaded into their own space. By using position-independent code, the libraries don't have to worry about where they are loaded.

Ultimately, however, position-independent code needs to have a method of locating global variables. It does this by maintaining a global offset table that provides addresses for all global contents that a function or group of functions access (or even a whole program, in most cases). A register is reserved for holding the pointer to the table. Then, all accesses are done by an offset into the table. The offsets are constant. The table itself is set up by the program linker/loader, which also initializes register 2 to hold the global offset table pointer. Using this method, the linker/loader can put both program and data wherever it deems appropriate, and only needs to set up a global offset table containing all of the global pointers.

It is easy to get bogged down in a discussion of all of this. Let's look at some code and analyze what is going on at each step of the way. This is the "add numbers" program used in the previous article, but adapted for position-independent code.

Listing 4. Accessing data through the global offset table
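A minimal sketch of such a program in GNU assembler syntax. It assumes a 64-bit big-endian Linux system whose ABI uses function descriptors in the .opd section (the ELFv1 ABI); the labels first_value and second_value, the data values, and the register choices are illustrative assumptions.

```asm
### DATA ###
.data
.align 3
first_value:
        .quad 1
second_value:
        .quad 2

### ENTRY POINT DECLARATION (function descriptor; ELFv1 ABI assumed) ###
.section ".opd", "aw"
.align 3
.globl _start
_start:
        .quad ._start, .TOC.@tocbase, 0

### CODE ###
.text
._start:
        ld    7, first_value@got(2)    # r7 = address of first_value, read from the GOT
        ld    8, second_value@got(2)   # r8 = address of second_value, read from the GOT
        ld    4, 0(7)                  # r4 = value stored at first_value
        ld    5, 0(8)                  # r5 = value stored at second_value
        add   3, 4, 5                  # r3 = sum, which becomes the exit status
        li    0, 1                     # system call number 1 is exit on Linux
        sc
```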
To assemble, link, and run the code, do the following:

Listing 5. Assembling, linking, and running the code
The data definition and entry point declaration are both the same as before. However, now, instead of having to use five instructions to load the address of a global value, a single load through the global offset table pointer in register 2 is enough.
Using this method, most programs can contain all of the global data they use within a single global offset table. The DS-Form can address up to 64K of memory from a single base. Note that in order to get the full range of the DS-Form, register 2 points to the middle of the global offset table, so that it can make use of both positive and negative offsets. Since you are locating pointers to data (instead of data directly), you have access to approximately 8,000 global variables (64K divided by 8 bytes per pointer; local variables are in registers or in the stack, which will be discussed in the third article in this series). And even if this were not enough, there can exist multiple global offset tables. The mechanism for this is also discussed in the next article.

While this is much more compact and readable (not to mention relocatable) than the five-instruction data load in the last article, you can still do better. In the 64-bit ELF ABI, the global offset table is actually a subset of a larger section known as the table of contents. In addition to creating global offset table entries, the table of contents can contain variables, which, rather than containing addresses of global data, contain the data items themselves. The size and number of these variables must be small, since the table of contents is only 64K.

To declare a table of contents data item, you have to switch to the .toc section and declare the item explicitly. This will create a table of contents entry. To access data that is directly within the table of contents, you need to refer to it using @toc rather than @got.

Listing 6. Difference between @got and @toc
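A minimal sketch of the contrast, in GNU assembler syntax. The symbol my_value, the entry name my_value_entry, and the value 10 are hypothetical; the fragment assumes register 2 already holds the table of contents pointer, as set up by the linker/loader.

```asm
# Fragment only. Declares one data item directly in the table of contents.
.section ".toc", "aw"
my_value:
        .tc my_value_entry[TC], 10     # an 8-byte ToC entry holding the value 10

.text
        ld    3, my_value@got(2)       # r3 = ADDRESS of my_value (the GOT entry is a pointer)
        ld    4, my_value@toc(2)       # r4 = 10, the data itself, read straight from the ToC
```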
As you can see, if you look up a symbol that defines data within the .toc section using @got, you get the address of the data; using @toc, you get the data itself.

Now let's look at the adding numbers example using values from the ToC:

Listing 7. Adding numbers defined in the .toc section
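A minimal sketch of the add-numbers program with its data held directly in the table of contents. As before, it assumes a 64-bit big-endian Linux system using .opd function descriptors (ELFv1 ABI); the labels, entry names, and values are illustrative assumptions.

```asm
### DATA (held directly in the table of contents) ###
.section ".toc", "aw"
first_value:
        .tc first_value_entry[TC], 1
second_value:
        .tc second_value_entry[TC], 2

### ENTRY POINT DECLARATION (function descriptor; ELFv1 ABI assumed) ###
.section ".opd", "aw"
.align 3
.globl _start
_start:
        .quad ._start, .TOC.@tocbase, 0

### CODE ###
.text
._start:
        ld    4, first_value@toc(2)    # r4 = the value itself, one load, no pointer fetch
        ld    5, second_value@toc(2)   # r5 = the second value
        add   3, 4, 5                  # r3 = sum, which becomes the exit status
        li    0, 1                     # system call number 1 is exit on Linux
        sc
```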
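As you can see, by using @toc the values are loaded directly from the table of contents, with no separate step to fetch their addresses first.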
Loading and storing multiple values

The PowerPC also has the ability to perform multiple loads and stores with a single instruction. Unfortunately, this is restricted to word-sized (32-bit) data. These are very simple D-Form instructions. You specify the base address register, the offset, and the starting destination register. The processor will then load data into all the registers starting with the listed destination register through register 31, starting with the address specified with the instruction, and moving forward. The instructions for this are lmw (load multiple word) and stmw (store multiple word).

Listing 8. Loading and storing multiple values
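A minimal sketch of the two instructions in GNU assembler syntax. It assumes register 3 holds the address of a buffer of 32-bit words; the buffer layout and the choice of register 29 as the starting register are illustrative.

```asm
# Fragment only. Assumes register 3 holds the address of a buffer with at
# least six 32-bit words.

        lmw   29, 0(3)       # r29 = word at r3, r30 = word at r3 + 4, r31 = word at r3 + 8
        stmw  29, 12(3)      # store the low words of r29, r30, r31 at r3 + 12, + 16, + 20
```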
And here is our add numbers program again using multiple values:

Listing 9. The add numbers program using multiple values
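A minimal sketch of the add-numbers program using a multiple-word load. It assumes a 64-bit big-endian Linux system using .opd function descriptors (ELFv1 ABI), and it relies on the two 32-bit values being adjacent in memory; the labels and values are illustrative assumptions.

```asm
### DATA (two adjacent 32-bit words) ###
.data
.align 2
first_value:
        .long 1
second_value:
        .long 2

### ENTRY POINT DECLARATION (function descriptor; ELFv1 ABI assumed) ###
.section ".opd", "aw"
.align 3
.globl _start
_start:
        .quad ._start, .TOC.@tocbase, 0

### CODE ###
.text
._start:
        ld    7, first_value@got(2)    # r7 = address of first_value, read from the GOT
        lmw   30, 0(7)                 # r30 = first_value, r31 = second_value
        add   3, 30, 31                # r3 = sum, which becomes the exit status
        li    0, 1                     # system call number 1 is exit on Linux
        sc
```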
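Most load/store instructions have an update form, which writes the final effective address used by the load/store back into the base address register. For example, ldu 5, 8(7) loads register 5 from the address in register 7 plus 8 and then sets register 7 to that address.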
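A minimal sketch of how the update form can walk an array, in GNU assembler syntax. It assumes register 3 starts 8 bytes before the first element and register 4 holds the element count (at least 1); the label sum_loop and the register choices are illustrative.

```asm
# Fragment only. Assumes r3 = address of the array minus 8 and
# r4 = number of 8-byte elements (at least 1).

        li    5, 0           # r5 = running total
        mtctr 4              # the loop count goes in the count register
sum_loop:
        ldu   6, 8(3)        # r6 = doubleword at r3 + 8, then r3 = r3 + 8
        add   5, 5, 6        # add the element to the total
        bdnz  sum_loop       # decrement the count; branch while it is not zero
```

Because the base register advances on every load, no separate pointer-increment instruction is needed inside the loop.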
Efficient loading and storing is critical for efficient code. Knowing the instruction formats and addressing modes available helps you understand the possibilities and limitations of a platform. The D-Form and DS-Form instruction formats on the PowerPC are critical for position-independent code. Position-independent code allows you to create shared libraries and allows you to use fewer instructions to load global addresses.

The next article in this series will cover branching, function calls, and integrating with C code.
Original article: http://www-128.ibm.com/developerworks/linux/library/l-powasm2.html?ca=dgr-lnxw09PowPC-ASM2