Intel Core 2

Source: www.linuxhardware.org


Submitted by yangyi on 2007-12-31 20:02:43


Last week I presented the details of the new "Penryn" Core 2 processor from Intel. In that article, I reviewed each new feature of the processor and covered how it might affect your PC's performance. I finished off the article with interviews with software developers, who discussed how they expect the new processor to affect their projects. After all of that, it's time to actually put the new processors to the test. In this review, I put Intel's new Core 2 Extreme QX9650 and Core 2 Extreme QX9770 up against the previous top processor from Intel, the Core 2 Extreme QX6850. We'll see how the new core performs clock-for-clock against the last-generation processor and look at where the "Penryn" architecture is going. All of this will be evaluated on Intel's latest chipset, the X38 Express.

Intel X38 Express Chipset

The Intel X38 Express chipset is Intel's latest performance chipset targeted at the enthusiast and workstation markets. The chipset consists of the X38 MCH (memory controller hub) and the ICH9 (I/O controller hub). The X38 MCH replaces the P35 MCH and adds a couple of additional current and next generation features. The ICH9 is the same ICH that was employed on the P35 chipset.

The X38 MCH brings dual PCI Express x16 slots to the Intel chipset line and even introduces PCI Express 2.0 which doubles throughput on the PCIe bus. This new support means two full speed x16 links for dual video card configurations from ATI/AMD. (Currently NVIDIA still doesn't support SLI on Intel chipsets.) The X38 MCH also has an enhanced memory controller which Intel claims will reduce overall system memory latency. Add on official DDR3-1333 support, and you have a decent upgrade from the P35.

As mentioned, the ICH9 is the same ICH that was available on the P35 chipset. That's actually a good thing here because Intel ICHs are known for being very stable and the ICH9 already supports all the latest peripherals. An additional six PCIe lanes, 12 USB 2.0 ports, High-Definition Audio, SATA 3G, and Gigabit LAN are all available on the ICH9.

Configuration

Now, let's go over the configuration I used for this review. Below you can see photos of some of the key components of the system and a table outlining the hardware and software configuration.

The processors I'll be covering today include the top of last generation's line, the fastest currently available chip, and a processor that will not actually be available until Q1 of next year. The QX6850 is based on the "Kentsfield" core and operates at 3.0GHz on a 1333MHz FSB with 2x4MB L2 cache. It was the previous "Extreme" model from Intel that has been replaced by the new QX9650 processor. The QX9650 is a "Yorkfield" processor, based on the "Penryn" architecture. It also operates at 3.0GHz on a 1333MHz FSB, but has 2x6MB of L2 cache. Finally, we have the QX9770, which is also a "Yorkfield" processor, that operates at 3.2GHz on a 1600MHz FSB with 2x6MB L2 cache. The QX9770 will not be available until Q1 of next year, but Intel sent samples to reviewers to give you a taste of what's to come.

ASUS Maximus Formula (Special Edition)

4GB (4x1GB) Corsair Dominator DDR2


The center of the test system is the new ASUS Maximus Formula (Special Edition). It is an X38 chipset based motherboard designed for enthusiasts and, more specifically, gamers. This board screams "extreme" with more bonus features than you will find on 99% of the boards out there. It supports up to 1600MHz FSB CPUs, up to 8GB of DDR2 memory, and memory operating at upwards of 1200MHz. In terms of expansion slots, it's decked out with dual x16 PCIe slots that support ATI CrossFire Technology, three PCIe x1 slots, and two standard 32-bit PCI slots. I plan to spend an entire article on this motherboard later, so I'll save further details for then.

For compatibility testing with 64-bit Linux, I went to Corsair looking for 4GB of memory. They hooked me up with their top-of-the-line Dominator memory. The QUAD2X4096-8500C5DF pack of memory includes four 1GB modules rated at up to 1066MHz. For testing here, I chose to run the memory at 800MHz just to ensure stability.

Cooler Master GeminII

Components Installed


To ensure proper cooling on this rig, without a lot of noise, I went with the Cooler Master GeminII cooler. This heatsink is massive, as you can see in the photos above. It supports dual 120mm fans and will not only keep the processor cool, but also the surrounding components. This seemed like a big plus considering the heatsinks on the motherboard and on the memory.

To round out the system, ASUS provided Linux Hardware with an NVIDIA 8600GTS-based graphics card, which provides outstanding performance without breaking the bank.

As usual, I went with a custom Gentoo 64-bit install. This will allow us to get the most out of the processor with custom compile options and the latest software. In the "Software Versions" section, you can grab a copy of my kernel .config, if you're interested in the settings I chose.

CPU: Intel Core 2 Extreme QX6850 (3.0GHz, 1333MHz FSB, 2x4MB L2, Quad-Core)
     Intel Core 2 Extreme QX9650 (3.0GHz, 1333MHz FSB, 2x6MB L2, Quad-Core)
     Intel Core 2 Extreme QX9770 (3.2GHz, 1600MHz FSB, 2x6MB L2, Quad-Core)

CPU Cooling Device: Cooler Master GeminII + 2 x Cooler Master 120mm Silent Fans

Motherboard: ASUS Maximus Formula (Special Edition) (Intel X38 Express + Intel ICH9R Chipset), BIOS: 0701

Memory: 4GB (4x1GB) Corsair Dominator DDR2-1066 SDRAM, QUAD2X4096-8500C5DF (Tested at 800MHz, CAS Latency 5)

Video Card: ASUS EN8600GTS (NVIDIA 8600GTS GPU, 256MB GDDR3)

Sound Card: Creative Labs Sound Blaster Audigy 2 ZS

Network Card: On-board Gigabit Ethernet

Hard Drive: HITACHI Ultrastar A7K1000 1TB 7200 RPM 32MB Cache

Power Supply: PC Power and Cooling Turbo-Cool 1KW-SR (1000 Watts)

Distribution: Gentoo Linux 64-bit (amd64)

Software Versions: GCC: 4.1.2
                   GLIBC: 2.6.1
                   Kernel: Linux 2.6.23 (.config download)
                   X-Windows: X.org 7.2
                   NVIDIA Driver: 100.14.19

Compile Optimizations: -march=nocona -O2 -pipe


Stability

As with most Intel platforms I've seen, this system was an absolute rock throughout testing. Setup was a breeze and a stable system was easy to configure by simply setting the BIOS to the default settings. As mentioned above, for this review, I ran memory at default DDR2-800 settings. If you plan on building a system on this, or any other, new platform, be sure to flash the BIOS on your motherboard. This will ensure the latest compatibility settings for your memory and processors are configured correctly.

Overclocking

Since I had this nice new enthusiast motherboard and a new core with a smaller manufacturing process, I felt it was only fitting to test overclocking on these CPUs. This should show you how far you could potentially push one of these processors or even lower clocked models once available. It will also give a look at how these processors scale.

All Extreme processors from Intel are multiplier unlocked, so it was quite easy to simply bump up the multiplier without needing to touch the FSB speed. On both the QX9650 and QX9770, a 4.0GHz overclock (400MHz x 10) was easily obtained with a small bump in core voltage to 1.40V, which is still under Intel's maximum voltage in their datasheet.
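The arithmetic behind these clock figures can be sketched in a few lines of shell. This is only an illustration: the function names are made up here, and it relies on the fact that Intel's quad-pumped FSB rating is four times the base clock that the multiplier applies to.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; integer MHz arithmetic.

# base_clock_mhz FSB_RATING_MHZ
# The quad-pumped FSB rating is 4x the base clock.
base_clock_mhz() {
    echo $(( $1 / 4 ))
}

# core_clock_mhz BASE_MHZ MULTIPLIER
# Core clock = base clock x multiplier.
core_clock_mhz() {
    echo $(( $1 * $2 ))
}

core_clock_mhz "$(base_clock_mhz 1600)" 8    # 3200 (3.2GHz on a 1600MHz FSB)
core_clock_mhz "$(base_clock_mhz 1600)" 10   # 4000 (the 400MHz x 10 overclock)
```

Because only the multiplier changes, the FSB and memory clocks stay where they were, which is what makes an unlocked part so simple to overclock.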

At 4.0GHz, the system was 100% stable, even though it was running pretty hot, peaking at a little over 70C. Better/faster fans on the heatsink I chose or even a higher-end cooler altogether would have taken care of this. With more extreme cooling, it's unknown how far I might have been able to take these chips.

Performance

Next, we'll move on to the benchmarks. The benchmarks below will include the three processors from the "Configuration" section, as well as the overclocked QX9770 at 4.0GHz.

For the first benchmark, I downloaded the sources to glibc 2.5 and compiled them using make's "-j#" flag, which runs up to "#" build jobs in parallel. For all processors, I used "-j4" so that all four cores would be utilized. To recreate this benchmark, place the source tar file for glibc 2.5 in a directory containing the following script, which I called "benchmark-glibc.sh":

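#!/bin/sh
# Usage: ./benchmark-glibc.sh <jobs>
# Unpack the glibc 2.5 sources, configure in a separate build directory, and build with <jobs> parallel jobs.

tar xf glibc-2.5.tar.bz2 && mkdir glibc-build && cd glibc-build && ../glibc-2.5/configure --disable-sanity-checks && make -j"$1"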

Then just make the shell script executable (chmod +x benchmark-glibc.sh) and use the "time" command to get your results, passing the number of jobs as the argument to the script: time ./benchmark-glibc.sh 4
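If you want to see how the build scales with the job count, a small wrapper along these lines might help. It is a sketch, not part of the original benchmark: bench_jobs is a made-up name, it times whole seconds with "date +%s" rather than the "time" command, and it assumes the directory names glibc-build and glibc-2.5 that the script creates, removing them before each run because the script's mkdir would fail on a second run otherwise.

```shell
#!/bin/sh
# bench_jobs SCRIPT JOBS...
# Hypothetical helper: run SCRIPT once per job count and print the
# elapsed wall-clock time in whole seconds.
bench_jobs() {
    script=$1
    shift
    for n in "$@"; do
        # Start every run from a clean tree (directory names assumed
        # from the benchmark script).
        rm -rf glibc-build glibc-2.5
        start=$(date +%s)
        "$script" "$n" >/dev/null 2>&1
        end=$(date +%s)
        printf 'jobs=%s seconds=%s\n' "$n" "$((end - start))"
    done
}
```

For example, "bench_jobs ./benchmark-glibc.sh 1 2 4" run from the directory holding the tar file prints one line per job count.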

For the compile test, you can see about a 7% performance advantage for the QX9650 over the previous QX6850. This may be attributed to any or all of the optimizations in the core, including the larger cache size. Considering that both processors operate at the same frequency, this is a nice advantage. The QX9770 gives you about what you'd expect from a 200MHz advantage at about 5.4%. As we move to the 4GHz mark, we gain almost 25% more performance over the 3GHz part as expected. Scaling of the core seems to stay right in line with expectations.
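For readers who want to turn their own "time" results into percentages like these, here is a minimal sketch. The function names are invented for this example, and the 107 and 100 second build times are placeholders, not measurements from this review.

```shell
#!/bin/sh
# speedup_pct BASE_SECONDS NEW_SECONDS
# Percent more work per unit time when a job drops from BASE to NEW seconds.
speedup_pct() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (base / new - 1) * 100 }'
}

# clock_gain_pct BASE_MHZ NEW_MHZ
# Percent increase in clock speed, for comparison against the speedup.
clock_gain_pct() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

speedup_pct 107 100        # 7.0  (placeholder times)
clock_gain_pct 3000 3200   # 6.7  (3.0GHz to 3.2GHz)
```

Setting the measured speedup next to the clock increase gives a quick read on how close to linear the scaling is.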

For the audio encoding benchmark, I use the open Ogg Vorbis format and the Linux Vorbis tools. The single-threaded OggEnc benchmark encodes an 11 minute 22 second audio wave file of classical music. The multi-threaded benchmark re-encodes 21 songs from Ogg Vorbis into MP3s. It uses a piece of software I wrote called AudConvert to utilize each processor core for simultaneous encodes.
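The idea of one encode per core can be approximated in plain shell. To be clear, this is not AudConvert and makes no claim about how it works: convert_all is a made-up name, the oggdec-to-lame pipeline is an assumed stand-in for the per-file conversion, and "-print0", "-0" and "-P" are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# Assumed stand-in for the per-file conversion (not AudConvert's method):
# decode Ogg Vorbis to WAV on stdout, then encode that stream to MP3.
DEFAULT_ENCODE='oggdec -Q -o - "$1" | lame --quiet - "${1%.ogg}.mp3"'

# convert_all JOBS DIR [COMMAND]
# Run COMMAND once per .ogg file under DIR, keeping up to JOBS running
# at the same time. COMMAND sees the file name as "$1".
convert_all() {
    njobs=$1
    dir=$2
    cmd=${3:-$DEFAULT_ENCODE}
    find "$dir" -name '*.ogg' -print0 |
        xargs -0 -n 1 -P "$njobs" sh -c "$cmd" _
}
```

On a quad-core system, "convert_all 4 ~/music" would keep four conversions in flight until the file list is exhausted.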


Both of the graphs above look similar to what we saw in the compile benchmark. There is once again a small performance boost for the new core and incremental performance improvements for extra clock speed. Looking at these graphs side-by-side, it's interesting to see how well the multi-processing power of these chips scales. You'll notice no real loss in performance even though all cores are being fully utilized.

From audio encoding, I move right into video encoding. For this benchmark, I ripped a 1 hour 56 minute movie from a non-copy-protected DVD. I then re-encoded that DVD into MPEG4 using FFMPEG and Mencoder. I used the following profile in my ~/.mplayer/mencoder.conf file:

[ffmpeg]
profile-desc="MPEG4/MP3 encoding"
ovc=lavc=1
lavcopts=vcodec=mpeg4:threads=4:mbd=2:trell=1:v4mv=1:vstrict=1
oac=mp3lame=1
lameopts=fast=1:preset=standard
vf-add=harddup
ffourcc=DX50

The setting you will need to change depending on the number of available cores is threads=4 in the lavcopts option above. Then a simple command will perform the encode and return the time:
time mencoder -profile ffmpeg -o output.avi <input files>
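If you move the profile between machines, the thread count can be adjusted with a couple of small helpers. This is a sketch under assumptions: the function names are invented, and it relies on "getconf _NPROCESSORS_ONLN" being available to report the number of online cores, as it is on Linux with glibc.

```shell
#!/bin/sh
# online_cores
# Print the number of processor cores currently online.
online_cores() {
    getconf _NPROCESSORS_ONLN
}

# set_threads N
# Filter: rewrite every threads=<number> on stdin to threads=N.
set_threads() {
    sed "s/threads=[0-9][0-9]*/threads=$1/"
}
```

A typical use would be set_threads "$(online_cores)" < ~/.mplayer/mencoder.conf > mencoder.conf.new, followed by a review of the new file before replacing the old one.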

Again we have very comparable results to those above. Nothing is really more pronounced for Mencoder and FFMPEG than we saw from the previous tests.

Moving away from encoding, we turn next to POVRay and the process of using ray tracing to produce 3D images. Special thanks go to Nicolas Calimet for making this benchmark possible. He worked with me on getting pre-release code and then worked through some minor issues. The multi-threaded POVRay can be found on the Beta page. You can run the benchmark with povray --benchmark +wt4, where "4" represents the number of threads.

In the benchmark above, you can see that I ran the test with two different binaries. One binary was compiled with GCC, and the other was compiled with Intel's icc compiler. The graphs once again follow the same pattern as we see in earlier benchmarks. What's more interesting in this benchmark is that the Intel compiler gives you a slightly faster binary than GCC. Note that the compiler options used may not be totally equal here. The developer chose the options. They can be accessed by looking at the version information in the binaries included in the package.

Finally, let's move on to graphics applications. The first benchmark is the industry standard for workstation graphics, SPECViewperf. The instructions for running this benchmark can be found in the package itself. It involves compiling your binaries for your flavor of *nix and then running a shell script to execute the benchmark. This benchmark is completely single-threaded, so multiple cores should have very little impact here.

Overall, this is still the same pattern we would expect from the results above. There is a discrepancy in light-08 where the QX9650 drops off that is probably just an anomaly. There are also a couple of tests where the processor makes very little difference. This is caused by tests that are much more GPU dependent than CPU dependent.

In the gaming benchmarks below, I first show Quake 3 for a little old-school fun on the left. I chose Quake 3 primarily because it is the most recent id Software game whose sources have been released, so you can now compile it from scratch. I compiled the game shown below from icculus.org's sources labeled 1.34-rc3 using the compile options from the table above. On the right is the latest game for Linux, Enemy Territory: Quake Wars. Both of these games are supposedly multi-threaded, but I'm uncertain to what extent. My guess is two to three threads max.


In Quake 3, I almost broke the "magic" 1000FPS mark, and we once again see an increase in speed almost as we'd expect. It's not quite as pronounced here because of how high we are in terms of frames per second. As we move out of 640x480, we also see much less dependence on CPU performance. The same goes for the graph on the right. Notice the impressive boost in frame rate for clock speed at 640x480, while at 1024x768, we're pretty much neck and neck.

Summarizing the benchmarks, we see that no matter what the application, the impact on performance is nearly the same across the board. We see a small boost for the new "Penryn" core and then we see increased performance relative to the increase in clock speed. It doesn't appear that any one application benefits from "Penryn" more than another.

Power Usage, Pricing, and Availability

To wrap up this review, let's take a look at power usage, price, and availability.

In the graph above, we can see the power usage of each system under idle and full loads. The idle category here does not use any sort of clock throttling. The "Penryn" core is an amazing piece of silicon when it comes to power consumption. Clock for clock, the "Penryn" core used more than 20% less power, and that's with more transistors and greater performance. In fact, the "Penryn" power usage doesn't get up to the same level as "Kentsfield" until it's clocked about 1GHz higher.

The pricing on all three of the chips presented here is in the top of the line category from Intel. This means street price on the QX6850 is about $1,000, the QX9650 is about $1,200, and we can only expect that the QX9770 will be above that upon launch. The availability on the QX6850 seems to be pretty high, especially with the "Penryn" based chips now available. The QX9650 is available from a few retailers and sold out at others. This is probably just temporary with people rushing to get the new chip. The QX9770 is a paper launch from Intel and seems to be out there just to take away more of AMD's fire from the Phenom release. I don't have a specific launch date from Intel other than to say Q1'08.

If you're looking for a cheaper processor from Intel, the Q9450 processor, which will be clocked at 2.66GHz (8*333), is right around the corner. It should be priced right above the $300 mark.

Conclusion

With the information presented above, we have a much more complete picture of what we can expect from "Penryn" based processors. For the same clock speed you get a slightly faster chip with a lot lower power usage. The "Penryn" architecture should take Intel through the next couple of years, and as I showed, up through 4GHz and possibly beyond. The X38 motherboard from ASUS was a joy to work with and presented a very stable platform. As usual, the Intel chipset was 100% stable and actually had 100% support for Linux.

While the price of the QX chips presented here may be off the scale, keep in mind that the more affordable chips will be right around the corner. At that point, purchasing a lower clocked chip should be a great investment. These chips have a lot of headroom and should last you a while, especially if you're not scared of a little overclocking.

From here, I'm going to take a look at using GCC 4.3 for optimizing the latest Core 2 processors and also take a closer look at some of the specific components used in this review, starting with the ASUS Maximus Formula motherboard. Expect the first of those reviews to show up next week.

Original article: http://www.linuxhardware.org/article.pl?sid=07/11/21/152234