Intel X38 Express Chipset
The Intel X38 Express chipset is Intel's latest
performance chipset targeted at the enthusiast and workstation
markets. The chipset consists of the X38 MCH (memory controller hub)
and the ICH9 (I/O controller hub). The X38 MCH replaces the P35 MCH
and adds a couple of additional current and next generation features.
The ICH9 is the same ICH that was employed on the P35 chipset.

The X38 MCH brings dual PCI Express x16 slots to the
Intel chipset line and even introduces PCI Express 2.0 which doubles
throughput on the PCIe bus. This new support means two full speed x16
links for dual video card configurations from ATI/AMD. (Currently
NVIDIA still doesn't support SLI on Intel chipsets.) The X38 MCH also
has an enhanced memory controller which Intel claims will reduce
overall system memory latency. Add on official DDR3-1333 support, and
you have a decent upgrade from the P35.
As mentioned, the ICH9 is the same ICH that was
available on the P35 chipset. That's actually a good thing here
because Intel ICHs are known for being very stable and the ICH9
already has support for all the latest peripherals. An additional
six PCIe lanes, 12 USB 2.0 ports, High-Definition Audio, SATA 3G,
and Gigabit LAN are all available on the ICH9.
Configuration
Now, let's go over the configuration I used for this review. Below
you can see photos of some of the key components of the system and a
table outlining the hardware and software configuration.
The processors I'll be covering today include the top of last
generation's line, the fastest currently available chip, and a
processor that will not actually be available until Q1 of next year.
The QX6850 is based on the "Kentsfield" core and operates
at 3.0GHz on a 1333MHz FSB with 2x4MB L2 cache. It was the previous
"Extreme" model from Intel that has been replaced by the
new QX9650 processor. The QX9650 is a "Yorkfield"
processor, based on the "Penryn" architecture. It also
operates at 3.0GHz on a 1333MHz FSB, but has 2x6MB of L2 cache.
Finally, we have the QX9770, which is also a "Yorkfield"
processor, that operates at 3.2GHz on a 1600MHz FSB with 2x6MB L2
cache. The QX9770 will not be available until Q1 of next year, but
Intel sent samples to reviewers to give you a taste of what's to
come.
The center of the test system is the new ASUS
Maximus Formula (Special Edition). It is an X38 chipset based
motherboard designed for the enthusiast and, more specifically,
gamers. This board screams "extreme" with more bonus
features than you will find on 99% of the boards out there. It
supports up to 1600MHz FSB CPUs, up to 8GB of DDR2 memory, and memory
operating at upwards of 1200MHz. In terms of expansion slots, it's
decked out with dual x16 PCIe slots that support ATI CrossFire
Technology, three PCIe x1 slots, and two standard 32-bit PCI slots. I
plan to spend an entire article on this motherboard later, so I'll
save further details for then.
For compatibility testing with 64-bit Linux, I went to Corsair
looking for 4GB of memory. They hooked me up with their
top-of-the-line Dominator memory. The QUAD2X4096-8500C5DF
pack of memory includes four 1GB modules rated at up to 1066MHz. For
testing here, I chose to run the memory at 800MHz just to ensure
stability.
To ensure proper cooling on this rig, without a lot of noise,
I went with the Cooler Master GeminII cooler. This heatsink is
massive, as you can see in the photos above. It supports dual 120mm
fans and will not only keep the processor cool, but also the
surrounding components. This seemed like a big plus considering the
heatsinks on the motherboard and on the memory.
To round out the system, ASUS provided Linux Hardware with an
NVIDIA 8600GTS-based graphics card, which provides outstanding
performance without breaking the bank.
As usual, I went with a custom Gentoo 64-bit install. This will
allow us to get the most out of the processor with custom compile
options and the latest software. In the "Software Versions"
section, you can grab a copy of my kernel .config, if you're
interested in the settings I chose.
| CPU | Intel Core 2 Extreme QX6850 (3.0GHz, 1333MHz FSB, 2x4MB L2) (Quad-Core) | Intel Core 2 Extreme QX9650 (3.0GHz, 1333MHz FSB, 2x6MB L2) (Quad-Core) | Intel Core 2 Extreme QX9770 (3.2GHz, 1600MHz FSB, 2x6MB L2) (Quad-Core) |
| CPU Cooling Device | Cooler Master GeminII + 2 x Cooler Master 120mm Silent Fans |
| Motherboard | ASUS Maximus Formula (Special Edition) (Intel X38 Express + Intel ICH9R Chipset) BIOS: 0701 |
| Memory | 4GB (4x1GB) Corsair Dominator DDR2-1066 SDRAM QUAD2X4096-8500C5DF (Tested at 800MHz, CAS Latency 5) |
| Video Card | ASUS EN8600GTS (NVIDIA 8600GTS GPU, 256MB GDDR3) |
| Sound Card | Creative Labs Sound Blaster Audigy 2 ZS |
| Network Card | On-board Gigabit Ethernet |
| Hard Drive | HITACHI Ultrastar A7K1000 1TB 7200 RPM 32MB Cache |
| Power Supply | PC Power and Cooling Turbo-Cool 1KW-SR (1000 Watts) |
| Distribution | Gentoo Linux 64-bit (amd64) |
| Software Versions | GCC: 4.1.2; GLIBC: 2.6.1; Kernel: Linux 2.6.23 (.config download); X-Windows: X.org 7.2; NVIDIA Driver: 100.14.19 |
| Compile Optimizations | -march=nocona -O2 -pipe |
Stability
As with most Intel platforms I've seen, this system was an
absolute rock throughout testing. Setup was a breeze and a stable
system was easy to configure by simply setting the BIOS to the
default settings. As mentioned above, for this review, I ran memory
at default DDR2-800 settings. If you plan on building a system on
this, or any other, new platform, be sure to flash the BIOS on your
motherboard. This will ensure the latest compatibility settings for
your memory and processors are configured correctly.
Overclocking
Since I had this nice new enthusiast motherboard and a new core
with a smaller manufacturing process, I felt it was only fitting to
test overclocking on these CPUs. This should show you how far you
could potentially push one of these processors or even lower clocked
models once available. It will also give a look at how these
processors scale.
All Extreme processors from Intel are multiplier unlocked, so it
was quite easy to simply bump up the multiplier without needing to
touch the FSB speed. On both the QX9650 and QX9770, a 4.0GHz
overclock (400MHz x 10) was easily obtained with a small bump in core
voltage to 1.40V, which is still under Intel's maximum voltage in
their datasheet.
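As a quick illustration of the arithmetic behind that figure, the core clock is the base bus clock times the multiplier, and the base clock is the quad-pumped FSB rating divided by four (1600MHz / 4 = 400MHz). The sketch below is only a calculator; the helper name core_mhz is made up for this example, and the stock multiplier of 8 for the QX9770 is inferred from its listed 3.2GHz clock and 1600MHz FSB rather than stated in the review.

```shell
#!/bin/sh
# core_mhz BASE_CLOCK_MHZ MULTIPLIER -> core clock in MHz
# (hypothetical helper for illustration; integer arithmetic only)
core_mhz() {
    echo $(( $1 * $2 ))
}

core_mhz 400 8     # QX9770 at stock, multiplier inferred: prints 3200
core_mhz 400 10    # the 400MHz x 10 overclock: prints 4000
```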
At 4.0GHz, the system was 100% stable, even though it was running
pretty hot, peaking at a little over 70°C. Faster fans on the
heatsink I chose, or a higher-end cooler altogether, would have
taken care of this. With more extreme cooling, it's unknown how far I
might have been able to take these chips.
Performance
Next, we'll move on to the benchmarks. The benchmarks
below will include the three processors from the "Configuration"
section, as well as the overclocked QX9770 at 4.0GHz.
For the first benchmark, I downloaded the sources
to glibc 2.5 and compiled them using the make "-j#"
flag, which lets make run up to "#" jobs in parallel.
For all processors, I used "-j4" so that all four
cores would be utilized. To recreate this benchmark, place the source
tar file for glibc 2.5 in a directory containing the following script,
which I called "benchmark-glibc.sh":
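#!/bin/sh
# Usage: ./benchmark-glibc.sh <number of parallel make jobs>
# Unpack the sources, configure in a separate build directory, then build.
tar xf glibc-2.5.tar.bz2 && mkdir glibc-build && cd glibc-build &&
../glibc-2.5/configure --disable-sanity-checks && make -j"$1"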
Then just make the shell script executable (chmod +x
benchmark-glibc.sh) and use the "time" command to
get your results by passing the number of threads as the argument to
the script: time ./benchmark-glibc.sh 4
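If you would rather not hard-code the job count, something along these lines can look it up for you. This is a minimal sketch and not part of the original benchmark: the function name detect_threads is made up here, and it assumes getconf _NPROCESSORS_ONLN is available, as it is on glibc-based Linux systems.

```shell
#!/bin/sh
# Print the number of online CPU cores, falling back to 1 if unknown.
# Assumption: getconf _NPROCESSORS_ONLN is supported on this system.
detect_threads() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # not a plain number: fall back to one job
    esac
    echo "$n"
}

detect_threads
```

With that function defined in your shell, the run becomes time ./benchmark-glibc.sh "$(detect_threads)".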

For the compile test, you can see about a 7%
performance advantage of the QX9650 over the previous QX6850. This
may be attributed to any or all of the optimizations in the core,
including the larger cache size. Considering that both processors
operate at the same frequency, this is a nice advantage. The QX9770
gives you about what you'd expect from a 200MHz advantage at about
5.4%. As we move to the 4GHz mark, we gain almost 25% more
performance over the 3GHz part as expected. Scaling of the core seems
to stay right in line with expectations.
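For context on what "expected" means here, the sketch below computes the ideal reduction in run time if run time were exactly proportional to 1/clock. Reading the percentages above as reductions in elapsed time is my assumption, not something the review states, and the helper name ideal_time_reduction is made up for this example. Under that reading, the 5.4% measured for the QX9770 sits a little under the ideal figure, and the 4.0GHz result lands close to it.

```shell
#!/bin/sh
# ideal_time_reduction OLD_MHZ NEW_MHZ
# Percent drop in run time, assuming run time scales exactly with 1/clock.
# (hypothetical helper; real workloads also depend on memory and I/O)
ideal_time_reduction() {
    awk -v old_mhz="$1" -v new_mhz="$2" \
        'BEGIN { printf "%.2f\n", (1 - old_mhz / new_mhz) * 100 }'
}

ideal_time_reduction 3000 3200   # 3.0GHz -> 3.2GHz: prints 6.25
ideal_time_reduction 3000 4000   # 3.0GHz -> 4.0GHz: prints 25.00
```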
For the audio encoding benchmark, I use the open Ogg
Vorbis format and the Linux Vorbis tools. The single-threaded
OggEnc benchmark encodes an 11 minute 22 second audio wave file of
classical music. The multi-threaded benchmark re-encodes 21 songs
from Ogg Vorbis into MP3s. It uses a piece of software I wrote called
AudConvert to utilize each processor core for simultaneous encodes.
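AudConvert itself is not shown in this article, so the following is only a rough stand-in for the idea of running one encode per core. It assumes oggdec (from vorbis-tools) and lame are installed, the function names encode_one and encode_all are made up here, and it waits for each batch to finish before starting the next, which is simpler but less efficient than keeping every core busy at all times.

```shell
#!/bin/sh
# Hypothetical stand-in for the multi-threaded test; this is NOT AudConvert.

# Per-file step: decode Ogg Vorbis to WAV on stdout and pipe it into lame.
encode_one() {
    oggdec -Q -o - "$1" | lame --quiet - "${1%.ogg}.mp3"
}

# encode_all MAX_JOBS FILE...
# Run encode_one on every file, at most MAX_JOBS at a time.
encode_all() {
    max_jobs=$1
    shift
    running=0
    for f in "$@"; do
        encode_one "$f" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait            # let the current batch finish
            running=0
        fi
    done
    wait                    # catch the final partial batch
}
```

On a quad-core system this would be called as encode_all 4 *.ogg from the directory holding the songs.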
Both of the graphs above look similar to what we
saw in the compile benchmark. There is once again a small performance
boost for the new core and incremental performance improvements for
extra clock speed. Looking at these graphs side-by-side, it's
interesting to see how well the multi-processing power of these chips
scales. You'll notice no real loss in performance even though all
cores are being fully utilized.
From audio encoding, I move right into video encoding.
For this benchmark, I ripped a 1 hour 56 minute movie from a
non-copy-protected DVD. I then re-encoded that DVD into MPEG4 using
FFMPEG and Mencoder. I used the following profile in my
~/.mplayer/mencoder.conf file:
[ffmpeg]
profile-desc="MPEG4/MP3 encoding"
ovc=lavc=1
lavcopts=vcodec=mpeg4:threads=4:mbd=2:trell=1:v4mv=1:vstrict=1
oac=mp3lame=1
lameopts=fast=1:preset=standard
vf-add=harddup
ffourcc=DX50
The setting you will need to change depending on the number of
available cores is threads=4 in the lavcopts line. Then a
simple command will perform the encode and return the time:
time mencoder -profile ffmpeg -o output.avi <input files>

Again we have very comparable results to those above.
Nothing is really more pronounced for Mencoder and FFMPEG than we saw
from the previous tests.
Moving away from encoding, we turn next to POVRay
and the process of using ray tracing to produce 3D images. A special
thanks goes to Nicolas Calimet for making this benchmark possible. He
worked with me on getting pre-release code and then worked through
some minor issues. The multi-threaded POVRay can be found on the Beta
page. You can run the benchmark with povray --benchmark
+wt4, where "4" represents the number of threads.

In the benchmark above, you can see that I ran the test
with two different binaries. One binary was compiled with GCC, and
the other was compiled with Intel's icc compiler. The graphs once
again follow the same pattern as we see in earlier benchmarks. What's
more interesting in this benchmark is that the Intel compiler gives
you a slightly faster binary than GCC. Note that the compiler options
used may not be totally equal here. The developer chose the options.
They can be accessed by looking at the version information in the
binaries included in the package.
Finally, let's move on to graphics applications. The
first benchmark is the industry standard for workstation graphics,
SPECViewperf.
The instructions for running this benchmark can be found in the
package itself. It involves compiling your binaries for your flavor
of *nix and then running a shell script to execute the benchmark.
This benchmark is completely single-threaded, so multiple cores
should have very little impact here.

Overall, this is still the same pattern we would expect
from the results above. There is a discrepancy in light-08 where the
QX9650 drops off that is probably just an anomaly. There are also a
couple of tests where the processor makes very little difference.
This is caused by tests that are much more GPU dependent than CPU
dependent.
In the gaming benchmarks below, I first show Quake 3
for a little old-school fun on the left. I chose Quake 3 primarily
because it is the most recent id Software game whose released sources
you can compile from scratch. I compiled the game shown below from
icculus.org's sources labeled
1.34-rc3 using the compile options from the table above. On the
right is the latest game for Linux, Enemy Territory: Quake Wars.
Both of these games are supposedly multi-threaded, but I'm uncertain
to what extent. My guess is two to three threads max.
In Quake 3, I almost broke the "magic" 1000FPS mark
and we once again see an increase in speed almost as we'd expect. It's
not quite as pronounced here because of how high we are in terms of
frames per second. As we move out of 640x480, we also see much less
dependence on CPU performance. The same goes for the graph on the
right. Notice the impressive boost in frame rate for clock speed at
640x480, while at 1024x768, we're pretty much neck and neck.
Summarizing the benchmarks, we see that no matter what the
application, the impact on performance is similar across the
board. We see a small boost for the new "Penryn" core and
then we see increased performance relative to the increase in clock
speed. It doesn't appear that any one application benefits from
"Penryn" over another.
Power Usage, Pricing, and Availability
To wrap up this review, let's take a look at power usage, price,
and availability.

In the graph above, we can see the power usage of each
system under idle and full loads. The idle category here does not use
any sort of clock throttling. The "Penryn" core is an
amazing piece of silicon when it comes to power consumption. Clock
for clock, the "Penryn" core used more than 20% less power,
and that's for more transistors and greater performance. In fact, the
"Penryn" power usage doesn't get up to the same level as
"Kentsfield" until it's clocked about 1GHz higher.
The pricing on all three of the chips presented here is
in the top of the line category from Intel. This means street price
on the QX6850 is about $1,000, the QX9650 is about $1,200, and we can
only expect that the QX9770 will be above that upon launch. The
availability on the QX6850 seems to be pretty high, especially with
the "Penryn" based chips now available. The QX9650 is
available from a few retailers and sold out at others. This is
probably just temporary with people rushing to get the new chip. The
QX9770 is a paper launch from Intel and seems to be out there just to
take away more of AMD's fire from the Phenom release. I don't have a
specific launch date from Intel other than to say Q1'08.
If you're looking for a cheaper processor from Intel,
the Q9450 processor, which will be clocked at 2.66GHz (8*333), is
right around the corner. It should be priced right above the $300
mark.
Conclusion
With the information presented above, we have a much more complete
picture of what we can expect from "Penryn" based
processors. For the same clock speed you get a slightly faster chip
with a lot lower power usage. The "Penryn" architecture
should take Intel through the next couple of years, and as I showed,
up through 4GHz and possibly beyond. The X38 motherboard from ASUS
was a joy to work with and presented a very stable platform. As
usual, the Intel chipset was 100% stable and actually had 100%
support for Linux.
While the price of the QX chips presented here may be off the
scale, keep in mind that the more affordable chips will be right
around the corner. At that point, purchasing a lower clocked chip
should be a great investment. These chips have a lot of headroom and
should last you a while, especially if you're not scared of a little
overclocking.
From here, I'm going to take a look at using GCC 4.3 for
optimizing the latest Core 2 processors and also take a closer look
at some of the specific components used in this review, starting with
the ASUS Maximus Formula motherboard. Expect the first of those
reviews to show up next week.
Original article: http://www.linuxhardware.org/article.pl?sid=07/11/21/152234