
AMD Bulldozer (FX) CPU Review

Introduction
It seems like we have been waiting a long time for AMD's Bulldozer architecture to reach the masses (fans of Duke Nukem Forever will contend that two years is not a long time to wait), and today we can finally bring our readers a review of the top-end Bulldozer (Zambezi) processor in the initial line-up, the AMD FX-8150.

The FX brand is a big deal for AMD. Almost a decade ago it was the Athlon 64 FX that heralded AMD taking the performance crown from Intel, and AMD is no doubt hoping for a repeat performance now. Bulldozer is a completely new architecture and represents the top end of AMD's desktop range, with Llano making up the middle and Zacate the netbook and ultra-portable segment.
Continuing the "Black Edition" tradition, all desktop Bulldozer processors will be fully unlocked as standard. This contrasts with Intel's strategy of charging a premium for its unlocked K-series processors. The design is built around a number of "Bulldozer modules", each containing two cores plus some shared components that we discuss in detail below. The benefit is a highly scalable architecture, initially offered in 2-, 3- and 4-module configurations (4, 6 and 8 cores respectively), although we have already seen server versions with up to 16 cores per CPU. How far, and how linearly, this can be scaled is unknown at present.
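Because every desktop part is simply some number of identical two-core modules, the core count and total L2 follow directly from the module count. The minimal sketch below assumes 2 cores and 2 MB of L2 per module (the per-module L2 figure comes from the specification list that follows). The helper name `module_config` is purely illustrative.

```python
# Sketch: map a Bulldozer module count to cores and total L2.
# Assumes 2 cores and 2 MB of L2 per module, as described in this review.
CORES_PER_MODULE = 2
L2_MB_PER_MODULE = 2


def module_config(modules: int) -> dict:
    """Return the core count and total L2 (in MB) for a given number of modules."""
    return {
        "modules": modules,
        "cores": modules * CORES_PER_MODULE,
        "l2_mb": modules * L2_MB_PER_MODULE,
    }


# The three initial desktop configurations.
for m in (2, 3, 4):
    cfg = module_config(m)
    print(f"{cfg['modules']} modules -> {cfg['cores']} cores, {cfg['l2_mb']} MB L2")
```

The 4-module result (8 cores, 8 MB L2) lines up with the FX-8150 entry in the pricing table later in the review, and the 3- and 2-module results line up with the FX-6100 and FX-4100 entries.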


Processor Architecture
"Bulldozer" was designed to balance performance, cost and power consumption in multi-threaded applications. The architecture focuses on high frequency and resource sharing to achieve high throughput in next-generation applications. AMD FX processors offer up to eight high-performance, power-efficient cores and represent the first generation of a new execution-core family from AMD.

Other specs include:
  • 128 KB of Level 1 data cache, 16 KB per core, 64-byte cacheline, 4-way associative, write-through
  • 8 MB of Level 2 cache, 2 MB per "Bulldozer" module, 64-byte cacheline, 16-way associative, shared within the module
An integrated northbridge controls:
  • 8 MB of Level 3 cache, 64-byte cacheline, 64-way associative, MOESI
  • Two 72-bit wide DDR3 memory channels
  • Four 16-bit receive/16-bit transmit HyperTransport™ links
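Each cache in the list is described by its size, line size and associativity. Those three numbers fix the number of sets and how an address is split into an offset and an index. The sketch below is a generic calculation (sets = size / (line size × ways)), applied to the per-core L1 data cache and the per-module L2 figures listed above. It assumes power-of-two sizes and leaves out tag bits, which depend on the physical address width.

```python
import math

KB = 1024
MB = 1024 * KB


def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive the set count and the address-bit split of a set-associative cache."""
    sets = size_bytes // (line_bytes * ways)
    return {
        "sets": sets,
        "offset_bits": int(math.log2(line_bytes)),  # selects a byte within a line
        "index_bits": int(math.log2(sets)),         # selects a set
    }


# Figures from the specification list: per-core L1D and per-module L2.
l1d = cache_geometry(16 * KB, 64, 4)
l2 = cache_geometry(2 * MB, 64, 16)
print("L1D:", l1d)
print("L2 :", l2)
```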
When evaluating design ideas for the next generation x86 processor core, AMD engineers looked at ways of optimizing core power and area. Analyzing the bursty nature of today’s PC applications led engineers to look for a way to maximize peak bandwidth across the different cores, and maximize the use of silicon area through the use of shared modules.
The result was a dual-core building block designed to make efficient use of the processor's resources. Functions with high utilization (such as the integer pipelines and the Level 1 data caches) are dedicated to each core.
The other units are shared between the two cores: fetch, decode, the floating-point pipelines and the Level 2 cache. This design lets each of the two cores use a larger, higher-performance function unit (for example, the floating-point unit) when it needs one, while using less total die area than separate, smaller function units for each core.

The floating-point unit in "Bulldozer" has also been completely redesigned. It supports many new instructions and allows its resources to be shared between cores. Each module has two shared 128-bit FMACs, so each core can execute a 128-bit operation, or the module as a whole can execute one 256-bit operation.
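A theoretical ceiling for the shared FPU can be worked out from two figures: two 128-bit FMACs per module, and a fused multiply-add counting as two floating-point operations. The sketch below assumes every cycle issues an FMA on both FMACs, which real code rarely achieves. It also assumes the 3.6GHz base clock of the FX-8150 from the pricing table. Treat the output as an upper bound, not a measured result.

```python
FMAC_UNITS_PER_MODULE = 2  # two 128-bit FMACs shared per module (from the text)
FMAC_WIDTH_BITS = 128
FLOPS_PER_FMA = 2          # a fused multiply-add counts as one multiply plus one add


def peak_flops_per_cycle(element_bits: int) -> int:
    """Peak floating-point operations per cycle for one module."""
    lanes = FMAC_WIDTH_BITS // element_bits  # 2 doubles or 4 floats per 128-bit FMAC
    return FMAC_UNITS_PER_MODULE * lanes * FLOPS_PER_FMA


def peak_gflops(modules: int, ghz: float, element_bits: int) -> float:
    """Theoretical peak GFLOPS, assuming every cycle issues FMAs on both FMACs."""
    return modules * peak_flops_per_cycle(element_bits) * ghz


print(peak_flops_per_cycle(64))   # double precision, per module
print(peak_flops_per_cycle(32))   # single precision, per module
print(peak_gflops(4, 3.6, 64))    # FX-8150 (4 modules) at its 3.6GHz base clock
```

Both cores of a module draw on the same two FMACs. When both cores run FP-heavy threads, each core therefore sees roughly half of this per-module figure.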
The new floating-point unit is at its best on forward-looking benchmarks: it executes 128-bit instructions quickly and accelerates FMA and XOP operations. Applications that use older floating-point instructions typically cannot take full advantage of the unit, which is optimized for the newer FMAC instructions.
The front end drives the processing pipeline and was designed to keep the cores constantly fed with instructions. It serves both cores of each module and assigns threads to the individual cores. AMD has made substantial changes here, including decoupled prediction and fetch pipelines and prediction-directed instruction prefetchers. A prediction queue handles direct and indirect branches and is fed by L1 and L2 Branch Target Buffers (BTBs), which store branch destination addresses.
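The sketch below illustrates decoupled prediction: a predictor runs ahead of fetch, using a two-level BTB to decide where each next fetch block starts, and queues those addresses. It is a generic toy, not AMD's design. The L1 size, FIFO eviction, unbounded L2 and indexing by 32-byte fetch block are all simplifying assumptions, and the class and function names are hypothetical.

```python
from collections import deque


class TwoLevelBTB:
    """Toy two-level branch target buffer: a small L1 backed by a larger L2.
    Entries map a fetch-block address to the predicted next fetch address."""

    def __init__(self, l1_entries: int = 4):
        self.l1 = {}  # small, fast level (insertion-ordered dict)
        self.l2 = {}  # larger backing level (unbounded in this toy)
        self.l1_entries = l1_entries

    def _fill_l1(self, block, target):
        if block not in self.l1 and len(self.l1) >= self.l1_entries:
            self.l1.pop(next(iter(self.l1)))  # evict the oldest L1 entry (FIFO)
        self.l1[block] = target

    def update(self, block, target):
        """Record a taken branch observed when it executes."""
        self.l2[block] = target
        self._fill_l1(block, target)

    def lookup(self, block):
        if block in self.l1:
            return self.l1[block]
        if block in self.l2:
            self._fill_l1(block, self.l2[block])  # promote into L1 on an L2 hit
            return self.l2[block]
        return None


def fill_prediction_queue(btb, start, count, block_bytes=32):
    """Run the predictor ahead of fetch, queuing `count` fetch addresses."""
    queue = deque()
    addr = start
    for _ in range(count):
        queue.append(addr)
        target = btb.lookup(addr)
        # Predicted-taken branch redirects; otherwise fall through to the next block.
        addr = target if target is not None else addr + block_bytes
    return queue


btb = TwoLevelBTB()
btb.update(0x1040, 0x1000)  # hypothetical loop: block 0x1040 branches back to 0x1000
print([hex(a) for a in fill_prediction_queue(btb, 0x1000, 5)])
```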
Bulldozer modules can decode up to four instructions per cycle (versus three on AMD Phenom™ II processors).
The prediction pipeline produces a sequence of fetch addresses. The fetch pipeline looks each one up in the instruction cache and pulls 32 bytes per cycle into the fetch queue, which feeds the decoders.
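The 32-byte fetch width and the four-wide decoder together cap front-end throughput. Whichever of the two is tighter sets the limit. The sketch below works through that bound for a few average instruction lengths. Average length is a hypothetical parameter, since x86 instructions range from 1 to 15 bytes. The sketch also ignores the fact that the front end is shared by both cores in a module.

```python
FETCH_BYTES_PER_CYCLE = 32  # from the text
DECODE_PER_CYCLE = 4        # from the text


def sustained_decode_rate(avg_instr_bytes: float) -> float:
    """Instructions per cycle the front end can sustain: the smaller of the
    fetch-limited rate and the decoder width."""
    fetch_limited = FETCH_BYTES_PER_CYCLE / avg_instr_bytes
    return min(fetch_limited, DECODE_PER_CYCLE)


for avg in (4, 8, 10):  # hypothetical average instruction lengths in bytes
    print(avg, sustained_decode_rate(avg))
```

With short instructions the decoder is the bottleneck. Only once instructions average more than 8 bytes does the 32-byte fetch window start to limit throughput.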
Bulldozer uses a physical register file (PRF), a single location that holds the register results of executed instructions. This reduces power by eliminating unnecessary data movement and replication (one copy is kept instead of broadcasting the data).
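The idea behind a PRF can be shown with a textbook register renamer. Each result is written exactly once into a physical register, and later instructions find it through a rename table instead of receiving a copy. The sketch below is that generic technique, not AMD's implementation. The class name, the register names and the freeing policy (old mappings are never reclaimed here) are illustrative assumptions.

```python
class RenameUnit:
    """Toy register renamer backed by a physical register file (PRF).
    Each result is written once into a PRF slot; consumers read it by index."""

    def __init__(self, arch_regs, num_physical):
        self.prf = [0] * num_physical
        self.free_list = list(range(num_physical))
        self.rat = {}  # register alias table: architectural name -> physical index
        for reg in arch_regs:
            self.rat[reg] = self.free_list.pop(0)

    def execute(self, dest, op, *srcs):
        # Read source values through the alias table; no copies are broadcast.
        values = [self.prf[self.rat[s]] for s in srcs]
        phys = self.free_list.pop(0)  # allocate a fresh physical register
        self.prf[phys] = op(*values)  # the single write of this result
        old = self.rat[dest]
        self.rat[dest] = phys
        return old  # a real core would free `old` at retirement; this toy does not

    def read(self, reg):
        return self.prf[self.rat[reg]]


r = RenameUnit(["rax", "rbx"], 8)
r.execute("rax", lambda: 5)                       # rax = 5
r.execute("rbx", lambda a: a + 1, "rax")          # rbx = rax + 1
r.execute("rax", lambda a, b: a * b, "rax", "rbx")  # rax = rax * rbx
print(r.read("rax"), r.read("rbx"))
```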
Each core is equipped with a 16 KB Level 1 data cache, a 32-entry fully associative data TLB and a fully out-of-order load/store unit capable of two 128-bit loads or one 128-bit store per cycle. Each dual-core module includes a 2 MB, 16-way unified L2 cache and a 1024-entry, 8-way L2 TLB that services both instruction and data requests. "Bulldozer" supports up to 23 outstanding L2 cache misses for memory-system concurrency.
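In a fully associative TLB, any translation can sit in any of its entries, so only capacity and the replacement policy decide hits and misses. The sketch below models a 32-entry fully associative TLB. The LRU replacement policy, the 4 KB page size and the dictionary standing in for a page-table walk are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 KB pages


class FullyAssociativeTLB:
    """Toy fully associative TLB with LRU replacement."""

    def __init__(self, entries: int = 32):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int, page_table: dict) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)  # evict the least recently used entry
            self.map[vpn] = page_table[vpn]   # stand-in for a page-table walk
        return self.map[vpn] * PAGE_SIZE + offset


page_table = {vpn: vpn + 100 for vpn in range(64)}  # hypothetical mapping
tlb = FullyAssociativeTLB(32)
print(hex(tlb.translate(0x1234, page_table)))
```

One consequence of LRU: cycling through 33 distinct pages in round-robin order misses on every access. This is why working sets slightly larger than a TLB can hurt disproportionately.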
Finally, AMD has designed an 8 MB, 64-way associative L3 cache that is shared by all the Bulldozer modules on the chip.
AMD Turbo Core technology has been enhanced for AMD FX processors with a new mode that can boost all cores for a period whenever there is enough TDP headroom. This lets highly threaded workloads take advantage of extra frequency. The AMD Power Manager inside the CPU monitors the processor's states.
Max Frequency mode is activated for lightly threaded applications and raises the frequency of half the cores. AMD has also enhanced the top AMD Turbo Core level so that the processor can stay at a higher frequency than previous AMD Phenom™ II processors. The result is better performance in single-threaded and lightly threaded applications.
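The two Turbo Core modes amount to a simple decision policy, sketched below. The base and maximum turbo clocks come from the FX-8150 row of the pricing table. The all-core boost frequency is an assumed value that this review does not state. AMD's real power manager works from measured power and thermal data, not from a boolean flag.

```python
BASE_GHZ = 3.6            # FX-8150 base clock (from the pricing table)
MAX_TURBO_GHZ = 4.2       # FX-8150 max turbo (from the pricing table)
ALL_CORE_TURBO_GHZ = 3.9  # assumption: the all-core boost clock is not given in this review


def turbo_frequency(active_cores: int, total_cores: int, tdp_headroom: bool) -> float:
    """Pick a clock using a simplified version of the policy described above."""
    if not tdp_headroom:
        return BASE_GHZ
    if active_cores <= total_cores // 2:
        return MAX_TURBO_GHZ   # Max Frequency mode: half the cores or fewer busy
    return ALL_CORE_TURBO_GHZ  # all-core boost while TDP headroom remains


for busy in (1, 4, 8):
    print(busy, turbo_frequency(busy, 8, tdp_headroom=True))
```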
 

The CPU 
Physically the FX-8150 is the same size as the Phenom II package, but it requires a socket AM3+ motherboard (these motherboards started appearing some weeks ago, so they should be in ample supply). A few manufacturers have indicated that their existing AM3 motherboards will support Bulldozer after a BIOS update, but a senior AMD engineer we spoke to a few weeks ago was adamant that Bulldozer should only be run on a socket AM3+ motherboard.

Changing sockets is not taken lightly by AMD and is usually the sign of a radically different architecture, which is indeed the case today. Future roadmaps show Bulldozer cores with integrated graphics (likely to be substantially more powerful than the HD 6550D found in Llano) and this may require yet another socket shift further down the road. When we get more details from AMD we will pass them on to our readers.
So how do these new offerings compare to what is already available?
 
Processor | Manufacturing Process | Cores | Transistor Count | Die Size
AMD Bulldozer 8C | 32nm | 8 | ~2B | 315mm²
AMD Thuban 6C | 45nm | 6 | 904M | 346mm²
AMD Deneb 4C | 45nm | 4 | 758M | 258mm²
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm²
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm²
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm²
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm²
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm²
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm²
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm²
Despite this being the first CPU range on AMD's new 32nm process, the die is almost as large as the previous generation's. This is due to the roughly 2 billion transistors on the chip, and it reflects the fact that sharing a few resources between pairs of cores does not save much space. This contrasts with Intel's Hyper-Threading, which presents extra logical cores while adding only about 5% to a core's area. That said, the die shrink alone has allowed AMD to more than double its transistor count over Thuban while still producing a smaller die.
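The die-shrink point is easier to see as transistor density, that is, transistors per square millimetre. The sketch below uses figures straight from the table above. Note that the Bulldozer count there is the approximate "~2B" figure, so its density is approximate too.

```python
# (name, transistors in millions, die size in mm^2), taken from the table above
chips = [
    ("AMD Bulldozer 8C", 2000, 315),  # "~2B" is approximate
    ("AMD Thuban 6C", 904, 346),
    ("Intel Sandy Bridge 4C", 995, 216),
]


def density(mtransistors: float, area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return mtransistors / area_mm2


for name, mtrans, area in chips:
    print(f"{name}: {density(mtrans, area):.2f} M transistors/mm^2")

print(f"Bulldozer vs Thuban transistor count: {2000 / 904:.2f}x")
```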
 
Processor | Cores | Clock Speed | Max Turbo | NB Clock | L2 Cache | TDP | Lowest UK Price (inc. VAT)
AMD FX-8150 | 8 | 3.6GHz | 4.2GHz | 2.2GHz | 8MB | 125W | £195
AMD FX-8120 | 8 | 3.1GHz | 4.0GHz | 2.2GHz | 8MB | 95W/125W | £165
AMD FX-8100* | 8 | 2.8GHz | 3.7GHz | 2GHz | 8MB | 95W | ?
AMD FX-6100 | 6 | 3.3GHz | 3.9GHz | 2GHz | 6MB | 95W | £140
AMD FX-4170* | 4 | 4.2GHz | 4.3GHz | 2.2GHz | 4MB | 125W | ?
AMD FX-B4150 | 4 | 3.8GHz | 4GHz | 2.2GHz | 4MB | 95W | ?
AMD FX-4100 | 4 | 3.6GHz | 3.8GHz | 2GHz | 4MB | 95W | £95
AMD X6 1100T | 6 | 3.2GHz | 3.6GHz | 2GHz | 3MB | 125W | £145
AMD X4 980 | 4 | 3.7GHz | N/A | 2GHz | 2MB | 125W | £120
As with the release of AMD's 6-core Thuban range last year, what surprises us, even before looking at performance, is the aggressive pricing. The 8-core FX-8150 is priced to compete with the i5-2500K, which lacks Hyper-Threading and so offers a straight four cores and four threads. The FX-8120 is an interesting choice: its default clock speed is lower, but its max turbo is almost as high as its bigger brother's, it is available with a TDP of only 95W, and it is a lot cheaper.
The 6-core and 4-core offerings are both cheaper than the Phenom II X6 1100T and X4 980 they sit alongside. We asked AMD whether there would be a price reduction on the Phenom II ranges and were told "no". The aim is to phase these out so that the high-end processors are FX and the entry range is Llano (desktop Fusion), leaving Zacate for the ultra-portable and netbook segments. It makes sense and will reduce the number of SKUs in inventory.
The other FX processors listed are not yet launching so pricing is unknown.
AMD is also releasing its own brand of liquid cooling system to go with the FX range, and claims an average overclock of about 4.6GHz on air and 5.0GHz on water. Our AMD water cooling kit did not arrive in time, so we used an Antec Kühler H2O 620 for testing. The premium for AMD's water cooling will be around $100, and it should be available as a retail boxed set with FX processors.



The FX Platform
The FX platform itself has become an important branding focus for AMD. Formerly codenamed Scorpius, it comprises a 9-series chipset, an FX CPU and a Radeon HD 6800 series (or higher) GPU.

This provides a consistent baseline for consumer expectations, so future games may list their recommended specs as "FX Platform" or "Scorpius Platform" rather than a long list of processors and graphics cards. In our testing we used a Radeon HD 5850, which is broadly comparable to a Radeon HD 6850. Using it as our minimum therefore lets us evaluate in depth the experience of playing a game on the FX Platform. We could not do this for every game, so we used Deus Ex: Human Revolution as our test subject. Rather than just running benchmarks, we have written a detailed review of the game running on the FX Platform (see the Deus Ex: Human Revolution section below).
The motherboard AMD sent us was an ASUS Crosshair V Formula, which proved very capable and stable. It provides two x16 PCI Express slots for a more balanced CrossFire setup, along with a host of overclocking functions.
It's good to see a larger number of USB 3.0 ports, although there is still no header for connecting a front-panel USB 3.0 port internally.


Overclocking
Sandy Bridge proved tricky to overclock despite high expectations of Intel's 32nm process, and its processors are among the most locked-down in Intel's history (only two unlocked models are offered, at a premium). AMD has taken a completely different route: its entire FX range is unlocked, much to the joy of PC enthusiasts.

We were not able to explore overclocking potential fully in the time available for this review, but we did run FurMark for roughly 20 minutes at about 5GHz without problems, although we needed to disable Turbo Core to do so. The specs above are at default speeds, and the list of supported instructions is quite impressive.


Test Setup

Test Configuration

Component | Intel systems | AMD system

System Hardware
CPU | Intel Core i7-2600K, i5-2500K and i7-870 | AMD FX-8150
Motherboard | ASUS Maximus III Gene and Intel P67/H67 motherboards | ASUS Crosshair V Formula
CPU Cooler | Corsair H50 | Corsair H50
RAM | Kingston KHX2133C8D3T1K2/4GX 4GB DDR3 Non-ECC CL8 (Kit of 2), Intel XMP, Tall HS, at 2000MHz, CAS 8-8-8-24 | Kingston KHX2133C8D3T1K2/4GX 4GB DDR3 T1 Series Non-ECC CL8 DIMM (Kit of 2), XMP, at 1866MHz, CAS 9-9-9-24
Graphics | ATI Radeon HD 5850 | ATI Radeon HD 5850
Hard Drive | Maxtor 300GB SATA-2 | Maxtor 300GB SATA-2
Sound | SupremeFX X-Fi built-in and Intel HD Audio | Realtek® 1200 8-channel High Definition Audio CODEC
Network | Gigabit LAN controller | Realtek® 8112 Gigabit LAN controller
Chassis | Antec 902 Midi Tower Case | Antec Skeleton
Power | Antec TruPower 750W | Antec EarthPower 1000W

Software
Operating System | Windows 7 Professional | Windows 7 Professional
Graphics Driver | ATI Catalyst 10.3 | ATI Catalyst 10.3
Chipset | Intel P55, P67, H67 | AMD 990FX
Applications
  • SiSoft Sandra 2009
  • 3DMark 11
  • 3DMark Vantage Pro
  • PCMark Vantage Pro
  • Everest Ultimate
  • CPU-Z
  • Far Cry 2
  • HAWX
  • Resident Evil 5
  • Stalker:COP
  • Lost Planet 2
  • Mafia 2
  • Street Fighter 4
  • AvP
All games are tested at the maximum available settings, initially at 1024x768, so that we hit CPU limitations before any GPU-related bandwidth or fill-rate limits. We selected Far Cry 2 (first-person shooter), HAWX (air combat) and Resident Evil 5 (horror) because they are reliable titles that suit benchmarking and run well on modest systems. DX11 titles include Stalker: Call of Pripyat, Lost Planet 2, Mafia 2 and Street Fighter 4. We also carry out a special in-depth evaluation of Deus Ex: Human Revolution to show the capabilities of the FX (Scorpius) platform.

Test Results - SiSoft Sandra
We start with synthetic benchmarks. While they don't represent real-world performance, they are vital for understanding the potential capabilities of processors and for identifying any bottlenecks.

The FX-8150 clearly stands apart from AMD's previous generation and beats the Intel i5-2500K as well as the older i7-870. This is the first time an AMD processor has offered any serious challenge to Intel's dominance in this benchmark.

Here the FX-8150 wins on integer performance and loses only marginally to the more expensive i7-2600K on the FP score. Given that Bulldozer shares floating-point resources between pairs of cores, one cannot help but wonder whether this is part of the reason the red bar is not longer.

The new memory controller is highly optimised and beats all comers. We'd still like to see support for memory speeds above 1866MHz. However, memory speed has only a minor effect on real-world performance (readers would be better advised to buy more memory before worrying about its speed, as that has a greater impact), so we can't really grumble.

Test Results - Everest Ultimate Edition 
Everest is a very comprehensive benchmark suite that is set to take the synthetic crown from SiSoft Sandra. We limited our testing to the CPU and FPU benchmarks provided.

CPU Queen is a simple integer benchmark that focuses on the CPU's branch prediction capabilities and misprediction penalties. It finds the solutions to the classic "Queens problem" on a 10 by 10 chessboard. CPU PhotoWorxx is an integer benchmark that performs various common digital photo processing tasks. CPU ZLib is an integer benchmark that measures combined CPU and memory subsystem performance using the public ZLib compression library. The CPU ZLib test uses only basic x86 instructions and is Hyper-Threading, multi-processor (SMP) and multi-core (CMP) aware. CPU AES is an integer benchmark that measures CPU performance using AES (a.k.a. Rijndael) data encryption. It uses Vincent Rijmen, Antoon Bosselaers and Paulo Barreto's public domain C code in ECB mode.
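The Queens test stresses branch prediction because a backtracking search is dominated by data-dependent "is this square attacked?" decisions that are hard to predict. The sketch below is a generic backtracking solver for the same problem. Everest's own code is not reproduced here, and the function name is illustrative.

```python
def count_queens(n: int) -> int:
    """Count placements of n non-attacking queens on an n x n board (backtracking)."""

    def place(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            # Data-dependent branch: hard to predict, which is what stresses the CPU.
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue  # square is attacked; prune this branch
            total += place(row + 1, cols | {col}, diag1 | {row - col}, diag2 | {row + col})
        return total

    return place(0, frozenset(), frozenset(), frozenset())


print(count_queens(10))  # the 10 by 10 board used by the benchmark
```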
The FX-8150 is well ahead of its previous-generation brethren and beats the i5-2500K. The AES result is particularly notable: it shows the benefit of hardware encryption support, which will be valuable for future e-commerce.

The FPU Julia benchmark measures single-precision (also known as 32-bit) floating-point performance by computing several frames of the popular "Julia" fractal. The code behind this benchmark is written in assembly and is extremely optimized for every popular AMD and Intel processor core variant, using the appropriate x87, 3DNow!, 3DNow!+ or SSE instruction set extension.
The FPU Mandel benchmark measures double-precision (also known as 64-bit) floating-point performance by computing several frames of the popular "Mandelbrot" fractal. The code behind this benchmark is written in assembly and is extremely optimized for every popular AMD and Intel processor core variant, using the appropriate x87 or SSE2 instruction set extension.
The FPU SinJulia benchmark measures extended-precision (also known as 80-bit) floating-point performance by computing a single frame of a modified "Julia" fractal. The code behind this benchmark is written in assembly and is extremely optimized for every popular AMD and Intel processor core variant, using trigonometric and exponential x87 instructions.
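The fractal workloads above come down to iterating z = z² + c until |z| exceeds 2, and counting the iterations. The sketch below is a plain escape-time Mandelbrot kernel, not Everest's assembly code. Python's `complex` type uses double precision, which matches the Mandel test's 64-bit arithmetic. A Julia set uses the same loop with z starting at the pixel and c held fixed.

```python
def mandelbrot_escape(c: complex, max_iter: int = 100) -> int:
    """Iterations before z = z*z + c leaves the radius-2 disc (max_iter if it never does)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def julia_escape(z: complex, c: complex, max_iter: int = 100) -> int:
    """Same iteration, but z starts at the pixel and c is a fixed constant."""
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


# A coarse 9x21 ASCII frame: '#' marks points that never escape.
for y in range(-4, 5):
    print("".join("#" if mandelbrot_escape(complex(x / 8, y / 4)) == 100 else "."
                  for x in range(-16, 5)))
```

Each pixel needs a long chain of dependent multiply-adds. That is why these tests track raw FPU throughput, and why Bulldozer's shared FPU shows up in the results.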
Here the race is much closer, and the 6-core Thubans manage to beat the FX-8150. This may seem puzzling at first, but once we remember that each floating-point unit (FPU) is shared between two Bulldozer cores, the FP results are much easier to interpret.

Test Results - 3D Mark and PC Mark
Of much more interest to gamers is 3DMark Vantage, the de facto standard synthetic 3D graphics benchmark across a wide variety of gaming types.

 
In terms of everyday use, the only significant advantage brought by the new generation of processors is in the Communications suite. The range of applications here is extensive, and readers will probably be more interested in their own favoured activities (gaming, Skype, iTunes, etc.).

 
Comparing the scores from 3DMark 11 shows no difference in performance, although the i7-2600K has more headroom. The reason for the FX Platform becomes clear, as the top-end Fusion GPU cannot compete in this demanding test. High-end graphics are needed alongside a capable CPU to achieve good results, and the FX Platform would be a good choice here.

 
3DMark is a computer benchmarking tool created and developed by Futuremark Corporation (formerly MadOnion.com, and originally Futuremark) to determine the performance of a computer's 3D graphics rendering and CPU workload processing capabilities. Running 3DMark produces a 3DMark score, with higher numbers indicating better performance. The 3DMark measurement unit is intended to give a normalized means of comparing different PC hardware configurations (mostly graphics processing units and central processing units), which proponents such as gamers and overclocking enthusiasts assert is indicative of end-user performance.
With the exception of the CPU scores, there is no difference between the FX-8150 and the Intel K-series processors. Given Bulldozer's pricing, the entire Phenom II range has become obsolete.
 
Test Results - DX9 Titles
 
This game is very playable on any CPU from the last few years, but the FX-8150 in particular gives the best performance at high resolutions (1080p), where it counts.

 
HAWX is a title that favours AMD processors, and the FX-8150 leads the pack here. The extra cores benefit the user, and hopefully we will see developers find ever more ingenious ways to use them.

 
As DX9 horror games go, Resident Evil 5 is very playable on any processor and the FX-8150 in particular gains a huge advantage over older AMD processors.

Test Results - DX11 Titles
Now we look at a host of newer titles that support DX11. As usual we try to have the settings maxed out and see who falls by the wayside.
 
This is a very demanding title, and both Intel's and AMD's top contenders barely pass the 30fps mark. The A8-3850 is included for comparison and shows the futility of expecting an entry-level solution to compete in a recent DX11 game.

 
A good game that runs at above 60fps on both high end processors and is actually playable on the Llano at low resolutions.

 
A good beat-em up with great visual effects and quite playable on any DX11 platform.

Stalker: COP takes place soon after the events of S.T.A.L.K.E.R.: Shadow of Chernobyl. After Strelok disables the Brain Scorcher, many Stalkers rush to the center of the Zone, hoping to find artifacts and treasure. The military decides this is the perfect time to take control of the Zone and launches "Operation Fairway", a large-scale helicopter recon mission intended to scout the area by air. Unfortunately, the mission goes horribly wrong, and all five STINGRAY helicopters crash. The player, Alexander Degtyarev, is sent into the Zone to investigate the crash sites on behalf of the army.
The Sun Shafts test is very demanding, but otherwise the game is playable on either the FX Platform or the Intel equivalent.

Once we engage Hardware Tessellation and Contact Hardening Shadows, there is a slight performance drop but this is more than offset by the greatly improved visuals.

 
Very little to distinguish between the three processors.

Deus Ex: Human Revolution
This detailed review can now be found here.

Conclusion
Bulldozer (Zambezi) has finally landed. Some may be disappointed that its performance alone cannot crush Intel at stock speeds. Most, however, will look at the price tag, and then at performance that in many cases matches or exceeds Intel's best offerings, and conclude that it's rather a good purchase. Once unrealistic expectations are set aside, it's easy to see that the current performance is good enough and that the future potential for successive models is considerable.
With the Bulldozer (Zambezi) launch AMD has now ushered in a top to bottom range of processors to suit all pockets. More importantly, Bulldozer represents the future direction for AMD and is key to their Desktop Strategy. Whatever the reasons, the Bulldozer modules will now be put together to make CPUs with an even number of cores. Cost savings in design and fabrication from this standardised "building block" approach are hard to estimate.
Performance is good: Bulldozer shines under heavy workloads, and the nifty Turbo Core makes up for weaker performance at low thread counts. It's an interesting architecture that really needs high clock speeds, and AMD has addressed that by leaving overclocking of the FX range up to the consumer, even going so far as to launch its own water cooling kit to help. Large UK retailers such as Aria are offering FX-8150 processors overclocked to 4.8GHz for a small premium (£229 instead of £195). We can't say for certain that our sample was typical. Even so, if vendors like Aria, who sell hundreds of CPUs per day, can offer such a high overclock in volume and guarantee it for a year, the FX-8150 evidently has plenty of overclocking headroom across production units. At 4.8GHz there should be few, if any, benchmarks that the FX-8150 does not win against an Intel i7-2600K, let alone an i5-2500K.
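The relative size of the retailer overclock and its price premium is simple arithmetic, shown below using the clock and price figures quoted in this review. Keep in mind that a clock-speed gain rarely translates one-for-one into benchmark gains, so the clock percentage is an upper bound on the performance uplift.

```python
def pct_increase(new: float, old: float) -> float:
    """Percentage increase from `old` to `new`."""
    return (new / old - 1) * 100


clock_vs_base = pct_increase(4.8, 3.6)   # 4.8GHz overclock vs 3.6GHz base clock
clock_vs_turbo = pct_increase(4.8, 4.2)  # vs the 4.2GHz max turbo
price_premium = pct_increase(229, 195)   # £229 overclocked vs £195 stock

print(f"Clock vs base: +{clock_vs_base:.1f}%")
print(f"Clock vs max turbo: +{clock_vs_turbo:.1f}%")
print(f"Price premium: +{price_premium:.1f}%")
```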
Will it herald a return to the good old days of the Athlon 64 FX range? Probably not, but it gives AMD a few months of sharing the performance crown until Ivy Bridge launches next year. Bulldozer's successor, Piledriver, is also due for release late next year and may bring some interesting tweaks. Until then, anyone considering a new processor or a new PC build would be well advised to look at the FX Platform. Its aggressive pricing and performance make the AMD FX-8150 an ideal choice for any type of PC user. Those building a PC around the FX Platform (Scorpius) get a tacit guarantee of a minimum level of performance and assurance of compatibility, while keeping the flexibility to choose from a wide range of processors and graphics cards to fit their budgets.
Maximilianus
