AMD’s Zen: Harshing Intel’s Mellow
The reason Intel has been able to outperform AMD’s processors can be summed up in one simple acronym: IPC. IPC, or Instructions Per Clock, is a simple yet effective way of judging how efficient a CPU design is, because it does not matter how fast the core is clocked or how many cores there are: the IPC of a design stays the same. Oh sure, overall performance will go up when comparing a 3GHz and a 4GHz version of the same design, but the number of instructions each core can handle per clock cycle will be identical.
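Since overall speed is simply IPC multiplied by clock speed, a few lines of Python make the point concrete. This is a minimal sketch: the 12-billion-instruction workload and the IPC of 1.5 are made-up illustration values, not measurements of any real CPU.

```python
def runtime_seconds(instructions: float, ipc: float, clock_hz: float) -> float:
    """Time to retire a workload: instructions / (instructions per clock * clocks per second)."""
    return instructions / (ipc * clock_hz)


WORKLOAD = 12e9   # hypothetical workload: 12 billion instructions
IPC = 1.5         # hypothetical IPC; a property of the design, not of the clock

for ghz in (3.0, 4.0):
    seconds = runtime_seconds(WORKLOAD, IPC, ghz * 1e9)
    print(f"{ghz:.1f}GHz: {seconds:.2f}s at IPC {IPC}")
```

The 4GHz part finishes sooner only because it ticks more times per second; the IPC term never changes. Raising IPC is the only way to go faster at the same clock.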
This is the area where AMD has fallen behind Intel. That is not because of a lack of talent but simply a lack of resources, which should come as no surprise, as Intel literally has a yearly research and development budget in the ten-figure range. When you, as a company, can throw hundreds and hundreds of millions of dollars at CPU architecture design every year, you can release new architectures a lot faster than a company with far fewer resources. This is why, even though AMD’s Excavator team refined the underlying design and reached performance levels they could only have dreamed of when its ancestor Bulldozer was created, Intel has still been able to outpace them.
AMD really is the scrappy underdog of the computer world, and their ‘make do with less’ attitude is why they are the only x86 CPU manufacturer still standing and not crushed into dust by the Intel juggernaut. So AMD knew they had a problem. They knew they were falling further and further behind (to the point where simply pushing the speed of the core would not keep them competitive), and they decided to do something about it.
This is how Zen came about… and the dream of Zen was rather ambitious: to create a new design with 40% higher IPC than Excavator. To put this goal into perspective, they basically wanted a design that, running at a mere 2.86GHz or so, could perform the same as a 4GHz Excavator core. Let the sheer audacity sink in… and then realize that Intel is perfectly happy, and considers it a good return on its investment, when a new architecture is 5-10% better than the old one.
So did AMD succeed at their intended goal? Well, yes and no. Zen did indeed hit the 40% improvement… but AMD’s team did not stop there. Instead they created a monster with 52% higher IPC. Yes, with all other things being equal (which they are not… more on this later), a single Zen core running at roughly 2.63GHz will basically perform the same as a 4GHz Excavator core. Zen is an ‘Athlon 64 X2’ sized leap forward for AMD.
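The clock-equivalence arithmetic is worth spelling out: if Zen does (1 + gain) times as much work per clock, it needs the Excavator clock divided by (1 + gain) to match it. A small sketch, assuming all other things are equal (the function name is ours, for illustration):

```python
def equivalent_clock_ghz(old_clock_ghz: float, ipc_gain: float) -> float:
    """Clock a new core needs to match an old one, given its fractional IPC gain.

    Work per second = IPC * clock, so matching requires
    new_clock = old_clock / (1 + ipc_gain). Assumes everything else is equal.
    """
    return old_clock_ghz / (1 + ipc_gain)


for gain in (0.40, 0.52):
    print(f"+{gain:.0%} IPC: {equivalent_clock_ghz(4.0, gain):.2f}GHz matches a 4GHz Excavator core")
```

The gain divides the clock rather than being subtracted from it; simply knocking 52% off 4GHz would understate the clock a Zen core actually needs.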
So how did AMD go about creating this miracle? They rethought how they had designed CPUs in the past and took a page from their APU and GPU design teams. At Zen’s heart beats a modular building block called a CCX, or CPU CompleX. Much like AMD’s “Compute Unit” APU building block, this CCX block is made up of a bunch of prefab parts. Want a bigger CPU? Add more CCXs. Want a smaller one? Turn off parts of a CCX. To be a bit more precise, each Zen CCX block houses four cores, 64KB of L1 instruction cache plus 32KB of L1 data cache per core (384KB total), 512KB of L2 cache per core (2MB total), and 8MB of L3 cache that is shared across all four cores.
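Because the CCX is a fixed building block, chip-level cache totals come down to simple multiplication. A minimal sketch using the L2 and L3 figures quoted above; the class and function names are illustrative, not any AMD tool, and L1 is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    """One Zen CPU CompleX, using the per-CCX figures quoted in the text."""
    cores: int = 4
    l2_kb_per_core: int = 512       # private L2 per core
    l3_kb_shared: int = 8 * 1024    # L3 shared by the cores in this CCX


def cache_totals_kb(ccxs: list[CCX]) -> dict[str, int]:
    """Add up L2 and L3 across every CCX in a chip."""
    return {
        "L2": sum(c.cores * c.l2_kb_per_core for c in ccxs),
        "L3": sum(c.l3_kb_shared for c in ccxs),
    }


print(cache_totals_kb([CCX()]))          # {'L2': 2048, 'L3': 8192}
print(cache_totals_kb([CCX(), CCX()]))   # two CCXs: {'L2': 4096, 'L3': 16384}
```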
Unlike previous AMD cores, a Zen CCX has native simultaneous multithreading (SMT) baked right in, allowing each core to process two threads at a time… just like Intel cores… ish, as AMD’s SMT is arguably smarter/better/more efficient than Intel’s ‘Hyper-Threading’ design (right now). Put another way, each four-core CCX block can handle eight threads simultaneously. And no more ‘core and a half’ math is needed when thinking about AMD processors, as each core really is a full-blown core that can handle all duties. Thus an 8-core Zen-based Ryzen CPU is indeed an 8-core processor and not a 6-core with 2 more half cores for specialized work only.
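With two hardware threads per core, the operating system sees twice as many logical CPUs as physical cores. The sketch below assumes a simple numbering in which logical CPUs 2n and 2n+1 share physical core n; real operating systems may enumerate threads differently, so treat the layout as hypothetical.

```python
def logical_cpu_map(physical_cores: int, threads_per_core: int = 2) -> dict[int, tuple[int, int]]:
    """Map each logical CPU number to (physical core, SMT slot).

    Hypothetical numbering: sibling threads are adjacent (0 and 1 share core 0, and so on).
    """
    total = physical_cores * threads_per_core
    return {cpu: divmod(cpu, threads_per_core) for cpu in range(total)}


ccx = logical_cpu_map(4)   # one four-core CCX
print(len(ccx))            # 8 threads
print(ccx[5])              # (2, 1): logical CPU 5 is core 2, second thread
```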
Drilling down further, not only can each core in the CCX handle two threads at once, but AMD has also given the Zen design a much-needed performance boost thanks to a much quicker instruction scheduler. To picture what an instruction scheduler does, imagine it as the foreman on a job site, directing which tasks go to which crews (i.e. which parts of the core). This foreman is a heck of a lot faster and smarter than its predecessor, and a big reason why IPC is 52% higher.
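To see why a faster, wider ‘foreman’ lifts IPC, here is a toy out-of-order issue model. It is a teaching sketch, not Zen’s scheduler: every instruction is assumed to take one cycle, width is a hypothetical number of instructions the scheduler can hand to execution units per cycle, and an instruction may issue as soon as everything it depends on has finished.

```python
def simulate_issue(deps: list[set[int]], width: int) -> float:
    """Return the IPC achieved when issuing up to `width` ready instructions per cycle.

    deps[i] holds the indices of instructions that instruction i must wait for.
    Each instruction is assumed to complete in the cycle it issues.
    """
    if not deps:
        return 0.0
    finished_at: dict[int, int] = {}
    waiting = list(range(len(deps)))
    cycle = 0
    while waiting:
        cycle += 1
        issued = []
        for i in waiting:  # scan in program order, skipping stalled instructions
            if len(issued) == width:
                break
            if all(d in finished_at and finished_at[d] < cycle for d in deps[i]):
                issued.append(i)
        if not issued:
            raise ValueError("dependency cycle or unknown dependency")
        for i in issued:
            finished_at[i] = cycle
            waiting.remove(i)
    return len(deps) / cycle


# Four independent two-instruction chains (each odd instruction depends on the one before it).
program = [set(), {0}, set(), {2}, set(), {4}, set(), {6}]
for width in (1, 2, 4):
    print(width, simulate_issue(program, width))   # 1 1.0 / 2 2.0 / 4 4.0
```

Because the scheduler can skip past a stalled instruction to find independent work, a wider issue window turns the same eight instructions into fewer cycles, which is exactly what a higher IPC means.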
Backstopping this desperately needed scheduler is some really… really advanced predictive math for the cache. A modern CPU vastly outstrips even the fastest external memory, so to keep the CPU actually working on tasks instead of waiting for the equivalent of a cement mixer to arrive from across town, the CPU itself has to ‘pre-order’ data to fill its ‘on-site’ buffer (i.e. its extremely fast L1, L2 and L3 caches). This is called ‘prefetch’, and if the prefetcher gets it wrong, the CPU cycles spent going down that rabbit hole are wasted, and the buffer may even need to be flushed before the CPU can do the work that actually comes in next. When it works, the CPU hums along at max capacity and overall system performance is outstanding… but when it gets things wrong often, the system gets ‘slow’.
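Prefetching is easiest to picture with the simplest classic design, a stride prefetcher: if the last two memory accesses were the same distance apart, fetch the next address at that distance before it is asked for. AMD has not published Zen’s prefetch algorithms, so this is a generic textbook sketch; the cache is modelled as an unlimited set and nothing is ever evicted.

```python
def stride_prefetch(addresses: list[int]) -> tuple[int, int, int]:
    """Replay an access stream; return (hits, misses, useless prefetches).

    A 'hit' is an access whose address was prefetched earlier. Once two
    consecutive gaps match, the next address along that stride is prefetched.
    """
    prefetched: set[int] = set()
    used: set[int] = set()
    hits = misses = 0
    last = stride = None
    for addr in addresses:
        if addr in prefetched:
            hits += 1
            used.add(addr)
        else:
            misses += 1           # the CPU has to wait on memory
        if last is not None:
            new_stride = addr - last
            if new_stride == stride:   # same gap twice in a row: pattern confirmed
                prefetched.add(addr + stride)
            stride = new_stride
        last = addr
    return hits, misses, len(prefetched - used)


print(stride_prefetch(list(range(0, 640, 64))))   # steady stride: (7, 3, 1)
print(stride_prefetch([0, 640, 64, 4096, 128]))    # no pattern: (0, 5, 0)
```

The steady stream is mostly hits with one wasted prefetch at the end; the irregular stream gets no help at all, which is the ‘slow’ case described above.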
AMD’s improvement in this predictive analysis is called Neural Net Prediction: a learning branch predictor that works hand in hand with what AMD calls Smart Prefetch, and that draws on machine-learning ideas. No, it’s not SkyNet, or even Star Trek’s Data… but it is a lot better than its ancestors. How it stacks up against Intel and their mega-budget remains to be seen, but AMD states the CPU ‘learns’ your habits and adjusts its predictions based on your usage patterns… which is a little creepy. Do you really want the CPU knowing your web-viewing habits… and judging you based on them?! In all seriousness, it doesn’t judge; it just does what is best for your habits… no matter how odd they may be.
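AMD has not published how Neural Net Prediction works internally. It is widely reported to be perceptron-based (the best-known ‘neural’ technique in CPU predictors), so the sketch below assumes exactly that: a single perceptron learns weights over a short history of recent yes/no outcomes, such as branch taken or not taken, and predicts the next one. The history length and training threshold follow the textbook formulation, not AMD’s hardware.

```python
class PerceptronPredictor:
    """Textbook single-perceptron predictor for a yes/no outcome (hypothetical sketch)."""

    def __init__(self, history_len: int = 4):
        self.weights = [0] * (history_len + 1)        # weights[0] is the bias term
        self.history = [-1] * history_len             # +1 = yes/taken, -1 = no/not taken
        self.theta = int(1.93 * history_len + 14)     # textbook training threshold

    def _output(self) -> int:
        return self.weights[0] + sum(w * x for w, x in zip(self.weights[1:], self.history))

    def predict(self) -> bool:
        return self._output() >= 0

    def update(self, outcome: bool) -> None:
        y = self._output()
        t = 1 if outcome else -1
        # Train on a miss, or when the prediction was right but not yet confident.
        if (y >= 0) != outcome or abs(y) <= self.theta:
            self.weights[0] += t
            for i, x in enumerate(self.history, start=1):
                self.weights[i] += t * x
        self.history = [t] + self.history[:-1]        # newest outcome goes first


p = PerceptronPredictor()
results = []
for i in range(200):
    outcome = i % 2 == 0          # a strictly alternating pattern
    results.append(p.predict() == outcome)
    p.update(outcome)
print(sum(results[-100:]), "of the last 100 predicted correctly")   # 100
```

After a couple of early misses the weights lock onto the alternating pattern; that ‘learning’ is all the predictor does, with no idea what the pattern means.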
Zooming out a bit, a Zen CPU (or the eventual Zen APUs, for that matter) does not need to rely upon just one CCX building block… or even a full one. For example, a Ryzen 7 CPU makes use of two full CCXs, but smaller ones (like the APU ‘G’ series) only need one or one and a half. When more than one is used, these CCXs are connected via what AMD calls Infinity Fabric, which can be considered AMD’s next-generation processor interconnect: it sits on top of the CCXs and allows them to communicate with each other. In other words, you can consider it a modified, improved version of the idea behind AMD’s Heterogeneous System Architecture (HSA). This additional layer is also why you can expect Ryzen ‘G’ processors with built-in GPUs to land in the future… and carry on the AMD APU tradition.
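The two-level layout can be sketched as a lookup: two cores either share a CCX (and its L3) or have to talk over Infinity Fabric. This assumes cores are numbered consecutively, four per fully enabled CCX; the numbering is illustrative, not AMD’s or any operating system’s actual scheme.

```python
CORES_PER_CCX = 4  # a fully enabled CCX, as described above


def ccx_of(core_id: int) -> int:
    """Which CCX a core sits in, under the consecutive-numbering assumption."""
    return core_id // CORES_PER_CCX


def path_between(core_a: int, core_b: int) -> str:
    """Describe the route data takes between two cores."""
    if ccx_of(core_a) == ccx_of(core_b):
        return "same CCX (shared L3)"
    return "different CCXs (Infinity Fabric)"


print(path_between(0, 3))   # same CCX (shared L3)
print(path_between(3, 4))   # different CCXs (Infinity Fabric)
```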
Now, as the CCX is modular all the way down to the core level, AMD can easily disable 1, 2 or even 3 cores in a CCX to create lower core-count Zen processors. For example, disable one in each CCX and you have a ‘6-core Zen’ (the higher Ryzen 5 models). Disable two in each CCX and you have a ‘4-core Zen’ CPU (AKA Ryzen 3 and the lower Ryzen 5 models). This not only gives AMD a lot of flexibility in creating tailor-made Zen processors for a wider range of the marketplace, it also lets them use Zen dies that don’t quite pass muster at the full CCX level in the lower Ryzen series (more on this later in the review).
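The symmetric harvesting described above is simple arithmetic. A short sketch (the function name is hypothetical); thread counts are left out because, depending on the model, SMT may be switched off as well.

```python
def zen_core_count(ccx_count: int, disabled_per_ccx: int, cores_per_ccx: int = 4) -> int:
    """Physical cores left after disabling the same number of cores in every CCX."""
    if not 0 <= disabled_per_ccx < cores_per_ccx:
        raise ValueError("each CCX must keep at least one core")
    return ccx_count * (cores_per_ccx - disabled_per_ccx)


for disabled in (0, 1, 2):
    print(f"2 CCXs, {disabled} disabled per CCX -> {zen_core_count(2, disabled)} cores")
```

This prints 8, 6 and 4 cores respectively, matching the Ryzen 7, higher Ryzen 5 and ‘4-core Zen’ tiers.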
So in summary, Zen is a lot faster at single tasks, a heck of a lot smarter, much more modular, and overall a radical improvement for a single generation. This is exactly what AMD needed, and we are glad to see that is what they delivered.
XFR and Precision Boost
Ryzen is chock-full of performance improvements, with everything from native SMT (arguably better than Intel’s ‘Hyper-Threading’) all the way up to eight physical cores. There are, however, two new technologies that are really, really exciting and deserve to be highlighted and showcased: XFR and Precision Boost.
Let’s start with Precision Boost. For many, many years now CPUs have come with two speed specifications: stock and boost. In a nutshell, the ‘stock’ speed is what the CPU can do on all cores all the time, while ‘Boost’ (or ‘Turbo Boost’, as Intel calls it) is a higher clock speed that a couple of the cores can reach. The CPU does this by raising the multiplier a couple of extra steps on an as-needed basis, usually on only a couple of cores. In practical terms this means single- and dual-threaded tasks get more speed, and people who are unwilling to manually adjust the multiplier have the CPU do it for them. Of course, the boost in multiplier is a lot smaller than what you can do manually, but it too was a game changer back when it first hit the scene.
The downside is that this ‘boost’ is extremely coarse in its adjustments, as it can only move in 100MHz (i.e. one CPU multiplier) increments. Thus on systems that are right at the ragged edge of their cooling, the boost number is a pipe dream. This is how Intel does it, and how AMD used to do it. Ryzen, on the other hand, takes a new and much more elegant approach, as Precision Boost allows much, much finer-grained core speed control.
This is because, just as the Ryzen design team borrowed the ‘building block’ approach to processor design from the APU and GPU teams, it also borrowed precision frequency adjustment from the GPU team. The boost speed can vary from core to core in real time based upon the actual load on each core, can be adjusted in 25MHz increments, and allows up to 0.6 millivolts of additional voltage to be fed to each core… all in real time.
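The difference in granularity matters most when power or cooling caps the clock somewhere between two multiplier steps. The sketch below rounds a ceiling down to the highest selectable clock at 100MHz (one multiplier) and at 25MHz (Precision Boost) granularity; the 3,990MHz ceiling is a made-up example, not a measured limit.

```python
def highest_step(ceiling_mhz: int, step_mhz: int) -> int:
    """Highest clock at or below the ceiling that the boost logic can actually select."""
    return (ceiling_mhz // step_mhz) * step_mhz


ceiling = 3990  # hypothetical clock that the chip's power/thermal headroom allows
for step in (100, 25):
    clock = highest_step(ceiling, step)
    print(f"{step}MHz steps: {clock}MHz ({ceiling - clock}MHz of headroom left unused)")
```

With 100MHz steps the core settles at 3,900MHz and leaves 90MHz on the table; with 25MHz steps it reaches 3,975MHz and wastes only 15MHz.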
Of course, there are a few caveats to this new and improved Boost. First and foremost, Precision Boost is smart enough not to fry the CPU cores, so while it will boost speeds, there are two hard limits baked in. First, it will start lowering CPU core speeds (especially on cores that don’t need the extra performance) when the CPU starts demanding 128 watts of power (in the case of the 95W TDP models; expect it to kick in much lower on the lower-TDP variants). The other limit is actually more troubling for the average consumer, as Precision Boost will start lowering core speed once Tcase (i.e. the CPU temperature reported by its integrated thermistors) hits 60 degrees C.
Sixty degrees is not all that hot, and for most consumers it will be the real limiting factor. Thus cooling is very, very important to overall system performance, and consumers who over-build with a high-performance cooler will be rewarded with better real-world performance than those who stick the cheapest cooler they can find on it. Lastly, Precision Boost speed is different from all-core boost speed. Don’t expect to get a Boost speed of, say, 4.0GHz on all cores all the time; the CPU would probably pull more power than PBoost allows. It’s really more that an 1800X or 1600X will routinely do 4.0GHz on two cores, with the others running at lower clocks… and maybe on more than two cores if power-draw and cooling limits are not hit first. This is still pretty darn decent, as PBoost adjusts each core based on its individual needs in real time, so the cores that need the speed get it, while cores that don’t stay slower and don’t needlessly raise the overall CPU temperature or pull more electricity than necessary.
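The behaviour described above can be caricatured as a control loop that runs many times a second. The 128W and 60-degree limits and the 25MHz step come from the text; the 50% ‘busy’ threshold, the clock range and the rule of shedding idle cores first are assumptions made for this sketch, which is not AMD’s firmware logic.

```python
POWER_LIMIT_W = 128.0   # package power limit quoted for 95W TDP parts
TCASE_LIMIT_C = 60.0    # temperature at which boost backs off, per the text
STEP_MHZ = 25           # Precision Boost granularity
BUSY_THRESHOLD = 0.5    # hypothetical: a core above 50% load "needs" speed


def boost_step(clocks, loads, power_w, tcase_c, min_mhz=2200, max_mhz=4000):
    """One control-loop tick: return new per-core clocks in MHz.

    min_mhz and max_mhz are hypothetical example limits. Busy cores hold steady
    while over a limit, unless every core is busy.
    """
    limited = power_w >= POWER_LIMIT_W or tcase_c >= TCASE_LIMIT_C
    busy = [load >= BUSY_THRESHOLD for load in loads]
    new_clocks = []
    for mhz, is_busy in zip(clocks, busy):
        if limited and (not is_busy or all(busy)):
            mhz -= STEP_MHZ   # over a limit: idle cores shed speed first, all cores if all are busy
        elif not limited and is_busy:
            mhz += STEP_MHZ   # headroom available: speed up the cores doing work
        elif not is_busy:
            mhz -= STEP_MHZ   # idle cores drift down to save power and heat
        new_clocks.append(max(min_mhz, min(max_mhz, mhz)))
    return new_clocks


print(boost_step([3600, 3600], [1.0, 0.1], power_w=60, tcase_c=50))   # [3625, 3575]
print(boost_step([3600, 3600], [1.0, 0.1], power_w=60, tcase_c=65))   # [3600, 3575]
print(boost_step([3600, 3600], [1.0, 1.0], power_w=60, tcase_c=65))   # [3575, 3575]
```

The second call shows why a weak cooler hurts: once the 60-degree line is crossed, the busy core stops climbing even though the workload still wants more speed.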
AMD, however, did not stop there, as they also included an entirely new boost mode on top of Precision Boost: XFR. XFR, or eXtended Frequency Range, is designed solely for low-core-demand scenarios and adds only a small bump on top of the maximum Boost specification. Ryzen ‘X’ models will hit 100MHz (most X models) to 200MHz (oddballs like the 1500X) above their rated Boost speed, while non-X models will only go 50MHz (most) to 100MHz (oddballs like the 1600) faster. Once again, cooling is the real limiting factor. So if you routinely see temperatures above 60 degrees, swap the cooler for a bigger, better model and, in theory, the CPU will reach its ‘true’ potential.
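The XFR rules above fit in a small lookup. A sketch under stated assumptions: the headroom figures and ‘oddball’ models are the ones quoted in the text, cooling_ok is a stand-in for the chip staying below the 60-degree mark, and the function name is hypothetical. The 1800X’s 4,000MHz Boost appears earlier in this review; 3,600MHz is the 1600’s rated Boost clock.

```python
# Typical XFR headroom from the text; the dictionary layout is illustrative.
XFR_TYPICAL_MHZ = {"X": 100, "non-X": 50}
XFR_EXCEPTIONS_MHZ = {"1500X": 200, "1600": 100}   # the "oddball" models named above


def xfr_ceiling(model: str, boost_mhz: int, cooling_ok: bool) -> int:
    """Highest clock XFR may reach on top of Boost for a lightly loaded chip.

    cooling_ok stands in for "the CPU is staying below the 60-degree mark".
    """
    if not cooling_ok:
        return boost_mhz
    family = "X" if model.endswith("X") else "non-X"
    return boost_mhz + XFR_EXCEPTIONS_MHZ.get(model, XFR_TYPICAL_MHZ[family])


print(xfr_ceiling("1800X", 4000, cooling_ok=True))    # 4100
print(xfr_ceiling("1800X", 4000, cooling_ok=False))   # 4000
print(xfr_ceiling("1600", 3600, cooling_ok=True))     # 3700
```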
XFR is indeed a potential game changer, but 50 to 200MHz at most (depending on the model in question) is a bit of a disappointment… and something AMD did not do a great job of explaining before launch. To be fair, that is to be expected, as it is a brand-new way of looking at things and AMD didn’t want to give Intel time to prepare.
Will this change much as time goes by and Ryzen matures? Possibly. But AMD has to ensure that every Ryzen can do it, even the ‘worst’ of them. Thus XFR is not going to (entirely) replace manual overclocking, as with manual overclocking you, the owner, find and set the core speed limit. As such, XFR is very interesting technology, but it does have that ‘fresh from the oven’ smell. It’s enticing, but even in conjunction with Precision Boost it may leave you wanting more.