When it came to creating their first-ever desktop Tile-based processor series, Ori Lempel (Intel engineer… and no, we have no idea if he is related to the late, great Abraham Lempel of Lempel-Ziv compression algorithm fame) said it best: “…Our (team’s) mantra was simple. Remove any transistor from the design that doesn’t directly contribute to product goodness”.
Needless to say, this radical change in core architecture philosophy has led Intel down some very interesting paths. Paths that Intel actually explored first in their “Meteor Lake” and then “Lunar Lake” mobile processors… arguably making this “1st generation” desktop CPU Tile design a ‘tock’ or even a ‘double tock’ and not the ‘tick’ it would appear to be. Which is pretty much how Intel did things when they introduced E-cores to the desktop world only after beta-testing them in the mobile world first. Conservative nature aside, these new branches and paths Intel have gone down are not all sunshine and lollipops. Some will be controversial. Some will be seen as the much-needed change they are. All will be talked about in AMD R&D boardroom meetings… as Intel’s ‘Tile’ design is arguably already a more sensible, and saner, design than AMD’s 5th generation ‘chiplet’ design (aka Zen 5).
Firmly in the ‘why… just… just… why?’ category is the name change. Core i branding lasted fourteen generations before getting a nice (if now, in retrospect, tarnished) swan song. Yes, the nomenclature was… quirky, but after billions spent on marketing, and many years of use, everyone understood it. 9 meant flagship/enthusiast, 7 high end, 5 mainstream, 3 low power systems. This was then followed by a four digit code, with the first two signifying the generation, followed by two digits for where it landed in a given product stack (with 9x being the flagship option and 1x being the “extreme value end of the marketplace”). Add in a letter at the end for special features… and bam. You could read a model number and understand it. If you had the decoder ring, that is.
That is all gone and has been replaced. “Core i” is now “Core”… followed by a designation. Right now Intel has only released their flagship “Ultra” options, but future lower-tiered models will swap the “Ultra” out for {To Be Officially Announced Later branding} to signify their lower-than-premium standing. Thus, it’s “Core Ultra” for the time being. This is then followed by not four digits but… 1+3. The first signifies the class. Once again 9…8…7…5…3 etc. Thus making it “Core Ultra 9/8/7/5/3 xxx”.
Of the remaining three digits, the first is for the generation. Which is a 2, even though this is technically the first desktop gen of this CPU series. Thus “Core Ultra 9 2xx” to signify this ‘first’ generation of the Core Ultra series. Yeah. The second digit is where a given model falls in the product stack, with the third digit offering… well… further demarcation, with higher (e.g. the “5” in 285) being better than a lower digit. Most likely this was originally meant to nuke the whole KF vs K confusion and allow for easier sorting (as 285 is “obviously” higher than 280), but it also allows Intel to add in a higher performance model at a later date.
As you may notice… for the time being these two digits only go to 8x and not 9x. Most likely this is planned so as to leave Intel room for later “cream of the crop” / premium factory overclocked edition models… that used to be dubbed ‘KS’ (and thus confused the hell out of non-enthusiasts), but now could be called (for example) a “295”. Rounding out the end you have a letter. K stands for unlocked multiplier. Just like in the olden days of yesteryear.
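For the code-minded, the scheme really is that mechanical. Below is a minimal toy decoder of the naming convention as we have described it (the tiers below “Ultra” are unannounced, and suffix meanings beyond “K” are assumptions on our part, not official Intel branding):

```python
# Toy decoder for the new "Core Ultra" naming scheme described above.
# Illustrative only: tiers below "Ultra" are unannounced, and suffix meanings
# beyond "K" (unlocked) are assumptions.
import re

def decode_model(name: str) -> dict:
    """Parse e.g. 'Core Ultra 9 285K' into its marketing components."""
    m = re.match(r"Core (Ultra) (\d) (\d)(\d{2})([A-Z]*)$", name)
    if not m:
        raise ValueError(f"not a recognized Core Ultra model: {name}")
    tier, klass, gen, stack, suffix = m.groups()
    return {
        "tier": tier,               # "Ultra" = premium (for now, the only tier)
        "class": int(klass),        # 9 flagship ... 5 mainstream ... 3 budget
        "generation": int(gen),     # 2 = this 'first' desktop Core Ultra gen
        "stack": int(stack),        # 85 > 80 > 45: higher = better in-stack
        "unlocked": "K" in suffix,  # K = unlocked multiplier, as of old
    }

print(decode_model("Core Ultra 9 285K"))
# {'tier': 'Ultra', 'class': 9, 'generation': 2, 'stack': 85, 'unlocked': True}
```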
All in all? Not as bad as some are making it out to be. In fact there is potential for a bit more sanity in Intel’s nomenclature. One that is sure to annoy enthusiasts and experts, but will indeed make it easier for novices to instantly grasp what a model means and thus (most likely) correctly guess where two different CPUs fall on the performance spectrum… as long as the comparison is not between a desktop and a mobile CPU. We say this as right now a (Lunar Lake based) Core Ultra 9 288V does exist, and with its 4+4 P/E cores (and low TDP) it will be smoked by even a Core Ultra 5 245K. Thus destroying any and all method to the madness that Intel so lovingly crafted to help novices. Thus “you win some, you lose some” does spring to mind when thinking about this massive (and expensive) change Intel undertook. In case you are wondering… yes. Yes, we are indeed still salty over this change.
Moving on. Firmly on the good, arguably even great, side of things: Tiles. Instead of a monolithic design, where all parts of the CPU are tightly woven together and laser etched on the same piece of sand, Intel has broken things up into a few key components. Even here Intel is showing the proper way to break things up, as it is a much more nuanced approach than the one AMD takes. In AMD’s Zen architecture, AMD relies upon (up to multiple) “small” CCXs to create a “CCD” that is in turn connected to a ‘Big’ SoC chiplet… that houses everything else. For bigger core counts, they add in more CCDs. Intel on the other hand has opted for a different approach. One that reverses the Big and Small component breakdown.
Let’s start with the SoC Tile. Unlike AMD’s, this ‘System on a Chip’ Tile is a small(ish… we would probably call it medium) Tile that basically handles the mundane tasks, such as the IMC and its PHY, interconnectivity of the other tiles… and Intel’s NPU generation 3 AI accelerator. Sadly, unlike Meteor Lake, there apparently are no (block of 4) “Low Power Island” E-Cores to be found here… as the P and normal E-Cores are on the Compute Tile, and LP E-cores are MIA (and possibly DOA).
Intel are making a big deal over this AI portion of their processors, and yet while it may offer 13 TOPS (Trillion Operations Per Second) that still does not meet Microsoft’s 40 TOPS base requirement for local Copilot acceleration. Yes, this number is boosted higher via other parts of Arrow Lake… but even the Core Ultra 9’s “peak” platform total of 36 TOPS falls short of the 40 TOPS minimum standard. Thus, we are truly puzzled why Intel opted for the older NPU 3 and not the newer NPU 4… but in the words of the modern YT Scottish Bard who likes to imbibe a drink or twelve… “Don’t Knoooow” and “Nah, it will be fine”. Yeah. Truly puzzling. Luckily, a (future) quick swap of the SoC Tile could bring it up to NPU 4 or even cutting edge (aka not released yet) NPU 5 standards, so as to actually bring the ‘win’ back to ‘WinTel’.
Interestingly enough, Intel has broken the iGPU up into more… pardon the pun… core components, with some of the ultra-efficient parts being housed here in the SoC Tile. Namely the components that are noticeably impacted by system memory latency. These include the Display Engine (which allows for up to four 4K monitors to be connected), Display I/O (which handles the standards the iGPU can understand and work with, such as eDP 1.4 / HDMI 2.1 / DP 2.1), and the Media Acceleration Engine (encoders/decoders, scalers, color space converters, etc). Of these, the Media Acceleration Engine is the most noteworthy. Its native abilities to decode and encode modern video standards now include not only HEVC but VP9 and AV1. Better still, this “QuickSync” hardware acceleration can do it for resolutions all the way up to 8K at 120Hz, and do both at (at least) 10-bit HDR depths. Noice… and sure to garner some sales in the Home NAS and HTPC arenas.
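For those wanting to kick those QuickSync tires, the usual route is ffmpeg. A minimal sketch (assuming an ffmpeg build compiled with Intel QSV support; the file name is a placeholder) of a decode-only benchmark would look something like this:

```python
# Exercise the QuickSync hardware decode path via ffmpeg from Python.
# Assumes an ffmpeg build with Intel QSV support; the input file is a
# hypothetical 8K 10-bit HDR AV1 clip. '-f null -' discards the output so
# only raw decode throughput is being measured.
import subprocess

subprocess.run([
    "ffmpeg",
    "-hwaccel", "qsv",          # use QuickSync hardware acceleration
    "-c:v", "av1_qsv",          # hardware AV1 decoder
    "-i", "sample_8k_hdr.mkv",  # placeholder input clip
    "-f", "null", "-",          # decode-only: no output file written
], check=True)
```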
Also on the positive side, this SoC Tile’s dual channel DDR5 Integrated Memory Controller is now rated for DDR5-6400, and Intel’s Robert Hallock has publicly claimed it can handle DDR5-8000 so well it might as well be the ‘sweet spot’ for memory overclocking this generation (their term, not ours). Which, if true, is an amazing improvement given the fact that the last gen’s IMC routinely had troubles with even 6400 speeds. With that said, we do however have a sneaking suspicion that the rest of that claim’s sentence was “…when paired with extremely pricey (and a bit gimmicky outside of Enterprise Server scenarios) CAMM2 / CUDIMM… and not the standard UDIMM form-factored DDR5 RAM”… that, you know… us mere mortals purchase for our desktop systems, where internal case volume is not at quite such a premium. Either way, DDR5-6400 is a noice upgrade over the last gen’s IMC abilities.
Needless to say, that is actually a decent chunk of goodies… but as none require massive frequencies nor create massive amounts of overhead, this Tile has been built on a cheaper, but still better than the last Core (i) generation, TSMC “N6” 6nm node. Which saves money and allows the smaller form-factored GPU and Compute Tiles to be built on the costly 3nm process.
Connected to this small(’ish… compared to AMD’s) SoC Tile is a dedicated “GPU” Tile (also a “small” Tile)… which, as the name suggests, houses the integrated Graphics Processing Units. Interestingly enough, this Xe-LPG based GPU Tile is not as powerful nor as potent as what the mobile Arrow Lake-HX GPU tile boasts. Instead it is basically a rehash, and downgrade, of what Meteor Lake rocked. To be specific, even the top end Core Ultra 9 285K’s iGPU is rocking a mere 4 Xe cores and not the 8 that the Meteor Lake and mobile Arrow Lake-HX variants of this Tile have.
More puzzling still, these are Xe 1.0 and not Xe 2.0 based Xe cores. Thus only 64 EUs (instead of 128), 512 shaders (instead of 1024)… and the only “AI acceleration” it can offer is DP4a (aka software… aaka DX12 Ultimate’s Shader Model 6.4) and not XMX (aka hardware… aaka Intel Xe Matrix Extensions, which rely upon Intel Arc tensor cores). Thankfully, it does still have Ray Tracing cores. Four of them (one per Xe core). Yeah. That certainly qualifies as ‘something’. On the positive side it is enough for DX12 Ultimate certification, which says a lot about that certification’s worth, but certainly not capable of playable frame rates at any resolution worth mentioning.
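To put the DP4a vs XMX difference in concrete terms: DP4a is just a 4-wide INT8 dot product accumulated into a 32-bit integer, executed on the regular shader ALUs, whereas XMX would chew through entire matrix tiles in dedicated hardware. A quick numpy emulation of what a single DP4a operation does (illustrative only):

```python
# Emulating a single DP4a operation: dot product of two 4-element int8
# vectors, accumulated into an int32. Xe 1.0 runs this on its shader ALUs;
# XMX (absent on this iGPU) would instead process whole matrix tiles per
# clock in dedicated engines.
import numpy as np

def dp4a(a: np.ndarray, b: np.ndarray, acc: int) -> int:
    """Return acc + dot(a, b) for two int8[4] vectors (int32 accumulate)."""
    assert a.shape == (4,) and b.shape == (4,)
    return int(acc + np.dot(a.astype(np.int32), b.astype(np.int32)))

a = np.array([127, -128, 5, 10], dtype=np.int8)
b = np.array([2, 3, -4, 1], dtype=np.int8)
print(dp4a(a, b, acc=1000))  # 1000 + (254 - 384 - 20 + 10) = 860
```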
Mix in 4MB of L2 cache and (up to) 2.0GHz clock speeds (though only 1.9GHz in the Ultra 5) and yes… it’s better than what the 14th gen came with. Still, it is mostly a mixed bag, even a bit disappointing in some areas, but still good enough overall to edge it into ‘nice upgrade’ territory and out of the ‘meh’ zone. With that said, this is an iGPU that will not be cause for concern in AMD land. AMD still has a firm grip on iGPU performance in the desktop space (not to mention their fantastical (for certain scenarios) APU lineup). So hopefully future Core Ultra models will rock a GPU Tile worthy of both the ‘GPU’ and ‘Ultra’ titles.
The low level I/O interconnectivity features go into a(n aptly named) I/O Tile. While these interconnect features are very important (not the least of which being the dual Thunderbolt 4 / USB 40G ports!!) none are what anyone would call “power hungry”. Thus this small Tile is baked until golden brown in an older TSMC 6nm node process oven.
This brings us to the star of the Tile show… as for the grand finale of Tile creation, Intel went and created a Big Tile they call the ‘Compute Tile’. This Compute Tile is not to be confused with AMD’s Core Complex Die. Intel is not copying AMD and gluing multiple sub-components together to create one compute chiplet… err “Tile”. Instead, a single ‘Big’ Compute Tile contains all the P and E cores Arrow Lake has in one virtually monolithic block. Yes. Intel’s Big vs AMD’s Small approach to the critically important compute cores has numerous advantages for Intel, but an obvious disadvantage as well.
First the good. With only one (internal, we may add) “ring bus” connecting all the cores together in the Compute Tile… the differences in latencies between the various cores are going to be a rounding error compared to AMD and its reliance upon an external (to the CCD) bus to transfer data from cluster of cores to cluster of cores. Thus fears over the “latency storms” which still afflict (albeit not as often) AMD Zen 5 are certainly nullified and negated.
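For the curious, this is also something you can (roughly) see for yourself. Core-to-core latency is typically measured by pinning two threads to specific cores and ping-ponging a token between them; proper tools do this in C with shared-memory atomics, but a Python sketch of the method (Linux-only, far too coarse to resolve actual ring-hop nanoseconds, and with arbitrary core IDs) looks like this:

```python
# Toy core-to-core 'ping-pong' latency sketch. Python/Pipe overhead swamps
# the real nanosecond-scale differences, so treat this as an illustration of
# the measurement method, not a benchmark. Linux-only (sched_setaffinity).
import os, time
from multiprocessing import Process, Pipe

ROUNDS = 10_000

def pong(core: int, conn) -> None:
    os.sched_setaffinity(0, {core})   # pin this process to one core
    for _ in range(ROUNDS):
        conn.send(conn.recv())        # bounce the token straight back

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=pong, args=(8, child))  # arbitrary second core
    p.start()
    os.sched_setaffinity(0, {0})               # pin the pinger to core 0
    t0 = time.perf_counter()
    for _ in range(ROUNDS):
        parent.send(b"x")
        parent.recv()
    elapsed = time.perf_counter() - t0
    p.join()
    print(f"mean round trip: {elapsed / ROUNDS * 1e6:.2f} us")
```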
AMD will be quick to argue that smaller, however, is cheaper to make and will result in fewer Tiles/Chiplets failing to pass QA/QC. Which Intel will agree with… as the other multiple Tiles are all ‘small’ and dedicated to far fewer tasks than (ironically) AMD’s Big SoC. Intel will simply disagree that this cost saving on their end is worth a performance loss on your end. Considering the price of Arrow Lake has not increased for the 9, and has actually gone down (slightly) for the 5 model, over their last gen counterparts… we personally don’t care about wastage, nor the potential for latency spikes on the iGPU due to it being broken up over multiple Tiles. We do however care about those latency spikes that invariably happen with some CPU workloads when they are run on Ryzen. Your Mileage May Vary, as they say, and you may come to have the exact opposite opinion.
In either case, Intel has not just created a Big Tile for their Compute Tile and called it good enough. They also took the time to optimize the P and E core block layout, so as to reduce the latency that the 13/14th gen’s doubling from 2 E-core blocks to 4 introduced. This optimization takes a multi-pronged approach to latency reduction. First up, the E-Cores themselves boast advanced predictive modeling to make… an… “educated wish” on what the next instruction operation will be… before it is even in the pipeline. This, mixed with deeper cache queues and wider cache, does indeed help.
However, the true magic comes from a change in the very P and E core layout. Yes, what was once just a rumor / S.W.A.G. / clickbait has apparently become reality. That is to say, instead of blocks of P-cores and then blocks of E-Cores on the bus, Arrow Lake takes a more… mixed approach to P and E core layout. One where it is basically a P-Core, followed by a block of four E-cores, followed by another P-core. This ‘1-4-1’ approach guarantees that an E-Core block is only one ring bus stop away from a P-Core. What this all means is that Intel is trying to ensure as little latency impact as possible when the (3rd generation) Thread Director swaps a thread over to an E from a P-core – and vice versa. Now that is low-level optimization done right.
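To visualize why that matters, here is a toy model of the ring under the ‘1-4-1’ layout as we have described it (our reading of Intel’s description, not an official die map; where the remaining P-cores sit on the ring is our assumption). On a bidirectional ring, the cost of a thread migration is simply the shorter hop count between two stops:

```python
# Toy ring-bus hop model of the '1-4-1' P/E layout described above.
# The stop ordering is our assumption for illustration, not an Intel diagram.
stops = ["P0", "E0-3", "P1", "E4-7", "P2", "E8-11", "P3", "E12-15",
         "P4", "P5", "P6", "P7"]

def hops(a: str, b: str) -> int:
    """Shortest distance between two stops on a bidirectional ring."""
    i, j = stops.index(a), stops.index(b)
    d = abs(i - j)
    return min(d, len(stops) - d)  # go whichever way around is shorter

print(hops("P1", "E4-7"))    # 1 - an E-block is always next door to a P-core
print(hops("P0", "E12-15"))  # 5 - the worst P-to-E trek in this toy layout
```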
Moving on. Beyond cost and complexity, there is a second downside to using a monolithic-esque ‘Big Tile’ for the processor cores. This second downside is rather obvious: there is simply no room for a second Compute Tile on the “Base Tile”! For point of reference, and in layman’s terms, this ‘Base Tile’ is not a tile per se; rather, think of it as an advanced “breadboard”-like substrate that all the actual tiles are arranged on… and interconnected and “glued” to.
That is what the Base Tile ‘is’ and ‘does’, and it is part of what Intel calls Foveros, which is pretty similar to how AMD’s (actually TSMC’s) “CoW” does things(… which, ironically, Intel mocked for years). In either case, the Compute Tile is just that big and takes up so much real estate on the “Base (non-)Tile” that Intel cannot include a second one until they node shrink the Tiles again… which may “nicely” explain the inclusion of blank / filler tiles… or not. Either way, any increase to the P/E count beyond 8 and 16 will require a full Compute Tile redesign. This lack of flexibility is actually something AMD’s Zen design does not suffer from (i.e. smaller Ryzens have one CCD, bigger have two… and even the number of active CCXs in a CCD can vary). Thus this is a(nother) major difference between how AMD does things and how Intel does. Both with their own sets of pros and cons.
Make no mistake: just because the existing Compute Tile is made up of blocks of (up to) 2P + 4E does not mean all future Compute Tiles inside the 15th gen Core series have to follow suit. They could, if the market demands it, make an all E-core Compute Tile, or an all P-core Tile, and create an entirely new Core Ultra processor. It will just be a costlier endeavor than simply gluing a second Compute Tile to the Base Tile would be.
Furthermore, they probably will at some point start using multiple Compute Tile designs. After all, as node yields improve over time, it makes little sense to laser cut / disable portions of a working 9-class Compute Tile to make a lower priced 7 or 5-class Compute Tile. Put another way, with the added flexibility Tiles bring to the table, it is possible that Intel follows Nvidia’s method of having multiple ‘variants’ of a core design depending on the targeted market sector (e.g. AD102-xxx for the bigger cards vs. AD104-xxx for the smaller). No matter what Intel eventually chooses to do… they now have a choice. Just not necessarily as inexpensive a choice as AMD has with their Zen configuration.
In either case, this added flexibility over previous monolithic CPU architectures does indeed extend to the other tiles too. So if Intel realize they have a need for a better iGPU than the Xe 1.0 based one Arrow Lake is rocking right now… they just swap the GPU Tile out for, say, a “Battlemage” based GPU Tile later and release a (please, please, please) “286” 9-class CPU.
Conversely, if they want to at a later date add in, say, “Killer” 2.5GbE (or skip 2.5 altogether and go for 5GbE), or even WiFi 7 + BT 5.4 abilities (to give desktop Arrow Lake CPUs the same abilities as the upcoming portable -H models), all they need to do is swap out the I/O Tile for a different one. Furthermore, if they want to add even more onboard memory cache (aka Memory Side Cache) and have the IMC on the same tile, so as to overcome LPDDR5’s increased latencies… they can… and they have. As seen in the Lunar Lake mobile processor lineup, which does precisely that. The possibilities are almost endless, and as future trends develop so too will the possibilities (e.g. maybe even a true AI Tile). Alternatively… Intel could yank out that empty ‘filler tile’ and stick a goodly sized chunk of “L3.5” cache there and create their own ‘3D V-Cache’ variant.
To be clear. This is the level of ultra-granularity, via multiple Tile types, that Intel’s Tile technology offers. It is this level of ultra-agility, in the ease of upgrading what technology a given Tile contains, that Intel’s Tile technology offers. It is this supreme flexibility in node sizes used that Intel’s Tiles offer. Above all else, it is the combination of all three that proponents of multi-component CPU design have been promising for years… and what no one else has apparently been able to fully deliver in the consumer desktop space.
This brings us to a rather controversial change for Intel… and oof, it’s a big’un. Instead of the promised (and fabled) Intel “20A” 2nm fab process being used for the cutting edge Compute and GPU Tiles… both (and basically all the other tiles) are being fabricated by TSMC. Yes, that means the promised “(Intel) 2nm” 20A node process is not happening… again… or probably ever. Instead it is going to be on “(TSMC) 3nm” (most likely the bog-standard N3B ‘baseline’ and not the newer N3E ‘enhanced’ node process). This is because Intel have basically decided to kill their 20A fab process and handed ~$14+ billion to TSMC to make use of their N3B fab for even more of Arrow Lake’s production. The downside is obvious, as this last minute change in node size had to have impacted expected temperatures, power consumption… and frequencies (more on that in a moment). All without time to really optimize things for the (much) bigger 3nm fabrication node.
Beyond that hitch, this change actually is a win-win-win scenario for consumers, Intel, and TSMC alike. The N3B fab was being underutilized as TSMC bet big on COVID-19 level sales continuing into the future… and now has a multi-billion dollar fab facility sitting virtually idle due to the worldwide economic downturn. As such, this change in fabrication process helps turn a net negative into a net positive for (East) Taiwan Semiconductor Manufacturing Company (and their stockholders).
For consumers, the upside is obvious. A radically different fab process, run by an entirely different company, means ‘oxidation’ worries are moot. The upside for Intel? The seemingly doomed-to-fail 20A node is no longer draining resources, allowing Intel to focus in on their 1.8nm “18A” fab process. Which will most likely play a critical role in the future of the industry in the coming years (for example, Microsoft is already in talks with Intel to get some of that sweet, sweet fab time once it goes live… and help stick it to Sony and their overpriced PS7 “Pro” console). Furthermore, thanks to TSMC betting on a longshot (and losing), Intel is certainly not paying the 20K USD per wafer “asking price” that TSMC originally was demanding for N3B.
The downside (aka controversy) to this major departure for Intel? Well, that is obvious. This is a major, major hit to Intel’s pride. Intel have always prided themselves on being a fabricator first and not a glorified design house that has to rely on others to make their processors for them (a la Apple or AMD). To put that another way (and to steal a famous saying by Jerry Sanders… onetime CEO of AMD): “Real men have fabs”. So, yeah. This kick right square in their… pride underscores how seriously they are taking the 13/14th gen degradation issue, and making 100 percent certain it is a dead issue… as this is the equivalent of nuking it from orbit to ensure that not only is it dead but all its friends, and their friends’ friends, are too. Now that is how you get clean edges and ensure a cancerous tumor is gone. Color us impressed with this level of dedication by Intel… which hopefully will translate into a more proactive stance for existing 13th and 14th gen owners in the coming months. Or not. Only time will tell.
Moving on to a more… granular look at Arrow Lake. Everything, and we mean everything, has changed between the two generations. Right down to the P and E core architectures used in that new Compute Tile. On the P-core front, “Lion Cove” has replaced Raptor Cove… but more importantly, Intel has finally gotten around to showing the E-cores some love via Skymont replacing the aging Gracemont E-core architecture. No one, and we mean no one, will consider these two massive architectural changes anything but a Great Thing. A Great Thing that was overdue.
As to what some of these changes mean in practice, and firmly on the controversial side: it means another change of sockets. Thus, after three generations LGA1700 is EOL, and owners are SOL if they want the latest Intel. Which should surprise no one. If there is one thing Intel is known for, it is changing sockets faster than some people replace their cars. We personally are ambivalent, bordering on an “anti-recycling of sockets (beyond 2 generations)” stance… as LGA1851 is a nice upgrade with major improvements. Enough improvements to actually warrant the socket change. We will circle back to these changes later, but in the meantime, while we personally consider them large enough to justify the change… we completely understand where the ensuing confusion and possible anger will stem from. After all, LGA1851 and LGA1700 use the exact same 45×37.5mm form factor.
The upside to this is that existing coolers will work. Most without even a need for hardware adapter kits. Even then, we doubt many will actually ever need an adapter kit, as while the IHS-to-motherboard nominal z-height range is different… it really isn’t different enough. With LGA1851 the accepted z-height range is now 6.83mm to 7.49mm, versus LGA1700’s 6.73 to 7.40mm. Which is 0.10mm (3.94 thou’ in freedom units) taller on the low end and 0.09mm (3.54 thou’) taller on the high end of the acceptable range. While we are talking electronics and not big block V8 standards… this is a rounding error that a decent Thermal Interface Material’s “squish factor” will easily cover.
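For those who want to double-check our math (mm to thou is just a divide by 0.0254):

```python
# Sanity-checking the socket z-height deltas quoted above.
lga1700 = (6.73, 7.40)  # accepted IHS-to-motherboard z-height range, in mm
lga1851 = (6.83, 7.49)

for label, old, new in zip(("low end", "high end"), lga1700, lga1851):
    delta_mm = new - old
    print(f"{label}: +{delta_mm:.2f} mm = +{delta_mm / 0.0254:.2f} thou")
# low end:  +0.10 mm = +3.94 thou
# high end: +0.09 mm = +3.54 thou
```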
The reason for this change in the acceptable z-height range is that Intel has taken major steps to fix the potential ‘bowing’ issue… and has done so via a factory take on Ye Olde Washer Mod. What this mod entails is adding a bit of material (aka ‘washers’, though in Intel’s case they have gone with a non-conductive plastic strip configuration) under the corners of the socket mounting hardware, so as to reduce the uneven pressure associated with the “one locking arm holds down a bracket that only has tabs in the middle to secure a long rectangular CPU” config Intel is stubbornly re-using. The upside is… very little pressure variance, much more consistent temperatures, and no need to purchase overly expensive, overly complicated mounting frames (unless you are OCD and want it to be as close to perfect as this one-armed design can be). The downside is that if you bought the (vastly superior, if finicky to install) Thermal Grizzly contact frame that fixed this problem on LGA1700 systems… it (probably) won’t work with this new gen, as the z-height is juuuust different enough to make it incompatible. Almost as if Intel’s engineers are holding a grudge and dislike 3rd party companies showing them up.
Moving on. For some, the loss of HyperThreading will be considered a major downgrade. After all, both Intel (and AMD) fully and freely admit that HT does boost overall performance by upwards of 30%. All at only the cost of noticeable bloating of the core’s footprint on the silicon, increased complexity (including for the baked-in and OS level schedulers)… and (once again using Intel’s documented numbers) a 20% increase in power consumption. That is a lot to like, as few care about the hows and whys… just that the CPU delivers what it promises to deliver.
For others, this change will be seen as the dead-tree branching of a technology that is no longer needed, wanted, or even a good idea anymore. This is not 2002. It is not 2012. It is late 2024. A 22 year lifespan is Methuselah levels of impressive, but proponents of this move will argue the time had come to drop HyperThreading. After all, the days of only having a “mere” eight threads are gone… as even the Core 5 has fourteen legit, core-backed threads on tap. So yes. For the average Joe the need for virtual threads is indeed in the past, and HT and its overhead (not to mention security concerns) can now be considered a detriment.
Proponents of HT’s death will further point to the fact that even when disabling it in the BIOS, one does not get back all of that ~20% power overhead associated with HT being baked into the P-core’s design (and scheduler… and… and… and…), nor does the CPU magically make room for more cores/cache/goodies where that HT overhead resides. All of which translates to the fact that when HT is not baked into the silicon, it allows Intel a ‘free’ 10 to 30 percent increase in performance-optimized core density. Which also has a lot going for it. Not enough to gain a second group of P and E cores, but certainly pointing towards how Intel plans on doubling both P (16) and E (32!) core counts in Nova Lake.
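For the math-minded, the trade-off both camps are arguing over is easy to eyeball using the rough figures above (~30% more throughput for ~20% more power):

```python
# Back-of-napkin HT math using the article's own rough figures.
ht_perf, ht_power = 1.30, 1.20   # ~+30% throughput for ~+20% power

print(f"perf-per-watt with HT: {ht_perf / ht_power:.3f}x")  # ~1.083x
# i.e. HT is a (small) net efficiency win when there are threads to spare...
# but the silicon it occupies is dead weight when disabled, which is the
# pro-removal camp's argument for spending that area on more cores instead.
```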
If we were to take a side in this debate, we would say that we are more in the latter than the former camp… but there is no denying that a reduction of 8 threads is “optically challenging” to say the least. Most people look at basic marketing numbers and nothing more… as they have busy lives and don’t care about the whys or hows. This “I ain’t got time for that” attitude is why the Frequency War rages on to this day. Rants about historic industry blunders aside, the fact remains that, just like with the Core 10th gen flagship model versus the Core 11th gen flagship model… Intel has once again seemingly ‘downgraded’ their latest and greatest flagship 9-class processor. This time all the way down to Core 12th gen total thread count levels.
Thankfully, it is not an across the board gutting, nor all that terrible when compared to certain AMD Ryzen models. Yes, AMD’s flagship Ryzen 9 has 32 threads versus Intel’s latest and greatest (Core Ultra 9) “only having 24”… but that is still 16 real cores vs 24 real cores. Furthermore, stepping down a couple of class models (and price points), AMD’s Ryzen 5 only has a mere 6 cores / 12 threads compared to the Intel Core Ultra 5’s fourteen cores and threads. Thus little… “TickTock edutainment” is needed to get the fact across that thread count is a less important metric than total processing power.
Moving on. Helping to balance out this HyperThreading removal is a new power management engine baked into the processor. In previous Core (i) monolithic designs, the onboard power management was… well, thick as a brick. Not quite “it ate a lot of lead paint chips as a child” thick, but thick enough that AMD were well ahead of them in this regard. In a nutshell, when it was asked to change power delivery numbers it would read the (paltry few) onboard temperature sensors, existing voltage numbers, and clock speeds… and sometimes go “Huurrr Duuur. According to my simple sensor array’s readings the temperatures are still under Tj Max, and the ‘guard bands’ an Intel intern created while sleep deprived in the lab say it’s A-OKAY… so it must be okay. So sure, lil buddy! You can have those 5 bajillion amps the motherboard wants to push to your delicate little pathways. I believe in you. You. Can. Do. It.”
Yeah. That needed to change… and should have changed back when AMD first started to show off their advancements in this critical area. Thankfully, the replacement is a much more advanced “AI self-tuning controller” that adapts to your usage scenarios in real time. No. This does not, we repeat, does not mean there is a little HAL 9000 hiding inside your CPU judging you on your browser habits and plotting your eventual demise. It’s just a bunch of highly advanced math-based algorithms that judge the processor’s usage and adjust a lot more than previous generations could. All in real time(ish). All using a much more advanced, and finely tuned, sensor array to gather data.
Thus, even if by some terrible ‘miracle’ another microcode SNAFU gets past QA checks… this new generation of Intel processors will not try and taser its logic pathways into oblivion. No matter if the motherboard wants to push 40,000,000,000 amps or not. This low-level sanity check will not allow it. A little too late to save Intel’s reputation, but it should help restore it. Which is a Good Thing™ to say the least.
Furthermore, and arguably even more impressive, this controller is not only going to be more rational in its decision making but also more precise. To be specific, Intel has granted it the luxury of not only being able to use a different base clock multiplier for the Compute vs SoC Tiles… they allow it to ignore the 100MHz base clock slices entirely. Instead the P-cores (at least) can have their clock speeds adjusted in slices as small as 1/6th of the base clock. Yes, that does indeed mean that Intel’s experience in discrete video cards is paying dividends, as these Lion Cove P-cores can now modify their clocks in slices as small as 16.67MHz. Thus, sudden thermal limiting, or sudden radical frequency shifts (which still occasionally plague AMD Zen 5 processors), are now a thing of the past.
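In practice, the difference in granularity looks like this (old 100MHz bins versus the new 100 ÷ 6 ≈ 16.67MHz slices, shown around a 5.4GHz clock):

```python
# Old-style 100 MHz frequency bins vs Arrow Lake's 16.67 MHz (BCLK/6) slices.
BCLK = 100.0  # MHz

old_steps = [54 * BCLK + n * BCLK for n in range(3)]
new_steps = [54 * BCLK + n * (BCLK / 6) for n in range(3)]

print([f"{f:.2f} MHz" for f in old_steps])  # 5400.00, 5500.00, 5600.00
print([f"{f:.2f} MHz" for f in new_steps])  # 5400.00, 5416.67, 5433.33
# Six-times-finer steps let the controller shed a little speed (and heat)
# instead of lurching down a full 100 MHz bin at a time.
```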
Moving on. Firmly on the controversial side of the triple-beam scale is the dropping of DDR4 support. At this point in time, and with DDR5 prices down to reasonable levels, while we may dislike it… we do understand it. Legacy compatibility takes up room on the tile, it overcomplicates the memory architecture, and few would use it. In return for freeing up room via dropping the beloved-by-bargain-hunters DDR4 compatibility, Intel now has an IMC that can handle (up to) DDR5-6400 speeds natively. Furthermore, this simplification of the Integrated Memory Controller should allow for faster development and advancement on the IMC front in the coming generations.
This brings us to the big boi changes of the P-cores: the low level memory subsystem. AKA the cache and its controllers… and did Intel ever change a lot. In the continuing trend (bordering on being a theme) of Intel’s engineers doing things their own way… some of these changes are going to be controversial. Some are fantastically good. All are going to give AMD’s design team’s math geniuses fits, as they show just how outclassed their algorithms are when compared to what Intel can seemingly pull off on demand.
Before that, there is one major change that we must, must highlight. That is the scheduling agents. In a nutshell, these ‘agents’ (schedulers) are what decide which micro-ops get dispatched to which execution ports, and when. For the last couple of generations this was a unified, 5 port/way scheduler that handled both integer and vector math. Now? Instead of that one large 5 port scheduler, Intel has opted for a 6-port scheduler dedicated to integers (the same number as Zen 5, albeit with radically different abilities per port) and a second 4 port scheduler just for vector work. Yes. That is ten, count them, ten ports across the two schedulers… or a doubling of scheduling resources. All of which are packing smarter, more refined algorithms than ever before.
Drilling down into the actual cache configuration of these P-cores, we are greeted with… well… some “interesting” and highly controversial decisions. Most are in the good category. Some will cause eye-twitches in OCD sufferers over the naming convention Intel has decided upon. In previous generations you had L1 (“Level 1”: low latency, small capacity cache closest to the cores). You had L2 (“Level 2” / mid-level cache… as it is “mid” in size, speed, and placement). And you had last-resort L3 (“Level 3”: big but relatively slow, being furthest from the cores). That tried and true standardization of cache nomenclature changes with Lion Cove… as Intel took a look at AMD adding Level 3.5 / ‘3D Cache’ non-standard standard nomenclature and went “here, hold my beer, and we will show you how to seriously improve performance while giving industry experts a real migraine”.
First up is what Intel calls “L0” or Level Zero cache. This is actually what was previously called L1 in Raptor Cove… as it is considered L1 cache by the rest of the industry. It just now has a new name to totally confuse experts and novices alike. Either way, in Raptor Cove the L1 was a 48+32KB (data/instruction) cache using (for its day) highly advanced branch prediction algos, but with a fairly slow 5 CPU cycle “load to use” (aka 5 cycle latency). With L0 it is now FOUR cycles of latency… like it was back in the SkyLake days (or, in modern times… the same as Zen 5’s L1 latency). The L2 is now (up to) 3MB in size and yet only requires a single extra cycle of latency to go along with the increased size.
L3 cache capacity has gone from 30MB (or 24 with the i5) to 36MB (though it is the same as last gen’s 24MB in the Ultra 5… as “Tiles Are Strange”). Which is bigger than Zen 5’s 32MB of L3 (excluding x3D models). Level 3 is once again shared across P and E cores, so as to reduce the latencies associated with I/O duties being handed off from a P to an E core (or vice versa). Either way, Intel’s L3 cache may be bigger, but it is still slow compared to AMD Zen’s (to be fair, AMD has had to put a lot more R&D into L3, as their L1 and L2 algos miss a lot more often than Intel’s). Of course, as AMD’s Ryzen ‘x3D’ CPUs have proven… size does indeed matter, but what you do with it (sometimes) matters more. Here Intel has also upped their game, with numerous branch prediction changes that make this 36MB L3 a bit more potent. Even if it is relatively dog slow and seriously needs some love.
Moving on. With all three ‘standard’ levels of cache covered, this is where the controversy enters the chat. The mysterious fourth cache level. No, it sadly is not an off-die L4 cache hidden away on another Tile. It is not even a Level 3.5 ‘x3D’ style cache addition to some Compute Tiles a la Zen x3D models (which we would love to see, rocking Intel cache algos on Intel CPUs). No, it’s a middle cache… for the middle L2 cache. Yes. That is not a typo… and probably yes, this mid-cache for the mid-cache fever dream was born out of some mad genius watching way too many Pimp My Ride episodes while sleep deprived… and yes, we can almost see the board meeting where it was pitched: “You like cache? Well here is some cache for your cache so you can cache while you cache!!! Trust me, experts will love it.”… that or they pulled an Oprah and went “And YOU get cache! And YOU get cache! Y’all get cache!!!”
All mockery over the naming and creation process aside, it is a good idea. Just with a terrible name. While Intel insists on calling this new cache “Level 1”, that is just not going to happen. This cache resides between the ‘new’ L0 (aka old L1) and the L2 cache, and thus some have dubbed it L1.5… which makes more sense to us than calling it L1. To be blunt, at 9 cycles of latency it is too slow to be true Level 1 cache, but a bit too fast to be Level 2 (e.g. SkyLake’s L2 was 12 cycles; Raptor Cove’s, aka last gen’s, was 15 cycles). It is also, at 192KB, a bit too big for most consumer CPUs’ L1… and yet L2 is usually in the megabyte range. Thus a nice, simple ‘L1.5’ compromise name makes more sense than trying to push a complete overhaul of the standards onto the industry.
Moving on. No matter what you call it, an extra layer of fast cache is always a good thing, as it means many more chances to get things right… and thus boost IPC. This is why, arguably, a good chunk of the high single to low double digit IPC gains that Lion Cove brings to the table over Raptor Cove stem directly from this combination of deeper branch prediction and much, much more clever cache algorithms.
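A toy average-memory-access-time (AMAT) model shows why an extra mid-level can pay for itself. The L0/L1.5 latencies below are from the figures above, the L2 assumes one cycle over last gen’s 15, and every hit rate (plus the L3/DRAM costs) is invented purely for illustration:

```python
# Toy AMAT model: average cycles per memory access for a cache hierarchy.
# Latencies for L0 (4) and L1.5 (9) are from the text; L2 (16) assumes 'one
# extra cycle over last gen's 15'; hit rates and L3/DRAM costs are made up.
def amat(levels):
    """levels = [(hit_rate, latency_cycles), ...]; last level always hits."""
    cycles, reach = 0.0, 1.0  # 'reach' = fraction of accesses getting this far
    for hit_rate, latency in levels:
        cycles += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return cycles

without_mid = [(0.90, 4), (0.95, 16), (0.80, 60), (1.00, 200)]
with_mid    = [(0.90, 4), (0.55, 9), (0.95, 16), (0.80, 60), (1.00, 200)]

print(f"without L1.5: {amat(without_mid):.2f} cycles/access")  # ~5.56
print(f"with    L1.5: {amat(with_mid):.2f} cycles/access")     # ~4.98
```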
With that said, things are not all sunshine and lollipops. We cannot prove it, but we do think the relatively last minute change from the slated 2nm to TSMC’s 3nm node has led to a… downshift in the frequencies offered. Once again: we cannot prove that, as we do not have a smoking gun, but either way some of the IPC gains are going to be “robbed” via the lowered P-core clock speeds these chips are rocking versus their 14th gen counterparts. We say this as while, YES(!), the base clock speeds have gone up noticeably (an extra 500MHz on the 9, to a whopping 700MHz on the Ultra 5), that number only really matters when a potato is being used to cool the processor. Give it some good cooling and the all-core, and sometimes single active thread, “Thermal Velocity Boost” / maximum boost (depending on the model) are what matter the most in the real world. For the Ultra 9, single thread speed is down by a massive 5 percent in total clock cycles per second (5.7GHz vs 6.0GHz… and it is probably going to be another generation before we see factory ‘KS’ models breaking the 6GHz barrier). For the Ultra 5, single thread speed is down by a more moderate 1.9’ish percent (5.2 vs 5.3GHz), which still stings a wee bit.
Thankfully, these days the only places single threaded performance matters much are some games and some applications. Outside of those caveats, the “all-core(s active)” max boost is what contributes the most to making a system feel ‘fast’. Here, not nearly as much of the IPC gain is going to be obfuscated. Last gen’s 14900K had it set to 5.5GHz; this gen’s Ultra 9 285K has it set to 5.4GHz. A mere 100MHz / 1.82 percent difference. The same holds on the Ultra 5 front, with the frequency change netting a sub two percent difference between the generations (5.2 vs 5.3). This still hurts the bottom line, but it is not as bad as some AMD fanbois and click-baiters are going to make it out to be… even if it probably does stem from TSMC not having an equivalent 2nm process available for Intel to swap over to.
On the cooling front, Intel has made it clear that the days of insane-o temperatures and power requirements are slowly coming to an end (and may even be numbered). Motherboard manufacturers could not handle the freedom, and as such you are going to see real world reductions in electricity demand and heat output from most of the 15th gen models. Even if the Ultra 9’s PL2 is 250W (and the ‘Extreme’ profile is 259 watts), most are a bit saner (and even 259 is ‘sane’ by modern “9 series” standards). For example, the Core Ultra 5 245K’s PL2 is back down to a sane(r) 159 watts from the 181 of the last generation. This to us is good news. No one likes being forced to use cooling devices that cost nearly as much as the frickin’ CPU. Furthermore, while AMD is still playing “let’s all pretend staying at Tj Max is a good idea” games, Intel is at least trying to rein in the insanity that is modern computing temperatures.
Moving on to the arguable star of the Arrow Lake show: E-Cores. Ah, Ye Olde rebranded, and repurposed, Atom architecture based cores. “How we love thee. Let us count the ways”… actually, let’s not. No one actually ‘love’, loves E-Cores. We just respect them and love having a ton of them to throw at problems (as the old saying goes, “quantity has a quality all its own”, and 16 cores quickly crushes a lot of problems). Excluding the fact that E-core count (just as with P-core count) has now not changed in three generations, E-Cores have noticeably changed for the better. So much so that, with frequencies of up to 4.6GHz and up to sixteen of these wee beasties, it is getting harder and harder to consider them ‘efficiency’ cores. Rather, they are just low(ish) power cores with an incredibly small “low profile” footprint on the die. Intel’s engineers appear to agree with that assessment; that is why they started down the true “Low Power Island (LP)” E-core design path… which would have (and still may in the future) brought the core type count up to three.
We are not exaggerating when we say all the above. In this generation these ‘E-Cores’ are downright powerful, with IPC improvements in the high double digits. This combination of massive IPC uplift with increased frequencies means that you can almost… under certain circumstances and loads… if you squint realllll hard… consider their IPC to be (somewhat) equal to 11th, maybe even 12th gen real / performance cores. To put that another way, and with less snark: in one generation things have gone from needing about 1.9 to 2’ish E-Cores to equal one Zen 5 core, to needing about 1.5 of them to get the job done. In fact, in certain edge case scenarios it is even closer to parity than that. Yes, that is how powerful and potent these low(ish)-power “E-cores” have become in a relatively short period of time.
Some will argue they needed to become this powerful, as they are now going to be doing double duty: their original design goal of being low-level thread handlers, and now also covering what was once handled (via HyperThreading) on the high-performance P-cores. Those are indeed rather large shoes to fill, and they somewhat do fill them. After all, a massive IPC improvement combined with a couple hundred extra MHz of ‘all core’ frequency boost (an extra 200MHz on the expensive chip and a whopping 600MHz on the cheap chip – which just somehow feels wrong… and backwards) results in a noticeably “peppier” core.
So yes, Intel did get a lot right with the new “Skymont” E-Core design… but they still left a lot of room for improvement. To be perfectly candid, to knock it out of the park Intel needed to do all they did and find a way to add in an extra block or two of them. They also needed to give them even more cache, as these now actually fast E-Cores can still sometimes feel cache deprived. Even if their cache bus is now ‘double wide’ and they do have more cache, neither improvement is big enough to get the job done in all scenarios… just like the slower Gracemont E-cores suffered from in the 13th and 14th gen (albeit to a lesser extent). Put another way, when they are finally given more than enough cache and can blow past the 5GHz barrier, they will indeed be “Over (AMD Ryzen) 9000!” in their abilities. Until then, they are still very good, but it will be another generation or two before the seeds Intel planted back in the 12th gen fully bloom.
Moving on and circling back to the overall abilities of this new generation of Intel desktop processors… While last gen’s 16 PCIe 5.0 lanes for the dGPU (and/or other things) plus four (usable) PCIe 4.0 lanes for a lone NVMe drive was a step in the right direction… it was starting to show its age. With the Core Ultra 200 series we now have 16 (5.0) for the GPU and/or alternate PCIe AICs (i.e. they can now be natively split x16+0, x8+x8… or even x8+x4+x4). Then we get another eight lanes dedicated to PCIe storage. For a grand total of 24 direct-to-CPU PCIe lanes of goodness. The only catch is that only four of those eight storage lanes are PCIe 5.0. Meaning that instead of last gen’s 16 (5.0) + 4 (4.0) going up against Ryzen’s 24, it is now 20 (5.0) + 4 (4.0) vs. 24. Which is better… but still lagging behind Ryzen 9000’s full 24 PCIe 5.0 lanes of high performance goodness.
With that said, having a full speed NVIDIA RTX 4000 (or 5000) series beast running in full x16 mode, plus a rip snortin’ PCIe 5.0 Crucial T705 (or Corsair MP700 SE Pro) also in full speed mode for the OS drive, with a secondary Corsair MP600 or Crucial T500 SSD for your ‘games’ drive, is nothing to sneeze at. That is a lot of power hooked directly to the CPU. It is more than just ‘good enough’ for most. It is darn near perfect… and only enthusiasts will cry (with a rebel yell) “More, more, more, more, more, MORE!”.
Circling back to the socket change debate: LGA1851 and its Z890 chipset is a very nice upgrade over Z790. In addition to being connected to the CPU via eight DMI 4.0 lanes (basically the bus width equivalent of PCIe 4.0 x8), Z890 brings to the table 24 usable PCIe 4.0 lanes (up from 16x 4.0 + 8x 3.0 lanes). In addition to those tasty feature upgrades, this next gen PCH has some other very… very nice upgrades. Take for instance the integration of WiFi 6E and Bluetooth 5.3, as well as an integrated Ethernet NIC (aka Media Access Controller) that is good for 1GbE. Interestingly, all three (the WiFi, the BT, and the LAN) are going to come with (or at least be available to the motherboard manufacturers) “Intel Killer” software driver packages. Some will consider that a plus; some(… okay… most) will not. Either way, this is a major improvement over the Z790 series… and should help take some of the sting out of the $57 USD asking price.
Moving on. USB has also gotten a boost… think up to 32 USB ports’ worth of connectivity now natively supported via the combination of the Z890 PCH and the CPU. Better still, this USB allotment can be configured pretty much any way you want… within reason. For example, motherboard manufacturers can configure all that USB goodness into five USB 3.2 2×2 “20G” ports, plus (up to) ten USB 3.2 gen2 “10G” ports… and then have USB room left over for (up to) ten bog standard USB 5G ports and ‘up to’ fourteen USB 2.0 ports to round out a massive front and rear I/O feature set. So yeah. All we can say is “NOICE” (in our best Tacoma FD accent). Now if Intel had only upgraded the SATA count from 8 to 12 (for NAS enthusiasts), and opted to make it 2.5GbE and WiFi 7 + BT 5.4 capable, we would be truly happy with this new generation of Intel options.
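Since those per-type maxima are the constraint motherboard vendors juggle, a tiny config checker makes the flexibility obvious (a sketch against this article’s quoted limits, not an official Intel datasheet):

```python
# Sanity-check a hypothetical Z890 board's USB loadout against the per-type
# maxima quoted above (this article's numbers, not an official datasheet).
MAXIMA = {"20G": 5, "10G": 10, "5G": 10, "usb2": 14}

def valid_loadout(config: dict) -> bool:
    """True if every port type stays within its quoted maximum."""
    return all(config.get(ptype, 0) <= limit for ptype, limit in MAXIMA.items())

print(valid_loadout({"20G": 5, "10G": 10, "5G": 10, "usb2": 14}))  # True
print(valid_loadout({"20G": 6, "10G": 4}))  # False: one 20G port too many
```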
Quirks aside, there is a lot to like, and a lot of improvements, with this new generation of Core Ultra 200-series processors and their accompanying flagship chipset.