To start our moderately deep dive into what makes the 13th generation tick, let us begin with the things that have stayed the same. First and foremost, Intel is still using their 10nm (aka Intel ‘7’, aka ‘Intel 7 Ultra’) fab process for the 13th generation. Intel themselves admit this is the third generation to use their 10nm process… but honestly, the 10nm Tiger Lakes were a “blink and you missed them” level event. As such, the ‘recycling’ of 10nm is not overly disconcerting, though this is hopefully the last generation to use it. Not because 10nm is bad, but for the simple reason that the faster Intel moves on to a ‘7nm’ (aka Intel ‘4’) fab process, the lower the chances of a “Return of the Memes 2: The 10nm++++++++++++++++++++++ Memememememememing…ing” event happening.
In either case, Intel has improved their 10nm SuperFin transistor design for the 13th generation, so much so that they are claiming a greater-than-50mV improvement at a given frequency. In simplistic terms this directly translates into a couple hundred additional megahertz worth of performance, and is how/why Intel allows (up to) eight of the P-cores to run at a stock 5.4GHz frequency (aka almost what the last gen ‘KS’ ultra-binned chips could Thermal Velocity Boost to)… and allows for a godly 5.8GHz single- and dual-core Thermal Velocity Boost setting (if you have the cooling to keep the wee beasties below 73°C).
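To see how a voltage improvement at iso-frequency becomes clock headroom, here is a back-of-the-napkin sketch. The voltage-to-frequency slope used below is purely an assumed, illustrative number (real V/F curves are non-linear and part-specific), not an Intel figure:

```python
# Rough sketch: convert an iso-frequency voltage improvement into extra
# clock headroom, assuming a (hypothetical) linear V/F slope near the top
# of the curve. The slope is an illustrative assumption, not measured data.

MV_PER_MHZ = 0.25            # assumed: ~0.25 mV of extra Vcore needed per MHz
VOLTAGE_IMPROVEMENT_MV = 50  # Intel's ">50mV at a given frequency" claim

extra_headroom_mhz = VOLTAGE_IMPROVEMENT_MV / MV_PER_MHZ
print(f"~{extra_headroom_mhz:.0f} MHz of extra headroom")  # ~200 MHz
```

Which lines up with the “couple hundred megahertz” ballpark, at least under those assumptions.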
Interestingly enough, Intel has applied a smidgen of these improvements in the iGPU department. While yes, it is the same UHD 770 series as seen in the 12th generation, the 32 execution units are clocked a bit higher. In the case of the i9 they run at 1.65GHz (instead of 1.55GHz as in the previous generation), and in the i5’s case they run at 1.5GHz instead of 1.45GHz. Honestly though… this change is so minor as to be nothing but a footnote. Few i9 owners will ever use the iGPU outside of troubleshooting a GPU gremlin, and even i5 users would be hard pressed to notice any change in frames per second while running their favorite slideshow… errr… “game”.
(image courtesy of Cardyak and the SiliconGang)
The overall architecture of the ‘Raptor Cove’ Performance-Cores also is not dramatically changed from the previous generation’s ‘Golden Cove’. As with the previous generation, Raptor Lake uses up to eight of these performance cores in a two-blocks-of-four configuration (i9; the i5 has two cores disabled/laser cut). However, there are multiple low-level performance improvements baked right in.
The most obvious change is in the L2 and L3 cache sizes. In (12th gen) Alder Lake the p-cores each had 80KB of L1 cache, 1.25MB of L2, and access to (up to) 30MB of shared L3 cache (with two p-cores and a block of e-cores disabled, the i5-12600K only has/had 20MB total L3). In Raptor Lake the p-cores may still “only” have 80KB of L1 per core, but L2 has been increased to 2MB per core and L3 has been increased to (up to) 36MB (including the L3 slices attached to the e-core clusters… and once again the 13th gen i5, with two cut p-cores and two cut e-core clusters, ‘only’ has 24MB of L3 in total). Put another way, Raptor Cove comes with an across-the-board increase of 60 percent in L2 cache capacity and a 20 percent increase in L3 cache capacity.
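Those percentages check out with a bit of quick math, using the i9 configurations quoted above:

```python
# Quick sanity check of the cache-size deltas quoted above.
adl_l2_per_pcore_mb = 1.25   # Alder Lake L2 per p-core
rpl_l2_per_pcore_mb = 2.0    # Raptor Lake L2 per p-core
adl_l3_total_mb = 30         # i9-12900K shared L3
rpl_l3_total_mb = 36         # i9-13900K shared L3

l2_gain = (rpl_l2_per_pcore_mb / adl_l2_per_pcore_mb - 1) * 100
l3_gain = (rpl_l3_total_mb / adl_l3_total_mb - 1) * 100
print(f"L2: +{l2_gain:.0f}%  L3: +{l3_gain:.0f}%")  # L2: +60%  L3: +20%
```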
To help put those numbers into perspective, the last “great” (… especially at being a space heater) Intel HEDT chip was the Core i9-10980XE. That ~$1K (USD) CPU only had 64KB of L1 per core (8-way), 1MB of L2 per core (16-way)… and only 24.75MB total of L3 (11-way). A mere three years later, and a HEDT processor designed for mega-calculations seems under-equipped compared to a consumer Core i-series.
Cache capacity is not the only change baked in. Once again L1 has not been dramatically improved over its predecessor, keeping its 12-way (data) / 8-way (instruction) configuration (though there are some extremely low-level tweaks that would take a white paper to go over). Nor has L3 changed all that much (though once again a lot of low-level tweaks have been made, such as “Dynamic INI”, a new dynamic inclusive/non-inclusive mechanism that monitors and adjusts in real time what should be cached based on the active workload).

Instead, L2 is where Intel obviously focused much of their attention. In the 12th gen the p-cores’ L2 made use of a 10-way set-associative configuration… and had a latency penalty of 15-and-a-smidgen cycles. With Raptor Cove it is now 16-way. On the surface, most knowledgeable enthusiasts will cringe at the thought of the latency penalty that ‘must’ go along with such a combination of increased size and increased complexity. In reality… Intel has done some major reworking and the latency is only an extra cycle(ish). Considering the latency penalty of going to L3 to fetch data vs. an added cycle of staying in L2, everyone will agree that an extra cycle is well worth it. Furthermore, the L2 cache algorithms have gotten an upgrade with a new dynamic prefetcher algorithm that Intel calls ‘L2P’. This new (machine-learning based) algorithm can adjust the L2 prefetcher’s behavior… and can adjust it in real time. Intel claims it alone can improve the efficiency of the cache by ~16 percent.
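For those who want to see why that extra cycle is a bargain, here is a toy average-memory-access-time (AMAT) model. Every number in it (the hit rates and the L3 latency) is an assumed, illustrative value rather than an Intel spec; the point is only the shape of the trade-off, where a bigger, higher-associativity L2 that converts even a few L3 trips into L2 hits more than pays for its one extra cycle of latency:

```python
# Toy AMAT model (assumed numbers, for illustration only): cost, in cycles,
# of an L1 miss that is serviced by L2, or by L3 on an L2 miss.

def amat(l2_hit_rate: float, l2_latency: float, l3_latency: float) -> float:
    """Average cycles per L1 miss: pay L2 latency, plus L3 on an L2 miss."""
    return l2_latency + (1.0 - l2_hit_rate) * l3_latency

L3_LATENCY = 35.0  # assumed round-trip cost of falling through to L3

# 1.25MB 10-way L2 at ~15 cycles vs. 2MB 16-way L2 at ~16 cycles.
# The hit rates are hypothetical; the bigger cache is assumed to hit more often.
old = amat(l2_hit_rate=0.80, l2_latency=15.0, l3_latency=L3_LATENCY)
new = amat(l2_hit_rate=0.88, l2_latency=16.0, l3_latency=L3_LATENCY)

print(f"old L2: {old:.1f} cycles per L1 miss, new L2: {new:.1f}")  # 22.0 vs 20.2
```

Even with these made-up hit rates, the larger-but-slightly-slower L2 comes out ahead on average, which is the whole argument in a nutshell.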