Let’s face it. With the Arc Alchemist series, Intel wanted to prove to the world that their Xe blueprint was actually a workable design. One that could serve as the foundation upon which a third-party contender could be built… sometime in the future. As such, Intel did not shoot for the moon. Instead, they realistically calculated the chances of an RTX 3090 “Killer” even breaking even, and then decided to target the 1080P crowd… as that is the resolution the majority of people game at. This meant good enough performance at a good enough price tag was the main priority. Thus the A580 came with 8GB of RAM and core clocks that were aptly described as ‘good enough’. Not overkill. Just… “good enough”.
With the second generation, the design team wanted to prove that their (pardon the pun) core philosophy was not only a workable solution, but arguably a good one. This is why their goal was not to satisfy the needs of just 1080P gamers but also 1440P gamers. After all, if you can provide ‘good enough’ performance at 1440P you will easily break into good to great 1080P performance territory. This means gone is the 8GB of GDDR6 and in its stead is 12GB of GDDR6. Which is arguably ‘good enough’ for modern 1440P games… and certainly more than that for 1080P gaming.
Sadly, going hand in glove with that boost to total onboard memory is the fact that the second-gen BMG-G21 core’s memory controller has a narrower bus than its predecessor. To be precise, when dealing with a 5-class B580 dGPU one only gets access to a 192-bit wide memory bus. Not the 256-bit bus its predecessor was rocking. Even with slightly faster GDDR6 ICs being used this generation… the net result is that the overall memory bandwidth is lower. Arguably still ‘good enough’ for 1440P, but this is a tad disappointing given the goals and targets of BattleMage. Thankfully, the duopoly has all but written off the ‘low end’ and given Intel a free pass on this issue.
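For the back-of-the-napkin crowd, peak memory bandwidth is simply bus width times per-pin data rate. Here is a minimal sketch, assuming the commonly listed 16 Gbps ICs on the A580 and 19 Gbps ICs on the B580 (treat those rates as illustrative rather than gospel):

```python
# Peak memory bandwidth in GB/s = (bus width in bits / 8 bits-per-byte) * per-pin data rate in Gbps
# The data rates below are the commonly listed figures, used purely for illustration.
def peak_bandwidth(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

a580 = peak_bandwidth(256, 16.0)  # ~512 GB/s on the wider, slower-clocked setup
b580 = peak_bandwidth(192, 19.0)  # ~456 GB/s on the narrower, faster-clocked setup
print(f"A580: {a580:.0f} GB/s  |  B580: {b580:.0f} GB/s")
```

In other words, the faster ICs claw back some, but not all, of what the narrower bus gives up.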
Take for instance AMD. At about 3 bills, their Radeon RX 7600 is probably the closest in price to the B580 and it “features” a 128-bit bus paired with 8GB of GDDR6 RAM. It is not until you move up to the 4 to 5 bill range and their more expensive Radeon RX 7700 XT (as the non-XT is still MIA) that you will find 12GB offered on a 192-bit bus. NVIDIA is even worse. Their RTX 4060 (300 to 350 USD range) has an anemic 128-bit bus and 8GB of RAM. Hell, the RTX 4060 Ti (350 to 400 yanky bucks) only gets 128 bits, and while you can get 16GB options… 8GB is what it was designed around. So yeah. 192 bits plus 12GB is more than just good enough for the 1080P and 1440P market… or at least it is by both AMD and NVIDIA standards. So who are we to argue that it should have been 256 bits and 12GB… even if we sincerely hope the 7-class is 16GB on a 256-bit bus.
Moving on. The amount of RAM is not the only thing to change. In fact, we started with it because it is probably the least important change. Xe 2.0 brings multiple massive changes, all focused on trimming the fat while netting major performance gains. Albeit with a couple of caveats.
First, the caveats. For the first time in recent memory a dGPU manufacturer is releasing a smaller core for a given class. For example, the A580 made use of six rendering slices (aka the basic building block Xe is based upon) and thus had 24 Xe cores, 24 Ray Tracing units, and 384 ‘AI Acceleration’ XMX engines. That works out to a grand total of 3072 shader units (think ‘CUDA cores’ for a point of reference), 384 XMX cores (think ‘Tensor cores’ for a point of reference), 24 Ray Tracing engines, 96 ROPs, and 192 texture units.
The B580 only has five rendering slices. Not six.
Since the Xe 2.0 core breakdown per slice is very… very similar, this means the B580 is rocking only 20 Xe 2.0 cores. However, that is about where the similarities end. Yes, it means 2560 shader units vs 3072. It also means 160 XMX engines and 20 Ray Tracing engines versus 384 and 24. On paper. Ooof. That… that is a massive reduction. Yet Intel not only promises but downright boasts of a swole 50 percent or better overall performance gain. How have they cut this Gordian knot and done what is seemingly a contradiction in terms? Low-level optimizations and sheer brute horsepower. Put another way, what Intel did was look for ways to shave off nanosecond delays. For example, what would have taken two cores/slices/etc. to render now takes ‘one’. With each “one” running much, much faster than before.
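To see where those headline numbers come from, here is a quick back-of-the-envelope sketch. It assumes four Xe cores per rendering slice, one Ray Tracing unit per Xe core, 128 FP32 ‘shader’ lanes per Xe core, and 16 versus 8 XMX engines per core for Alchemist versus BattleMage; those per-core figures are our assumptions for illustration, chosen because they line up with the published totals:

```python
# Back-of-the-envelope unit counts. The per-core figures (4 Xe cores per slice,
# 128 FP32 lanes and 1 RT unit per core, 16 vs 8 XMX engines per core) are
# illustrative assumptions that reproduce the totals quoted in the text.
def arc_totals(rendering_slices, xmx_per_core, xe_per_slice=4, lanes_per_core=128):
    xe_cores = rendering_slices * xe_per_slice
    return {
        "Xe cores": xe_cores,
        "shader units": xe_cores * lanes_per_core,
        "XMX engines": xe_cores * xmx_per_core,
        "RT units": xe_cores,
    }

print("A580:", arc_totals(6, xmx_per_core=16))  # 24 cores, 3072 shaders, 384 XMX, 24 RT
print("B580:", arc_totals(5, xmx_per_core=8))   # 20 cores, 2560 shaders, 160 XMX, 20 RT
```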
Put another way, think of the HEDT-line Core i9-10900X and compare and contrast that ten-core beast (for its day) with a modern desktop Core Ultra 200 series processor. Imagine a Core Ultra 9 285 with just its 8 P-cores and no E-cores. Thus ten vs eight cores… and yet one would be foolish to pick the higher-core-count 10th-gen HEDT’er over the newer Core Ultra 200! That is basically why Intel can promise, and deliver on, a smaller dGPU core doing more. Much more.
On the IPC front the gains are in the high double digits, as Intel’s Arc dev team has done major optimizations… and boosted the L2 cache from 8 to 18MB (and L1 is now 256KB!). Take for example the Vector Engines and XMX engines. In the Alchemist series they were 256-bit and 1024-bit wide, respectively. In the new BattleMage era they are 512-bit and 2048-bit wide. So while the total number is ‘halved’, each one is larger and more efficient. Netting major IPC gains to say the least.
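If you want to see why ‘half the engines’ does not mean half the throughput, the lane math works out as follows. This is a simplified sketch that treats each Vector Engine as a plain bundle of 32-bit FP lanes, and the 16-versus-8 engines-per-Xe-core split is our assumption for illustration:

```python
# Simplified lane math: treat each Vector Engine as a bundle of 32-bit FP lanes.
# The engines-per-Xe-core counts (16 for Alchemist, 8 for BattleMage) are
# assumptions used for illustration; the widths come from the text above.
def fp32_lanes(engine_width_bits: int, engines_per_xe_core: int) -> int:
    return (engine_width_bits // 32) * engines_per_xe_core

alchemist = fp32_lanes(256, 16)  # 8 lanes per engine * 16 engines = 128 lanes per Xe core
battlemage = fp32_lanes(512, 8)  # 16 lanes per engine * 8 engines = 128 lanes per Xe core
print(alchemist, battlemage)     # same lane count, just fewer and wider engines
```

Same raw lane count per Xe core, the idea being that fewer, wider engines are easier to schedule and keep fed.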
These major improvements in Instructions Per Clock, combined with more on-chip cache and off-chip RAM, are then paired with a base clock of 2,670MHz (vs 1,700!) and a boost of 2.85GHz (instead of a flat 2GHz). This means that each of these cores not only can do more per cycle… they are clocked roughly 40 percent higher. Intel claims “up to” 70 percent higher overall performance. In testing, that is being a wee bit optimistic, but the low-level hardware improvements (combined with massive software improvements) do allow this B580 to hit well above its price class and easily match an RTX 4060 and even sometimes (albeit rarely) a 4060 Ti. When compared against AMD… the company once known for their massive memory bandwidth cards? Fuhgeddaboudit. Their low end is arguably ‘better’ than the Team Green option, but that is not the same as saying it is a good option.
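As a sanity check on that “up to” 70 percent claim, the rough math of clock uplift times per-clock uplift looks like this. The 20 percent IPC figure below is a hypothetical placeholder of ours, not an Intel number:

```python
# Back-of-the-envelope uplift estimate: clock ratio * assumed per-clock gain.
# The 1.20 IPC multiplier is a hypothetical placeholder for illustration only.
boost_clock_ratio = 2850 / 2000     # ~1.43x from boost clocks alone
assumed_ipc_gain = 1.20             # hypothetical per-clock improvement
print(f"Rough uplift: {boost_clock_ratio * assumed_ipc_gain:.2f}x")  # ~1.71x, i.e. ~70 percent
```

Which is to say, a large clock bump multiplied by even a modest per-clock gain is enough to land in the ballpark Intel is quoting.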
These days all of the above will net you some ‘nice’ to ‘very nice’ compliments… but Ray Tracing is still the new sexy and all modern dGPUs must support it. In NVIDIA’s case that means 24 (4060) to 34 (4060 Ti) Ray Tracing cores. In AMD’s case it means up to 54 (7700 XT). As such, on paper a mere 20 is not just disappointing. It is ‘disgusting’. In reality these new 3-pipeline based beasties are arguably as good as NVIDIA’s on a per-core basis, and out and out smoke AMD. They smoke AMD so badly it really is only a two-horse race in RT land: Team Green… and Team Blue.
Since Ray Tracing hammers frames per second, all modern dGPUs must also offer “frame generation”… aka fake-it-till-you-make-it frames. AMD is once again barely worth looking at; instead, the only two serious options are Intel and NVIDIA. NVIDIA’s DLSS (Deep Learning Super Sampling) offers multiple approaches to making fake frames to keep FPS in the realm of reasonable. Intel now also offers multiple options.
To be precise, when one purchases an Intel Arc dGPU one gains access to three technologies. First is XeSS 2 Super Resolution (aka XeSS 2-SR). Second is XeSS Frame Generation (XeSS-FG), which uses two prior real rendered frames and two different algorithms to make an… “in-between” or “blending” frame. For a point of reference, think of the ‘soap opera effect’ on TVs for the level of smoothness that can be obtained by blending faux frames in between real frames. Third is Xe Low Latency (“XeLL”), which overrides the game logic and allows for actions to be rendered earlier than the game engine would typically do things. For a point of reference, think how dogwater bad a poorly optimized PC game runs compared to a slick ‘n’ smooth, highly optimized game. Most of that difference is in when ‘things’ get rendered by the game engine/dGPU… as the game engine might be using time slices as coarse as 30 frames per second… on your 120Hz+ monitor. Put another way, Intel now offers good frame interpolation, excellent frame smoothing, and insane responsiveness.
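For a feel of what “blending” between two real frames means in the most naive possible sense, here is a toy illustration. This is absolutely not Intel’s actual XeSS-FG algorithm (which leans on motion estimation rather than a straight average); it only shows the basic idea of manufacturing an in-between frame from two rendered ones:

```python
import numpy as np

# Toy frame interpolation: a straight 50/50 average of two already-rendered frames.
# Real frame generation (XeSS-FG, DLSS-FG) uses motion vectors and optical flow;
# this naive blend just illustrates the concept of an "in-between" frame.
def naive_inbetween(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    blended = (prev_frame.astype(np.float32) + next_frame.astype(np.float32)) / 2
    return blended.astype(np.uint8)

prev_frame = np.zeros((1080, 1920, 3), dtype=np.uint8)       # pretend rendered frame N
next_frame = np.full((1080, 1920, 3), 255, dtype=np.uint8)   # pretend rendered frame N+1
fake_frame = naive_inbetween(prev_frame, next_frame)         # the "generated" frame
print(fake_frame[0, 0])                                      # [127 127 127]
```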
More importantly, all three can be active at the same time. Of the three, the last is actually the most exciting… as frame generation increases latency. Sometimes noticeably. Sometimes enough that ‘pro gamers’ typically turn that shite off and brute-force their way to high FPS via 4090-level cards. Mere mortals do not have the luxury of throwing a mortgage payment at the problem.
As such, everyone will be happy to know that with all three of these new technologies active, Intel promises better-than-native latency. With better-than-native FPS. Of course, each game will have to have XeSS 2 enabled… but given the fact that “pro gamers” will get better-than-native latency, we highly doubt many next-gen games will ship without it as an option. Especially given the fact that the software dev kit for XeSS now supports DX12, DX11 and Vulkan. Making it pretty much a no-brainer, (little to) no-cost feature to enable.
“But wait! There’s more!” All these XeSS software improvements are backwards compatible. Yes. Unlike Team Green, which forces you to buy the latest gen to get the latest software stack… Intel is letting it all work on gen-1 Alchemist models. Albeit not as efficiently, as XeSS 2 was designed (and tested) around BattleMage. This is still eons better than the “other guys” though.
Overall, the BattleMage is not just a beefed-up Alchemist. Instead, it really does represent the next generation of Intel Arc design. Now let’s see how it performs… as few care about the ‘how’ or the ‘why’, just that it does.