Roughly six months ago, we covered the debut of Ashes of the Singularity, the first DirectX 12 title to launch in any form. With just a month to go before the game launches, the developer, Oxide, has released a major new build with a heavily updated benchmark that’s designed to mimic final gameplay, with updated assets, new sequences, and all of the enhancements to the Nitrous Engine Oxide has baked in since last summer.
Ashes of the Singularity is a spiritual successor to games like Total Annihilation, and the first DirectX 12 title to showcase AMD and Nvidia GPUs working side-by-side in a multi-GPU configuration.
The new build of the game released to press now allows for multi-GPU configuration testing, but time constraints limited us to evaluating general performance on single-GPU configurations. With Ashes launching in just under a month, the data we see today should be fairly representative of final gameplay.
AMD, Nvidia, and asynchronous compute
Ashes of the Singularity isn’t just the first DirectX 12 game — it’s also the first PC title to make extensive use of asynchronous computing. Support for this capability is a major difference between AMD and Nvidia hardware, and it has a significant impact on game performance.
A GPU that supports asynchronous compute can use multiple command queues and execute these queues simultaneously, rather than switching between graphics and compute workloads. AMD supports this functionality via its Asynchronous Compute Engines (ACE) and HWS blocks on Fiji.
Asynchronous computing is, in a very real sense, GCN’s secret weapon. While every GCN-class GPU since the original HD 7970 can use it, AMD quadrupled the number of ACEs per GPU when it built Hawaii, then modified the design again with Fiji. Where the R9 290 and 290X use eight ACEs, Fiji has four ACEs and two HWS units. Each HWS can perform the work of two ACEs and they appear to be capable of additional (but as-yet unknown) work as well.
The exact state and nature of Nvidia’s asynchronous compute capabilities are still unclear. We know that Nvidia’s Maxwell can’t perform anything like the concurrent execution that AMD GPUs can manage. Maxwell can benefit from some light asynchronous compute workloads, as it does in Fable, but the benefits on Team Green hardware are small.
The Nitrous Engine that powers Ashes of the Singularity makes extensive use of asynchronous compute and uses it for up to 30% of a given frame’s workload. Oxide has stated that they believe this will be a common approach in future games and game engines, since DirectX 12 encourages the use of multiple engines to execute commands from separate queues in parallel.
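To get a feel for what overlapping compute with graphics could be worth, here is a minimal sketch of a toy frame-time model. It is not how the Nitrous Engine or any GPU scheduler actually works: the function names and the `overlap` parameter are hypothetical, and the model ignores resource contention and scheduling overhead. It only assumes that some fraction of a frame's compute work can be hidden behind graphics work, using the "up to 30%" compute share quoted above.

```python
def frame_time(graphics, compute, overlap):
    """Toy frame time when a fraction `overlap` (0..1) of the compute
    work runs concurrently with graphics instead of after it."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    # Compute work can only hide behind graphics work that actually exists.
    hidden = min(compute * overlap, graphics)
    return graphics + compute - hidden


def speedup(compute_share, overlap):
    """Ratio of serial frame time to overlapped frame time for a frame
    whose workload is `compute_share` compute and the rest graphics."""
    graphics = 1.0 - compute_share
    serial = frame_time(graphics, compute_share, 0.0)
    overlapped = frame_time(graphics, compute_share, overlap)
    return serial / overlapped


if __name__ == "__main__":
    # 30% compute share is the article's upper figure; overlap values are
    # illustrative assumptions, not measurements.
    for overlap in (0.0, 0.5, 1.0):
        print(f"overlap={overlap:.1f} -> speedup x{speedup(0.30, overlap):.2f}")
```

Perfect overlap is a ceiling, not a prediction. Real hardware only benefits to the extent that it has idle execution resources while graphics work is in flight.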
Test setup and performance
We tested both the AMD Fury X and the Nvidia GeForce GTX 980 Ti in a Haswell-E system with 16GB of DDR4-2667 and Windows 10 with all updates installed. AMD distributed a new driver for this review; Nvidia did not, so we used the WHQL 361.91 driver, released on 2/16/2016, for our performance testing.
We confined ourselves to DirectX 12 testing this time out, but AnandTech did cover DX11. The performance data there suggests that both AMD and Nvidia improved in all modes. Nvidia continues to outperform AMD in DX11 compared to DX12, but the gap is much smaller than it was previously. As before, however, DirectX 12 gives Nvidia no performance improvement over and above DX11.
We tested Ashes of the Singularity in three detail modes (High, Extreme, and Crazy), with asynchronous computing both enabled and disabled, to measure the impact on AMD versus Nvidia cards. The feature is enabled by default.
We’re going to show you results first with asynchronous compute enabled versus disabled, then by resolution.
With asynchronous compute disabled, AMD’s R9 Fury X leads the GTX 980 Ti by 7-8% across all three detail levels. Enable asynchronous compute, however, and AMD roars ahead, beating its Nvidia counterpart by 24-28%. The GeForce GTX 980 Ti’s performance, in contrast, drops by 5-8% if asynchronous compute is enabled. This accounts for some of the gap between the two manufacturers, but by no means all of it.
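The claim that Nvidia's slowdown explains only part of the gap can be checked with a little arithmetic. The sketch below is a back-of-envelope estimate, not a measured result: it takes the midpoints of the percentage ranges quoted above and assumes they can be paired with one another, which the per-detail-level data may not strictly support. The function names are our own.

```python
def implied_amd_gain(lead_off, lead_on, nvidia_drop):
    """Estimate AMD's own fractional gain from enabling async compute.

    lead_off:    AMD's fractional lead with async disabled (0.075 = 7.5%)
    lead_on:     AMD's fractional lead with async enabled
    nvidia_drop: Nvidia's fractional slowdown when async is enabled
    """
    # With async on, the two cards are related by:
    #   (1 + lead_on) = (1 + lead_off) * (1 + amd_gain) / (1 - nvidia_drop)
    # Solving for (1 + amd_gain):
    amd_factor = (1 + lead_on) * (1 - nvidia_drop) / (1 + lead_off)
    return amd_factor - 1


def midpoint(low, high):
    """Midpoint of a quoted range (an assumption made for illustration)."""
    return (low + high) / 2


lead_off = midpoint(0.07, 0.08)  # Fury X lead, async disabled
lead_on = midpoint(0.24, 0.28)   # Fury X lead, async enabled
drop = midpoint(0.05, 0.08)      # GTX 980 Ti slowdown with async enabled

gain = implied_amd_gain(lead_off, lead_on, drop)
print(f"Implied Fury X gain from async compute: {gain:.1%}")
```

Under those assumptions, the implied figure is positive: the Fury X speeds up on its own with async compute enabled, on top of whatever the GTX 980 Ti loses.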
Let’s shift to 4K and check performance there:
Higher resolutions have often favored AMD cards, and this is no exception. With asynchronous compute disabled, AMD GPUs are still running 11-15% faster than their Nvidia counterparts. Enable async compute, and that gap doubles — the Radeon R9 Fury X is no less than 31-33% faster than the Nvidia GTX 980 Ti. Given how the Fury X struggled out of the gate, that’s got to be a welcome sight for Team Red.
Is Ashes of the Singularity biased?
Ashes of the Singularity is the first DX12 game on the market, and the performance delta between AMD and Nvidia is going to court controversy from fans of both companies. We won’t know if its performance results are typical until we see more games on the market. But is the game intrinsically biased to favor AMD? I think not, for multiple interlocking reasons.
First, there’s the fact that Oxide shares its engine source code with both AMD and Nvidia and has invited both companies to both see and suggest changes for most of the time Ashes has been in development. The company’s Reviewer’s Guide includes the following:
[W]e have created a special branch where not only can vendors see our source code, but they can even submit proposed changes. That is, if they want to suggest a change our branch gives them permission to do so…
This branch is synchronized directly from our main branch so it’s usually less than a week from our very latest internal main software development branch. IHVs are free to make their own builds, or test the intermediate drops that we give our QA.
Oxide also addresses the question of whether or not it optimizes for specific engines or graphics architectures directly.
Oxide primarily optimizes at an algorithmic level, not for any specific hardware. We also take care to avoid the proverbial known “glass jaws” which every hardware has. However, we do not write our code or tune for any specific GPU in mind. We find this is simply too time consuming, and we must run on a wide variety of GPUs. We believe our code is very typical of a reasonably optimized PC game.
We reached out to Dan Baker of Oxide regarding the decision to turn asynchronous compute on by default for both companies and were told the following:
“Async compute is enabled by default for all GPUs. We do not want to influence testing results by having different default setting by IHV, we recommend testing both ways, with and without async compute enabled. Oxide will choose the fastest method to default based on what is available to the public at ship time.”
Second, we know that asynchronous compute takes advantage of hardware capabilities AMD has been building into its GPUs for a very long time. The HD 7970 was AMD’s first card with an asynchronous compute engine and it launched in 2012. You could even argue that devoting die space and engineering effort to a feature that wouldn’t be useful for four years was a bad idea, not a good one. AMD has consistently said that some of the benefits of older cards would appear in DX12, and that appears to be what’s happening.
Asynchronous computing is not itself part of the DX12 specification, but it’s one method of implementing a DirectX 12 multi-engine. Multi-engines are explicitly part of the DX12 specification. How these engines are implemented may well impact relative performance between AMD and Nvidia, but they’re one of the advantages to using DX12 as compared with previous APIs.
Third, every bit of independent research on this topic has confirmed that AMD and Nvidia have profoundly different asynchronous compute capabilities. Nvidia’s own slides illustrate this as well. Nvidia cards cannot handle asynchronous workloads the way that AMD’s can, and the differences between how the two cards function when presented with these tasks can’t be bridged with a few quick driver optimizations or code tweaks. Beyond3D forum member and GPU programmer Ext3h has written a guide to the differences between the two platforms — it’s a work-in-progress, but it contains a significant amount of useful information.
Fourth, Nvidia PR has been silent on this topic. Questions about Maxwell and asynchronous compute have been bubbling for months; we’ve requested additional information on several occasions. Nvidia is historically quick to respond to either incorrect information or misunderstandings, often by making highly placed engineers or company personnel available for interview. The company has a well-deserved reputation for being proactive in these matters, but we’ve heard nothing through official channels.
Fifth and finally, we know that AMD GPUs have always had enormous GPU compute capabilities. Those capabilities haven’t always been displayed to their best advantage for a variety of reasons, but they’ve always existed, waiting to be tapped. When Nvidia designed Maxwell, it prioritized rendering performance — there’s a reason why the company’s highest-end Tesla SKUs are still based on Kepler (aka the GTX 780 Ti / Titan Black).
It’s fair to say that the Nitrous Engine’s design runs better on AMD hardware — but there’s no proof that the engine was designed to disadvantage Nvidia hardware, or to prevent Nvidia cards from executing workloads effectively.
Ashes of the Singularity launches in a month. It’s going to be a major DX12 data point for several years, at least, and we don’t yet know if the shift to that API means that more engines will move to using asynchronous compute or not. It’s certainly possible, particularly given that both the Xbox One and PS4 can make use of asynchronous compute already.
For now, we recommend treating these results as an interesting example of how a new API can open up performance capabilities and breathe new life into older hardware. While time constraints prevented us from testing older AMD or NV cards, data we’ve seen suggests that AMD GPUs see advantages from async compute across the company’s entire product stack. It’s not a miracle cure for an otherwise-slow card, but it gives a solid benefit.
If you already own a GeForce card, we still recommend waiting before rushing out to buy new hardware. Both AMD and Nvidia have 14nm refreshes coming this year, and relative rankings could change depending on the architectures of the new cards. For now, however, AMD seems to be gaining more from the DX12 shift than Nvidia is — the Fury X is an absolute titan in Ashes of the Singularity.
Update (2/24/2016): Nvidia reached out to us this evening to confirm that while the GTX 9xx series does support asynchronous compute, it does not currently have the feature enabled in-driver. Given that Oxide has pledged to ship the game with defaults that maximize performance, Nvidia fans should treat the asynchronous compute-disabled benchmarks as representative at this time. We’ll revisit performance between Teams Red and Green if Nvidia releases new drivers that substantially change performance between now and launch day.