Tag Archives: nvidia

Nvidia unveils Pascal specifics — up to 16GB of VRAM, 1TB of bandwidth


Nvidia may have unveiled bits and pieces of its Pascal architecture back in March, but the company shared some additional details at its GTC Japan technology conference. Like AMD’s Fury X, Pascal will move away from GDDR5 in favor of stacked memory, in this case the next-generation HBM2 standard; it will be built on a 16nm FinFET process at TSMC and carry up to 16GB of memory. AMD and Nvidia are both expected to adopt HBM2 in 2016, but Pascal will be Nvidia’s first product to use stacked memory, while AMD has prior experience with first-generation HBM thanks to the Fury lineup.

HBM vs. HBM2

HBM and HBM2 are based on the same core technology, but HBM2 doubles the effective speed per pin and introduces some new low-level features, as shown below. Memory density is also expected to improve, from 2Gb per DRAM die (8Gb per four-die stack) to 8Gb per DRAM die (32Gb per four-die stack).


Nvidia’s quoted 16GB of memory assumes a four-stack configuration with four 8Gb dies per stack. That’s the same basic layout that Fury X used, though the higher-density DRAM means the hypothetical top-end Pascal will have four times as much memory as the Fury X. We would be surprised, however, if Nvidia pushes that 16GB configuration below its top-end consumer card. In our examination of 4GB VRAM limits earlier this year, we found that the vast majority of games do not stress a 4GB VRAM buffer. Of the handful of titles that do use more than 4GB, none exceeded the 6GB limit on the GTX 980 Ti while maintaining anything approaching a playable frame rate. Consumers simply don’t have much to worry about on this front.

The other tidbit coming out of GTC Japan is that Nvidia will target 1TB/s of total bandwidth. That’s a huge increase — 2x what Fury X offers — achieved in a single product generation. Both AMD and Nvidia are claiming that HBM2 and 14/16nm process technology will give them a 2x performance-per-watt improvement.
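The capacity and bandwidth figures above fall out of some simple arithmetic. Here's a back-of-the-envelope sketch; the per-stack numbers are assumptions drawn from the published HBM/HBM2 specs rather than confirmed Pascal details:

```python
# Capacity: dies per stack x stacks x density per die.
GBIT_PER_DIE_HBM1 = 2      # HBM1: 2Gb per DRAM die
GBIT_PER_DIE_HBM2 = 8      # HBM2: 8Gb per DRAM die
DIES_PER_STACK = 4         # four dies stacked per package
STACKS = 4                 # four-wide configuration, as on Fury X

def capacity_gb(gbit_per_die):
    """Total capacity in gigabytes for a 4-stack, 4-high configuration."""
    return gbit_per_die * DIES_PER_STACK * STACKS / 8  # 8 bits per byte

print(capacity_gb(GBIT_PER_DIE_HBM1))   # 4.0  -> Fury X's 4GB
print(capacity_gb(GBIT_PER_DIE_HBM2))   # 16.0 -> the quoted 16GB

# Bandwidth: HBM2 doubles per-pin speed to 2Gbps across a 1024-bit stack.
BUS_WIDTH_BITS = 1024
GBPS_PER_PIN_HBM2 = 2
per_stack_gbs = BUS_WIDTH_BITS * GBPS_PER_PIN_HBM2 / 8   # 256 GB/s per stack
print(per_stack_gbs * STACKS)                            # 1024 GB/s, ~1TB/s
```

The same math with HBM1's 1Gbps per pin yields 512GB/s across four stacks, which matches the Fury X figure the article compares against.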

AMD has typically led Nvidia when it comes to adopting new memory technologies. AMD was the only company to adopt GDDR4 and the first manufacturer to use GDDR5 — the Radeon HD 4870 debuted with GDDR5 in June 2008, while Nvidia didn’t push the new standard on high-end cards until Fermi in 2010. AMD has argued that its expertise with HBM made implementing HBM2 easier, and some sites have reported rumors that the company has preferential access to Hynix’s HBM2 supply. Given that Hynix isn’t the only company building HBM2, however, this may or may not translate into any kind of advantage.

HBM2 production roadmap

With Teams Red and Green both moving to HBM2 next year, and both apparently targeting the same bandwidth and memory capacity, I suspect that the performance crown next year won’t be decided by the memory subsystem. Games inevitably evolve to take advantage of next-gen hardware, but the 1TB/s capability that Nvidia is talking up won’t be a widespread feature — especially if both companies stick to GDDR5 for entry-level and midrange products. One facet of HBM/HBM2 is that its advantages become more pronounced the more RAM you put on a card and the larger the GPU is. We can bet that AMD and Nvidia will introduce ultra-high-end and high-end cards that use HBM2, but midrange cards in the 2-4GB range could stick with GDDR5 for another product cycle.

The big questions are which company can take better advantage of its bandwidth, which architecture exploits it more effectively, and whether AMD can finally deliver a new core architecture that leaps past the incremental improvements GCN 1.1 and 1.2 offered over the original GCN 1.0, which is now nearly three years old. Rumors abound on what kind of architecture that will be, but I’m inclined to think it’ll be more an evolution of GCN than a wholesale replacement. Both AMD and Nvidia have moved towards evolutionary advances rather than radical architecture swaps, and there’s enough low-hanging fruit in GCN that AMD could substantially improve performance without reinventing the wheel.

Neither AMD nor Nvidia have announced a launch date, but we anticipate seeing hardware from both in late Q1 / early Q2 of 2016.


SteamOS, Ubuntu, or Windows 10: Which is fastest for gaming?

For years, game support on Linux has seriously lagged behind Windows, to the point that the OS was basically a non-option for anyone who wanted to game on a PC. In recent years, that’s begun to change, thanks to increased support for the OS via Valve and SteamOS. From the beginning, Valve claimed that it was possible to boost OpenGL performance over D3D in Windows, and it’s recently put a hefty push behind Vulkan, the Mantle-based API that’s a successor to OpenGL.

Two new stories took OpenGL out for a spin compared with Windows 10, on a mixture of Intel and Nvidia hardware. Ars Technica dusted off their Steam machine for a comparison in the most recent version of SteamOS, while Phoronix compared the performance of Intel’s Skylake Core i5-6600K with HD Graphics 530. The results, unfortunately, point in the same direction: SteamOS and Ubuntu simply can’t keep up with Windows 10 in most modern titles.


Ars tested multiple titles, but we’ve included the Source-based results here, because these are the games that the industry titan has direct control over. In theory, Valve’s own games should show the clearest signs of any OGL advantage, if one existed. Obviously, it doesn’t — L4D2 shows similar performance on both platforms, but TF2, Portal, and DOTA 2 are all clear advantages for Windows 10.

That doesn’t mean Linux gaming hasn’t come a long way in a relatively short period of time. All of these titles return playable frame rates, even at 2560×1600. There’s a huge difference between “Windows 10 is faster than Linux,” and “We can’t compare Linux and Windows 10 because Linux and gaming are a contradiction in terms.” It’s also possible that Valve is throwing most of its weight behind Vulkan and that future games that use that API will be on a much stronger footing against Windows in DX12 titles.

The penguinistas at Phoronix also took Windows and Ubuntu out for a spin with Intel’s HD Graphics 530 and a Skylake processor. Again, the results are anything but pretty for Team Penguin — while some titles, like OpenArena, ran nearly identically, most 3D applications showed a significant gain for Windows 10. Again, driver support is a major issue; Intel’s Linux drivers remain limited to OpenGL 3.3, though OpenGL 4.2 support is theoretically forthcoming by the end of the year. Under Windows, OGL 4.4 is supported, which gives that OS a decided advantage in these types of comparisons.

A complex situation

There are two equally valid ways of looking at this situation. First, there’s the fact that if you want to game first and foremost, Windows remains a superior OS to Mac or Linux, period, full stop. There is no Linux distribution or version of Mac OS X that can match the capabilities of Windows for PC gaming across the entire spectrum of titles, devices, and hardware — especially if you care about compatibility with older games, which can be persnickety in the best of times.

That conclusion, however, ignores the tremendous progress that we’ve seen in Linux gaming over a relatively short period of time. There are now more than a thousand titles available for Linux via Steam. If you’re primarily a Linux user, you’ve got options that never existed before — and as someone who hates dual-booting between operating systems and refuses to do so save when necessary for articles, I feel the pain of anyone who prefers to game in their own native OS rather than switching back and forth.

Furthermore, it’s probably not realistic to expect Valve to close the gap between Windows and Linux gaming. Not only does that assume that Valve can magically control the entire driver stack (and it obviously can’t), it also assumes that Valve does anything within a 1-2 year time frame (it doesn’t). The launch of Vulkan means that Linux users will get feature-parity and very similar capabilities to DX12 gamers on Windows, but Nvidia, AMD, and Intel will need to provide appropriate driver support to enable it. Hopefully, since Vulkan is based on Mantle, AMD will be able to offer support in short order.

In short, it’s not surprising to see that Windows still has a strategic and structural advantage over Linux, and we shouldn’t let that fact obscure the tremendous progress we’ve seen in just a handful of years.


Nvidia confirms G-Sync displays trigger massive power consumption bug

Nvidia’s G-Sync has been getting some press of late, thanks to a fresh crop of monitors promising new feature support and Nvidia’s push to put the technology in more boutique laptops. We’ve seen a number of displays with higher refresh rates hitting market recently, but there’s a bug in the latest driver sets and how they interface with the Asus ROG Swift PG279Q. Apparently, increasing refresh rates can cause steep increases in power consumption — and the bug doesn’t appear to be monitor-specific.

PC Perspective tracked down the problem and found it’s linked to G-Sync when running at high refresh rates. At or below 120Hz, the GPU sits comfortably at 135MHz base clock. Push the refresh rate above 120Hz, however, and power consumption begins to spike. PC Perspective believes the problem is linked to pixel refresh rates — the base 135MHz frequency isn’t fast enough to refresh a display above 120Hz, but you don’t need a GPU running full bore to handle a 144Hz refresh rate, or the 165Hz that the Asus panel can deliver.
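One way to see why refresh rate matters is to estimate the pixel rate the GPU's display pipeline must sustain at each mode. This is an illustrative sketch only: the blanking overhead is an assumed typical value rather than the PG279Q's actual timings, and the 135MHz figure in the article is the GPU core clock, whose exact relationship to the display pipeline Nvidia hasn't disclosed.

```python
# Approximate pixel clock for a given display mode. Real monitors add
# non-visible "blanking" pixels per line and frame; 20% is an assumption.
def pixel_clock_mhz(h_active, v_active, refresh_hz, blanking_overhead=1.2):
    """Estimated pixel clock in MHz for a given resolution and refresh rate."""
    return h_active * v_active * refresh_hz * blanking_overhead / 1e6

# The PG279Q's 2560x1440 panel at various refresh rates:
for hz in (60, 120, 144, 165):
    print(hz, "Hz ->", round(pixel_clock_mhz(2560, 1440, hz)), "MHz")
```

The required pixel rate scales linearly with refresh rate, nearly tripling between 60Hz and 165Hz, which is consistent with the idea that some fixed low operating point stops being sufficient past a threshold.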

Today, Nvidia confirmed the bug to PCPer and announced that it would have a fix in the pipeline in the near future. According to Nvidia, “That new monitor (or you) exposed a bug in the way our GPU was managing clocks for GSYNC and very high refresh rates. As a result of your findings, we are fixing the bug which will lower the operating point of our GPUs back to the same power level for other displays.”

We don’t have a G-Sync display with that high of a refresh rate to test, but we did pull an older 1080p Asus monitor out to check whether the issue is confined to G-Sync. Even at 144Hz (the maximum refresh rate on this particular panel), our GTX 970 sits at a steady 135MHz. Granted, this is still a 1080p monitor, not the 2560×1440 panel that the Asus ROG Swift PG279Q uses. Nvidia’s phrasing, however, suggests that this is an issue with G-Sync and high refresh rates combined rather than one or the other (and the test results from PCPer appear to bear that out).

No word yet on when the driver will drop, but we expect it in the not-too-distant future. Nvidia is usually fairly quick to resolve bugs and take care of problems. If you have a high-resolution, high-refresh-rate display with G-Sync, you can check the issue for yourself. Just remember to let the computer sit idle at the desktop. Most modern browsers use the GPU for rendering, so you’ll see power spikes if you’re actively surfing the web.


Nvidia: future game-ready drivers will require registration

Over the past few years, Nvidia has made a number of changes and improvements to its GeForce GPU companion software, GeForce Experience. If you own an Nvidia graphics card, GFE governs the use of a number of features, including in-game video recording, Shield streaming, Battery Boost, and game optimizations. The company is making several new changes to the application today, and announcing a major change to how it distributes driver updates that could have far-reaching implications.

First, the updates. Starting immediately, Nvidia’s GameStream technology will allow users to stream titles in 4K at 60 FPS, with support for 5.1 audio, if your hardware can handle that output level in the first place. Only high-end Maxwell cards have the updated NVENC encoding unit that’s required for 4K support, and only a handful of those cards can push 4K at 60 FPS — basically the Titan X and GTX 980 Ti. We expect Pascal GPUs to support such capabilities across more of Nvidia’s GPU stack.

You’ll need a Shield TV to receive a 4K stream from a local PC, and Nvidia recommends a wired connection for best performance. Presumably, Nvidia has moved to H.265, since both high-end Maxwell cards and Shield TV support it, but we don’t have official confirmation of that yet.

The other new feature introduced today is the ability to broadcast to YouTube Live, the streaming giant’s new service meant to compete with Twitch. GeForce Experience can manage both logins and stream to either service.

Future “Game-Ready” drivers will require registration

By far the biggest announcement today is a fundamental change to how Nvidia distributes its driver updates. One of the differences between Teams Red and Green is that Nvidia has often been faster off the block when it comes to Day 1 support for features like SLI. While DirectX 12 is expected to help level this difference, since it moves support for multi-GPU configurations to the developer (and allows for fewer driver-side optimizations in general), early driver support for DX11 remains important. Up until now, those game-ready drivers have been available to anyone with a GeForce card. Going forward, that’s going to change.


In the future, only GeForce owners who both install GeForce Experience and register the service by providing Nvidia with an email address will have access to Game-Ready driver downloads, which will be pushed exclusively through GFE. That doesn’t mean you won’t be able to download a driver from Nvidia.com — it just means that the drivers on the website will be updated periodically, not on a per-release basis. Nvidia has stated that it will push a new driver through its website at least once a quarter, but it hasn’t ironed out the exact timing details yet.

Nvidia was quick to reassure us that users could choose to stop providing an email address to GeForce Experience and opt-out of the program, but noted that you’ll lose access to Game-Ready drivers if you do. As a reviewer, I agree that the burden of providing the company with an email address is minimal and GeForce Experience is a well-behaved, useful utility. The only thing I dislike about it is that you have to have it installed in order to use Nvidia’s Shield controller with a PC, even if you’re connecting with a USB cable. Asinine as that is, it impacts a small number of people.

As a journalist with deep concerns over user privacy, however, I hate this trend of vacuuming up user information. When you buy a CPU or GPU, you’re paying several hundred dollars for the product and for a reasonable expectation of support. Locking driver updates behind an email address is a very small barrier, but it’s still a barrier that requires you to provide Nvidia with ostensibly personal information.

I’d feel better about the situation if Nvidia released more information on this plan and promised (without weasel words) that email addresses would not be sold, shared, or combined with information purchased from companies like Acxiom to create customer histories for marketing purposes. Since no company offers promises like this, if you care about privacy, I’d recommend registering a burner email account if you eventually sign up for GFE.

I don’t want to sound like I’m accusing Nvidia of anything untoward — there’s no evidence the company is planning to do anything to compromise user privacy, and Nvidia’s business model doesn’t rely on gathering and selling data about its users. In an era where we now see companies dropping even the pretense of anonymizing user data, however, I’ve become increasingly wary of all moves that would require customers to hand over anything — particularly when the feature being locked behind such hand-overs has been historically available without them.


Testing mobile G-Sync with the Asus G751JY: Boutique gaming’s killer feature?

Last January, we previewed how mobile G-Sync might perform on an Asus G751JY laptop that wasn’t fully certified for the feature but supported it well enough to give us a taste of what G-Sync could deliver. Today, we’re revisiting the topic, armed with a fully certified Asus G751JY-DB72. This system is nearly identical to the G751JY that we tested earlier this year, but with a handful of upgrades. Specifically, the G751JY-DB72 uses a Core i7-4720HQ CPU, 24GB of DDR3, a 256GB SSD, and a backup 1TB HDD for conventional mass storage. The system still uses a GTX 980M (4GB of RAM) and a 1,920-by-1,080, 17.3-inch screen.


At $1999 from Asus, it’s not a cheap laptop, but it’s one of the nicest and best-balanced systems I’ve ever tested. Because mobile G-Sync is a big enough feature to warrant its own treatment, we’re going to discuss the laptop’s performance and capabilities in a separate piece. For now, it’s enough to say that this is one of the best boutique laptops I’ve ever tested, even if the base model debuted a year ago.

How mobile G-Sync works

Mobile and desktop G-Sync accomplish the same goal, but they achieve it in different ways. Nvidia’s desktop G-Sync displays rely on a separate, Nvidia-built scaler unit. This scaler controls the monitor’s timing and synchronizes the display’s output with the video card. In 2013, when Nvidia debuted G-Sync, its custom scaler technology was the only way to achieve this kind of synchronization in a desktop display. That’s since changed with the launch of the VESA-backed Adaptive Sync standard (AMD calls its own implementation FreeSync). Laptops, however, don’t require custom scaler hardware — the ability to synchronize refresh rates is part of the embedded DisplayPort specification that both AMD and Nvidia use.


In order to qualify for the mobile G-Sync moniker, Nvidia requires laptop manufacturers to prove that their hardware meets certain standards. We don’t know all the details on what panels need to have, but we do know that they must support variable overdrive. Nvidia has stated that it works with ODMs to ensure that the G-Sync implementations in each laptop are tuned to the specifications of the underlying panels.


As the name implies, variable overdrive allows the display to decrease pixel ghosting by anticipating what color a pixel may need to be on the next refresh cycle and adjusting voltage accordingly. Nvidia has noted that this could result in a slight decrease in color accuracy in some conditions, but the net result should still be improved color reproduction.

G-Sync: A Goldilocks solution

Now that we’ve covered the basics of how mobile G-Sync works, let’s talk about its specific implementation in the Asus G751JY. This laptop uses a 75Hz panel, which is important to know because it defines the maximum refresh rate at which G-Sync can operate. If you have a 75Hz panel and your game is kicking out a steady 200 FPS, G-Sync disables automatically and the game switches to either V-Sync on or off. By default, Nvidia switches to V-Sync on, since this is much less jarring than the sudden appearance of tearing, but if you prefer to disable V-Sync when the frame rate exceeds 75 FPS, you can specify that in the control panel.
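That fallback behavior can be summarized as a small decision function. This is a sketch of the behavior as described, with hypothetical names; the real logic lives inside Nvidia's driver:

```python
# Mode selection around the panel's variable-refresh ceiling: inside the
# window, G-Sync tracks the frame rate; above it, the driver falls back
# to V-Sync on (the default) or off (if the user overrides it).
def display_mode(fps, panel_max_hz=75, prefer_vsync_off=False):
    if fps <= panel_max_hz:
        return "g-sync"   # refresh rate follows the frame rate
    return "v-sync off" if prefer_vsync_off else "v-sync on"

print(display_mode(60))                            # g-sync
print(display_mode(200))                           # v-sync on
print(display_mode(200, prefer_vsync_off=True))    # v-sync off
```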

This might seem less than ideal, since gamers are typically taught to prefer high frame rates, but the relative advantage of faster FPS is subject to diminishing marginal returns. The higher the frame rate, the less visible a missed frame is.

If the frame rate falls below a certain level, however, G-Sync can run into another problem. While it doesn’t shut off due to low FPS, the GPU will automatically insert duplicate frames to smooth playback. If performance is relatively steady, this is an excellent way to smooth the game without impacting playability. If the frame rate is changing significantly from moment to moment, however, some frames will end up repeated and some will not.

PC Perspective wrote an excellent report on how FreeSync and G-Sync handle low frame rates. The graph below shows how G-Sync inserts additional frames, boosting the refresh rate as a result.


As the frame rate fluctuates, the number of frames G-Sync injects to smooth presentation can vary as well. While the end result can still be superior to not having G-Sync on at all, a variable frame rate below ~35 FPS doesn’t produce the buttery smoothness that Adaptive Sync and G-Sync provide at higher refresh rates.
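The frame-repetition behavior PC Perspective describes can be sketched as follows. The 30Hz panel minimum here is an assumed value for illustration, not a published spec for this panel:

```python
import math

# When the frame rate drops below the panel's minimum refresh, each frame
# is shown multiple times, raising the effective refresh rate back into
# the panel's operating range.
def effective_refresh(fps, panel_min_hz=30):
    """Return (effective refresh rate, repeats per frame)."""
    if fps >= panel_min_hz:
        return fps, 1
    repeats = math.ceil(panel_min_hz / fps)
    return fps * repeats, repeats

print(effective_refresh(45))   # (45, 1) -- no repetition needed
print(effective_refresh(20))   # (40, 2) -- each frame drawn twice
print(effective_refresh(12))   # (36, 3) -- each frame drawn three times
```

Because the repeat count changes in integer steps as the frame rate fluctuates, a wildly varying sub-35 FPS frame rate produces uneven frame presentation, which is exactly the roughness described above.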

This ideal window is why we call G-Sync (and Adaptive Sync) a Goldilocks solution. Both technologies work best when your frame rate is neither too high nor too low. In this case, users should target a consistent average frame rate between 40 and 60 FPS.

Testing G-Sync

One of the intrinsic problems with testing a feature like G-Sync is that it’s hard to capture the output difference without a high-speed camera. One website, Blurbusters, has built a G-Sync simulator that you can use to examine the relative impact of having G-Sync enabled vs. disabled. You can see and select various display modes to compare the output, but if you choose G-Sync, be advised that the frame rate will rise until it reaches your monitor’s maximum refresh rate, then drop and start again. You can compare the output in this mode against the various other options (V-sync enabled, disabled, frame rate drops, etc).

The best video demonstration we’ve found of G-Sync vs. V-Sync On is embedded below. I’d recommend watching it full-screen and not trying to focus too hard on any one area of the image. If you relax your eyes and focus on the green line between the two rotating outputs, you’ll see that the V-Sync output on the left has a small but noticeable stutter that the G-Sync output lacks. The relevant portion of video is at 1:10.

One problem with testing a feature like G-Sync is confirmation bias. Confirmation bias is the human tendency to look for evidence that confirms a hypothesis while ignoring or discounting evidence that could disprove it. If I know that G-Sync is enabled, I may claim that a game looks better because I expect G-Sync to deliver a marked improvement. We avoided this problem by using a single-blind A/B test.

Before each test, the laptop was configured to enable or disable G-Sync. I was then asked to judge whether G-Sync had been enabled or disabled based on how the game/benchmark ran. No frame rate information or third-party tools like FRAPS, which might inadvertently hint at whether or not G-Sync was enabled, were allowed, and I was not permitted to alt-tab out of the game or check my results until the entire set of test runs had concluded.
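The protocol above amounts to a simple randomized single-blind trial, which can be modeled in a few lines. This is purely illustrative; the function names are hypothetical:

```python
import random

# A proctor randomizes the G-Sync setting each run; the tester (modeled
# here as guess_fn, which observes the run and returns a verdict) records
# a guess, and accuracy is scored after all runs conclude.
def run_blind_trials(guess_fn, trials=10, seed=None):
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        actual = rng.choice(["g-sync on", "g-sync off"])  # hidden from tester
        if guess_fn(actual) == actual:
            correct += 1
    return correct / trials

# A tester who can always tell scores 1.0; pure guessing converges to ~0.5
# over many trials, so a perfect score across all runs is meaningful.
print(run_blind_trials(lambda observed_run: observed_run))
```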

Our initial tests of BioShock Infinite failed because the game was either running well above the 75Hz refresh rate on the Asus G751JY (and enabling V-Sync at these higher frame rates rather than using G-Sync), or running below the 30 FPS mark when we tested at 4K using Dynamic Super Resolution. We discussed the situation with Nvidia and chose image quality settings that kept the game in the 40-50 FPS range where G-Sync’s impact is most noticeable. Once we did, I could successfully identify whether BioShock Infinite was using G-Sync in every single test.


We also tested The Elder Scrolls: Skyrim, though in its case, we had to install additional texture mods to pull frame rates low enough for G-Sync to kick in. Again, I was able to correctly determine whether or not G-Sync was enabled in every single test. In most cases, it took just seconds — camera pans and movement are much smoother when G-Sync is enabled.


As someone who would benchmark a llama if I could find one with a PCIe slot, I’m loath to issue an opinion that comes down to “Trust me, it’s awesome.” In this case, however, that’s what’s called for. With G-Sync enabled, camera pans are much smoother. V-Sync just doesn’t deliver an equivalent experience — not unless your game is already holding a steady 120+ FPS frame rate and you own one of the handful of monitors that support a refresh rate that high.

Is G-Sync worth it?

The FreeSync vs G-Sync battle between AMD and Nvidia has mostly played out in the desktop space, where FreeSync / Adaptive Sync displays have generally been cheaper than their G-Sync counterparts. The situation is different in mobile, where multiple vendors are shipping G-Sync-enabled laptops, while FS/AS appear to be a no-show thus far. We’ve heard rumors that this could change in the next few months, but for now, mobile G-Sync is the only show in town.

It’s true that getting G-Sync up and running properly can require some fine-tuning, but we’re not talking about anything extravagant — if you’re comfortable adjusting in-game video settings, you can tune a game to work well in G-Sync. Older titles may require some additional intervention, but if you’re comfortable installing graphics mods, it’s easy to find frame rates that showcase the feature.

Sometimes, buying into a new technology when it initially rolls out means paying a premium for a less-than-ideal experience — but that doesn’t seem to be the case here. The Asus G751JY is a well-balanced system, and the GTX 980M is unmatched among mobile GPUs. True, Nvidia now offers a desktop-class GTX 980 in an ostensibly mobile form factor, but we have some significant concerns about just how that solution will actually work in the real world. The 980M, in contrast, is a proven high-performance solution.

AMD will likely counter with its own solutions — the first FreeSync demos were originally done on a mobile platform — but for now, if you want this technology, Nvidia is the only game in town. It’s a feature that makes a significant difference, and if we were in the market for a boutique gaming laptop, we’d put G-Sync high on our list of desired features.


Nvidia dealt blow in bid to block Samsung shipments into US

An administrative law judge for the US International Trade Commission sides with Samsung on a patent dispute with Nvidia.

Nvidia was handed a major setback Friday in its lawsuit with Samsung over the improper use of its graphics technology.

Thomas B. Pender, an administrative law judge for the US International Trade Commission, wrote that Samsung didn’t infringe on Nvidia’s graphics patents. He also determined one of Nvidia’s three patents is invalid because the technology had already been covered in previously known patents.

The decision deals a blow to Nvidia’s efforts to prove that Samsung illegally used its technology. If found guilty, Samsung, the largest smartphone maker in the world, could face a ban on US shipments of certain products, including the Galaxy Note Edge, Galaxy Note 4 and Galaxy S5. But the judge’s decision is an early recommendation, and the ITC still has to make a formal decision.

“Today’s initial determination is one more step in the ITC’s legal process,” Nvidia said Friday in a statement. “We remain confident in our case.”

Samsung declined to comment.

Nvidia, which is best known for making graphics chips for PCs, filed lawsuits with the ITC and US District Court in Delaware in September 2014 involving seven of its patents. At the time, Nvidia said it asked the ITC to block shipments of several Samsung smartphones and tablets to the US and requested the district court award damages for the alleged infringement.

Nvidia’s specialty in graphics is the focal point of the dispute. Samsung has tended to use Qualcomm’s processors in its high-end devices. The Note 4, for instance, uses a Snapdragon 805 chip. Samsung also uses its own Exynos chips in some models, particularly those sold in Korea and its newest products, the Galaxy S6 and Galaxy Note 5. The devices mentioned in the suit involve Qualcomm’s Adreno graphics, ARM Holdings’ Mali technology and Imagination’s PowerVR graphics architecture, which are three of Nvidia’s main competitors in mobile graphics.

Beyond smartphones, several Samsung tablets, including the Galaxy Tab S, Tab 2, and Note Pro, were listed as well.

Nvidia’s suit was only the latest in a series of lawsuits in the hot mobile sector. Samsung has been battling Apple for the past several years over technology used in its smartphones and tablets. The two companies a year ago agreed to settle all disputes outside the US, but their lawsuits continue in the country. Microsoft also has sued Samsung, saying it didn’t live up to its patent licensing agreement for technology used in Android tablets and smartphones.

Companies have tended to file lawsuits with the ITC to speed up the process of addressing the dispute. Civil suits could take years to go to trial, and they’re often held up for even longer in the appeals process. An ITC sales ban, however, could severely hurt a company’s profits or force the two sides to negotiate a settlement.


Asynchronous compute, AMD, Nvidia, and DX12: What we know so far

Ever since DirectX 12 was announced, AMD and Nvidia have jockeyed for position regarding which of them would offer better support for the new API and its various features. One capability that AMD has talked up extensively is GCN’s support for asynchronous compute. Asynchronous compute allows all GPUs based on AMD’s GCN architecture to perform graphics and compute workloads simultaneously. Last week, an Oxide Games employee reported that contrary to general belief, Nvidia hardware couldn’t perform asynchronous computing and that the performance impact of attempting to do so was disastrous on the company’s hardware.

This announcement kicked off a flurry of research into what Nvidia hardware did and did not support, as well as anecdotal claims that people would (or already did) return their GTX 980 Tis based on Ashes of the Singularity performance. We’ve spent the last few days in conversation with various sources working on the problem, including Mahigan and CrazyElf at Overclock.net, as well as parsing through various data sets and performance reports. Nvidia has not yet responded to our request for clarification, but here’s the situation as we currently understand it.

Nvidia, AMD, and asynchronous compute

When AMD and Nvidia talk about supporting asynchronous compute, they aren’t talking about the same hardware capability. The Asynchronous Command Engines in AMD’s GPUs (between two and eight, depending on which card you own) are capable of executing new workloads at latencies as low as a single cycle. A high-end AMD card has eight ACEs, and each ACE has eight queues. Maxwell, in contrast, has two pipelines, one of which is a high-priority graphics pipeline. The other has a queue depth of 31 — but Nvidia can’t switch contexts anywhere near as quickly as AMD can.


According to a talk given at GDC 2015, there are restrictions on Nvidia’s preemption capabilities. Additional text below the slide explains that “the GPU can only switch contexts at draw call boundaries” and “On future GPUs, we’re working to enable finer-grained preemption, but that’s still a long way off.” To explore the various capabilities of Maxwell and GCN, users at Beyond3D and Overclock.net have used an asynchronous compute test that evaluates the capability on both AMD and Nvidia hardware. The benchmark has been revised multiple times over the week, so early results aren’t comparable to the data we’ve seen in later runs.

Note that this is a test of asynchronous compute latency, not performance. It doesn’t measure overall throughput; it’s designed to demonstrate whether asynchronous compute is occurring or not. Because this is a latency test, lower numbers (closer to the yellow “1” line) mean the results are closer to ideal.
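The core idea behind such a test can be expressed with simple timing arithmetic: time a graphics-only pass, a compute-only pass, and a combined pass. If the hardware truly overlaps the two, the combined time approaches the longer of the two; if it serializes them, it approaches their sum. The numbers below are illustrative, not measurements:

```python
# Normalized latency for a combined graphics+compute pass. A ratio near
# 1.0 means near-ideal overlap (the yellow line in the graphs); a ratio
# near (graphics + compute) / max(graphics, compute) means the work was
# serialized with no concurrency.
def async_ratio(graphics_ms, compute_ms, combined_ms):
    return combined_ms / max(graphics_ms, compute_ms)

print(async_ratio(10.0, 8.0, 10.5))   # 1.05 -> near-ideal overlap
print(async_ratio(10.0, 8.0, 18.0))   # 1.8  -> fully serialized
```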

Radeon R9 290

Here’s the R9 290’s performance. The yellow line is perfection — that’s what we’d get if the GPU switched and executed instantaneously. The y-axis of the graph shows performance normalized to 1x, which is where we’d expect perfect asynchronous latency to be. The red line is what we are most interested in. It shows GCN performing nearly ideally in the majority of cases, holding performance steady even as thread counts rise. Now, compare this to Nvidia’s GTX 980 Ti.


Attempting to execute graphics and compute concurrently on the GTX 980 Ti causes dips and spikes in performance and little in the way of gains. Right now, there are only a few thread counts where Nvidia matches ideal performance (latency, in this case) and many cases where it doesn’t. Further investigation has indicated that Nvidia’s async pipeline appears to lean on the CPU for some of its initial steps, whereas AMD’s GCN handles the job in hardware.

Right now, the best available evidence suggests that when AMD and Nvidia talk about asynchronous compute, they are talking about two very different capabilities. “Asynchronous compute,” in fact, isn’t necessarily the best name for what’s happening here. The question is whether or not Nvidia GPUs can run graphics and compute workloads concurrently. AMD can, courtesy of its ACE units.

It’s been suggested that AMD’s approach is more like Hyper-Threading, which allows the GPU to work on disparate compute and graphics workloads simultaneously without a loss of performance, whereas Nvidia may be leaning on the CPU for some of its initial setup steps and attempting to schedule simultaneous compute + graphics workloads for ideal execution. Obviously that process isn’t working well yet. Since our initial article, Oxide has stated the following:

“We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute.”

Here’s what that likely means, given Nvidia’s own presentations at GDC and the various test benchmarks that have been assembled over the past week. Maxwell does not have a GCN-style configuration of asynchronous compute engines and it cannot switch between graphics and compute workloads as quickly as GCN. According to Beyond3D user Ext3h:

“There were claims originally, that Nvidia GPUs wouldn’t even be able to execute async compute shaders in an async fashion at all, this myth was quickly debunked. What become clear, however, is that Nvidia GPUs preferred a much lighter load than AMD cards. At small loads, Nvidia GPUs would run circles around AMD cards. At high load, well, quite the opposite, up to the point where Nvidia GPUs took such a long time to process the workload that they triggered safeguards in Windows. Which caused Windows to pull the trigger and kill the driver, assuming that it got stuck.

“Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x times what Nvidia GPUs can handle. But they also need also about 4x the pressure applied before they get to play out there capabilities.”

Ext3h goes on to say that preemption in Nvidia’s case is only used when switching between graphics contexts (1x graphics + 31 compute mode) and “pure compute context,” but claims that this functionality is “utterly broken” on Nvidia cards at present. He also states that while Maxwell 2 (GTX 900 family) is capable of parallel execution, “The hardware doesn’t profit from it much though, since it has only little ‘gaps’ in the shader utilization either way. So in the end, it’s still just sequential execution for most workload, even though if you did manage to stall the pipeline in some way by constructing an unfortunate workload, you could still profit from it.”

Nvidia, meanwhile, has told Oxide that it can implement asynchronous compute, and that this capability simply hasn’t been fully enabled in its drivers yet. Like Oxide, we’re going to wait and see how the situation develops. The analysis thread at Beyond3D makes it very clear that this is an incredibly complex question, and much of what Nvidia and Maxwell may or may not be doing is unclear.

Earlier, we mentioned that AMD’s approach to asynchronous computing superficially resembled Hyper-Threading. There’s another way in which that analogy may prove accurate: When Hyper-Threading debuted, many AMD fans asked why Team Red hadn’t copied the feature to boost performance on K7 and K8. AMD’s response at the time was that the K7 and K8 processors had much shorter pipelines and very different architectures, and were intrinsically less likely to benefit from Hyper-Threading as a result. The P4, in contrast, had a long pipeline and a relatively high stall rate. If one thread stalled, HT allowed another thread to continue executing, which boosted the chip’s overall performance.

GCN-style asynchronous computing is unlikely to boost Maxwell performance, in other words, because Maxwell isn’t really designed for these kinds of workloads. Whether Nvidia can work around that limitation (or implement something even faster) remains to be seen.

What does this mean for gamers and DX12?

There’s been a significant amount of confusion over what this difference in asynchronous compute means for gamers and DirectX 12 support. Despite what some sites have implied, DirectX 12 does not require any specific implementation of asynchronous compute. That aside, it currently seems that AMD’s ACEs could give the company a leg up in future DX12 performance. Whether Nvidia can perform a different type of optimization and gain similar benefits for itself is still unknown. Regarding the usefulness of asynchronous computing (AMD’s definition) itself, Kollock notes:

“First, though we are the first D3D12 title, I wouldn’t hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn’t hold Ashes up as the premier example of this feature.”

Given that AMD hardware powers both the Xbox One and PS4 (and possibly the upcoming Nintendo NX), it’s absolutely reasonable to think that AMD’s version of asynchronous compute could be important to the future of the DX12 standard. Talk of returning already-purchased NV cards in favor of AMD hardware, however, is rather extreme. Game developers optimize for both architectures and we expect that most will take the route that Oxide did with Ashes — if they can’t get acceptable performance from using asynchronous compute on Nvidia hardware, they simply won’t use it. Game developers are not going to throw Nvidia gamers under a bus and simply stop supporting Maxwell or Kepler GPUs.

Right now, the smart thing to do is wait and see how this plays out. I stand by Ashes of the Singularity as a solid early look at DX12 performance, but it’s one game, on early drivers, in a just-released OS. Its developers readily acknowledge that it should not be treated as the be-all, end-all of DX12 performance, and I agree with them. If you’re this concerned about how DX12 will evolve, wait another 6-12 months for more games, as well as AMD and Nvidia’s next-generation cards on 14/16nm before making a major purchase.

If AMD cards have an advantage in both hardware and upcoming title collaboration, as a recent post from AMD’s Robert Hallock stated, then we’ll find that out in the not-too-distant future. If Nvidia is able to introduce a type of asynchronous computing for its own hardware and largely match AMD’s advantage, we’ll see evidence of that, too. Either way, leaping to conclusions about which company will “win” the DX12 era is extremely premature. Those looking for additional details on the differences between asynchronous compute between AMD and Nvidia may find this post from Mahigan useful as well.  If you’re fundamentally confused about what we’re talking about, this B3D post sums up the problem with a very useful analogy.


Apple TV Should Take a Hint From Nvidia’s Shield

While it is sad that Apple TV likely won’t be able to get the TV content it needs to really sing, it still could be an amazing product if it were to go after gaming more aggressively, enhance sound like the Nvidia Shield offering, and incorporate a better voice command/digital assistant, which Amazon’s Echo has showcased. I expect both Amazon and Nvidia will be watching this launch closely.

Apple will be launching some interesting products on Wednesday. The timing is nearly right for its Intel Skylake laptops and PCs. The iPhone 6S is also due, and typically the smart Apple customers wait for the S, because S models have fixes to the earlier model’s problems that most annoyed users. At the very least, the iPhone 6S is expected to avoid its predecessors’ bending issues.

However, the most interesting product may be the updated Apple TV. The old product, while pretty popular for something Apple didn’t focus on that much, has gone well beyond its use-by date and is in dire need of a refresh.

The problem is that Apple, like pretty much everyone else, hasn’t been able to get the content deals it needs to make it really sing — the cable companies have those things locked up nicely — so it will have to go in a different direction.

That direction could be showcased by two existing products: Amazon’s Echo and the Nvidia Shield set-top box. Apple could pull inspiration from both of these products to create a fallback strategy in light of its content deal problems.

I’ll close with my product of the week: a set of business-focused wireless PC headphones from Plantronics.

The Apple TV Problem

The past week was painful to watch — not just for Apple, but for virtually every company trying to bring out a next-generation set-top box. Only TiVo really stands out as a success on content, and that is because it is more of a DVR that doesn’t cut the cord.

Even Intel, with all of its strength and power, made a run at this market and failed. It ended up having to sell its unique and potentially powerful cloud-based service to Verizon, which could get the content deals Intel couldn’t.

Apple apparently is countering by creating its own movies and TV shows, but that’s not Apple’s core competence, and there are plenty of ways that could end badly.

On the other hand, given how dominant Apple is in Hollywood and at Disney, it might be able to pull a rabbit out of its hat.

While the Apple TV will have improved — likely vastly improved — capabilities, I doubt any of us will get very excited without enhanced content, unless Apple’s emphasis is on something else amazing — and this is where Amazon and Nvidia could showcase Apple’s future direction.

The Nvidia Shield Example

Nvidia came up with its own set-top box, the Nvidia Shield, and it may have plowed the field a bit in one direction Apple could go.

Given its gaming focus, it’s likely Shield performance will eclipse Apple’s effort, but that doesn’t mean Apple couldn’t go after a gaming opportunity as well. It has been very good at capturing developers and has been building up gaming competence for some time. A number of games that currently are played on the iPad should be able to scale to the larger screen with some success. That’s one way Apple could emphasize the Apple TV’s enhanced capabilities and take folks’ minds off the programming shortfall.

Granted, for serious gamers, the Nvidia product should remain the better choice. However, in much the same way that Nintendo did, Apple could carve out a niche of casual games. Although it would rank below the Nvidia box and the other gaming consoles in raw capability, a Wii-like offering from Apple still could be very compelling.

The Shield set-top box also showcases the benefits of enhanced music support (Dolby 7.1 surround sound), and Apple’s Beats acquisition makes this an ideal area for it to focus on as well. I’ve often wondered why Apple didn’t make a bigger push for making Apple TV a great music player before now, given its massive success in this segment. It likely could approach Nvidia’s position on music quality and match it on music breadth. (Both basically use the same general sources.)

Amazon’s Echo

While Amazon is pushing its Siri-like Echo product as some kind of in-house assistant, most folks who have used it for a while praise its voice-activated music capability the most.

Apple clearly has the parts to make this work even better than Amazon does, with expected Siri enhancements like IBM Watson integration, which should make her far smarter.

While Apple TV likely will need to be connected to some kind of amp/speaker and not be as standalone as Echo is, the end result should sound far better and have far more utility than Echo does. It will connect with the entire Apple ecosystem, giving users unprecedented voice command capability over a variety of audio and video content. It likely will eclipse Echo but goad Amazon into developing a blended Fire TV/Echo product, which could be equally cool.

Wrapping Up: Set-Top Box Wars

While it is sad that Apple TV likely won’t be able to get the TV content it needs to really sing — there are ongoing efforts to break open the cable company gridlock, eventually — it still could be an amazing product if it were to go after gaming more aggressively, enhance sound like the Nvidia Shield offering, and incorporate a better voice command/digital assistant, which Amazon’s Echo has showcased.

I expect both Amazon and Nvidia will be watching this launch closely. It will be interesting to see whether Apple TV can garner more excitement than the companies’ other products are likely to generate.

Rob Enderle's Product of the Week

An increasing number of us are living on our PCs much more than we did because our voice and video communications increasingly are coming from there rather than via our phones. What is kind of fascinating for me is that many of the capabilities folks are getting excited about now were available back in the 1980s on ROLM PBXes — but that industry imploded, and we’re now getting them on VoIP systems tied to our PCs and mobile devices.

Key to this is the headset, and the Plantronics Voyager Focus UC headset is a case in point.

At US$299, this headset isn’t inexpensive, but this is a product you’ll likely put on in the morning and then take off when you leave work — or in my case, my home office — at night. It has to be comfortable, wireless, and have excellent sound quality and battery life.

Plantronics' Voyager Focus UC Stereo Bluetooth Headset

This latest Plantronics offering has all of that. Controls on the headset let you pick up and disconnect calls; its 150-foot range lets you wander away from your desk while talking; and a battery monitor on the side helps you monitor battery life.

It has three microphones for very strong outbound sound quality and an open microphone feature you can use to bring the sounds around you into your headset — like if your boss or cubicle neighbor wants to chat, or if you just want to quietly eavesdrop on what they are saying while looking like you are lost in your headphones (not that I’d EVER do that myself, mind you).

It has the typical volume and music controls, so you can listen to your tunes between calls or training material (by the way, just so you know, folks don’t move to the beat when listening to training tapes or while on conference calls).

This is one of the most comfortable products I have, and I have a lot of headphones. It is a lot more comfortable than my Plantronics Savi Pro, which typically bridges my analog phone and PC. It’s nicely done, easy to use, and very comfortable, which are all good reasons to make the Plantronics Voyager Focus UC headset my product of the week.


Intel will support FreeSync standard with future GPUs

Currently, there are two competing display standards that can provide smoother gameplay and refresh rates synchronized to GPU frame production — Nvidia’s proprietary G-Sync standard, and the VESA-backed Adaptive-Sync (AMD’s FreeSync is built on this standard). We’ve previously covered the two standards, and both can meaningfully improve gaming and game smoothness. Now, Intel has thrown its own hat into the ring and announced that it intends to support the VESA Adaptive-Sync standard over the long term.

This is a huge announcement for the long-term future of Adaptive-Sync. Nvidia’s G-Sync technology is specific to its own GeForce cards, though a G-Sync monitor still functions normally if hooked to an Intel or AMD GPU. The theoretical advantage of Adaptive-Sync / FreeSync is that it can be used with any GPU that supports the VESA standard — but since AMD has been the only company pledging to do so, the practical situation has been the same as if AMD and Nvidia had each backed their own proprietary tech.

AMD FreeSync

Intel’s support changes that. Dwindling shipments of low-end discrete GPUs in mobile and desktop have given the CPU titan an ever-larger share of the GPU market, which means that any standard Intel chooses to back has a much greater chance of becoming a de facto standard across the market. This doesn’t prevent Nvidia from continuing to market G-Sync as its own solution, but if Adaptive-Sync starts to ship standard on monitors, it won’t be a choice just between AMD and Nvidia — it’ll be AMD and Intel backing a standard that consumers can expect as default on most displays, while Nvidia backs a proprietary solution that only functions with its own hardware.

Part of what likely makes this sting for Team Green is that its patent license agreement with Intel will expire in 2016. Back in 2011, Intel agreed to pay Nvidia $1.5 billion over the next five years. That’s worked out to roughly $66 million per quarter, and it’s high-margin cash — cash Nvidia would undoubtedly love to replace with patent agreements with other companies. There’s talk that the recent court cases against Samsung and Qualcomm over GPU technology have been driven by this, but Nvidia would likely love to sign a continuing agreement with Intel that allowed the company to offer G-Sync technology on Intel GPUs. If Intel is going to support Adaptive-Sync, it’s less likely to take out a license for G-Sync as well.

The only fly in the ointment is the timing. According to Tech Report, no current Intel GPU hardware supports Adaptive-Sync, which means we’re looking at a post-Skylake timeframe for support. Intel might be able to squeeze the technology into Kaby Lake, with its expected 2016 debut date, but if it can’t we’ll be waiting for Cannonlake and a 2017 timeframe. Adaptive-Sync and G-Sync are most visually effective at lower frame rates, which means gaming on Intel IGPs could get noticeably smoother than we’ve seen in the past. That’s a mixed blessing for AMD, which has historically relied on superior GPU technology to compete with Intel, but it’s still an improvement over an AMD – Nvidia battle where NV holds the majority of the market share.
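Why both technologies matter most at lower frame rates comes down to simple frame-timing arithmetic. A rough sketch (the 60Hz fixed panel and the 30-144Hz adaptive range below are assumed numbers for illustration, not any particular monitor’s spec):

```python
import math

FIXED_REFRESH_MS = 1000 / 60                               # fixed 60 Hz panel
ADAPTIVE_MIN_MS, ADAPTIVE_MAX_MS = 1000 / 144, 1000 / 30   # hypothetical 30-144 Hz range

def displayed_ms_fixed(render_ms):
    # With vsync on a fixed-refresh panel, a late frame waits for the next
    # refresh boundary, so a 25 ms render is held until roughly 33.3 ms.
    return math.ceil(render_ms / FIXED_REFRESH_MS) * FIXED_REFRESH_MS

def displayed_ms_adaptive(render_ms):
    # An adaptive panel refreshes when the frame is ready, clamped to the
    # refresh range the panel actually supports.
    return min(max(render_ms, ADAPTIVE_MIN_MS), ADAPTIVE_MAX_MS)

print(round(displayed_ms_fixed(25), 1))     # late frame snaps to the next vsync
print(round(displayed_ms_adaptive(25), 1))  # shown as soon as it is ready
```

At high frame rates nearly every frame lands inside a single refresh interval anyway, which is why the benefit is most visible when an IGP is grinding along at 30-45 fps.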


DirectX 12 arrives at last with Ashes of the Singularity, AMD and Nvidia go head-to-head

Ever since Microsoft announced DirectX 12, gamers have clamored for hard facts on how the new API would impact gaming. Unfortunately, hard data on this topic has been difficult to come by — until now. Oxide Games has released an early version of its upcoming RTS game Ashes of the Singularity, and allowed the press to do some independent tire-kicking.

Before we dive into the test results, let’s talk a bit about the game itself. Ashes is an RTS title powered by Oxide’s Nitrous game engine. The game’s look and feel somewhat resemble Total Annihilation’s, with large numbers of units on-screen simultaneously and heavy action between ground and flying units. The game has been in development for several years, and it’s the debut title for the new Nitrous engine.


An RTS game is theoretically a great way to debut an API like DirectX 12. On-screen slowdowns when the action gets heavy have often plagued previous titles, and freeing up more CPU threads to attend to the rendering pipeline should be a boon for all involved.

Bear in mind, however, that this is a preview of DX12 performance — we’re examining a single title that’s still in pre-beta condition, though Oxide tells us that it’s been working very hard with both AMD and Nvidia to develop drivers that support the game effectively and ensure the rendering performance in this early test is representative of what DirectX 12 can deliver.

Nvidia really doesn’t think much of this game

Nvidia pulled no punches when it came to its opinion of Ashes of the Singularity. According to the official Nvidia Reviewer’s Guide, the benchmark is primarily useful for ascertaining if your own hardware will play the game. The company also states: “We do not believe it is a good indicator of overall DirectX 12 performance.” (emphasis original). Nvidia also told reviewers that MSAA performance was buggy in Ashes, and that MSAA should be disabled by reviewers when benchmarking the title.

Oxide has denied this characterization of the benchmark in no uncertain terms. Dan Baker, co-founder of Oxide Games, has published an in-depth blog post on Ashes of the Singularity, which states:

“There are incorrect statements regarding issues with MSAA. Specifically, that the application has a bug in it which precludes the validity of the test. We assure everyone that is absolutely not the case. Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.

“So what is going on then? Our analysis indicates that the any D3D12 problems are quite mundane. New API, new drivers. Some optimizations that that the drivers are doing in DX11 just aren’t working in DX12 yet. Oxide believes it has identified some of the issues with MSAA and is working to implement work arounds on our code. This in no way affects the validity of a DX12 to DX12 test, as the same exact work load gets sent to everyone’s GPUs. This type of optimizations is just the nature of brand new APIs with immature drivers.”

AMD and Nvidia have a long history of taking shots at each other over game optimization and benchmark choice, but most developers choose to stay out of these discussions. Oxide’s decision to buck that trend should be weighed accordingly. At ExtremeTech, we’ve had access to Ashes builds for nearly two months and have tested the game at multiple points. Testing we conducted over that period suggests Nvidia has done a great deal of work on Ashes of the Singularity over the past few weeks. DirectX 11 performance with the 355.60 driver, released on Friday, is significantly better than what we saw with 353.30.

Is Ashes a “real” benchmark?

Baker’s blog post doesn’t just refute Nvidia’s MSAA claims; it goes into detail on how the benchmark executes and how to interpret its results. The standard benchmark does execute an identical flyby pass and tests various missions and unit match-ups, but it doesn’t pre-compute the results. Every aspect of the game engine, including its AI, audio, physics, and firing solutions is executed in real-time, every single time. By default, the benchmark is designed to record frame time data and report a play-by-play report on performance in every subsection of the test. We only had a relatively short period of time to spend with the game, but Ashes records a great deal of information in both DX11 and DX12.

Ashes of the Singularity also includes a CPU benchmark that can be used to simulate an infinitely fast GPU — useful for measuring how GPU-bound any given segment of the game actually is.
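As a rough sketch (our own back-of-the-envelope framing, not Oxide’s published method), a CPU-only result like that can bound how GPU-limited a scene is:

```python
def gpu_bound_fraction(actual_fps, cpu_only_fps):
    """Approximate share of frame time spent waiting on the GPU.

    cpu_only_fps is the framerate reported with the simulated infinitely
    fast GPU; actual_fps is the real measured framerate.
    """
    actual_ms = 1000 / actual_fps
    cpu_ms = 1000 / cpu_only_fps
    return max(0.0, (actual_ms - cpu_ms) / actual_ms)

# Hypothetical numbers: 45 fps measured vs. 90 fps with the GPU factored out
# means roughly half of each frame is spent waiting on the GPU.
print(gpu_bound_fraction(45, 90))
```

If the two framerates are nearly identical, the segment is CPU-bound and a faster graphics card won’t help it.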

In short, by any reasonable meaning of the phrase, Ashes is absolutely a real benchmark. We wouldn’t recommend taking these results as a guaranteed predictor of future DX12 performance between Red and Green — Windows 10 only just launched, the game is still in pre-beta, and AMD and Nvidia still have issues to iron out of their drivers. While Oxide strongly disputes that their MSAA is bugged for any meaningful definition of the word, they acknowledge that gamers may want to disable MSAA until both AMD and NV have had more time to work on their drivers. In deference to this view, our own benchmarks have been performed with MSAA both enabled and disabled.

Test setup

Because Ashes is a DirectX 12 title, it presents different performance considerations than we’ve previously seen, and unfortunately we only have time to address the most obvious cases between AMD and Nvidia today. As with Mantle before it, we expect the greatest performance improvements to show up on CPUs with fewer cores or weak single-threaded performance. AMD chips should benefit dramatically, as they did with Mantle, while Intel Core i3s and Core i5s should still see significant improvements.

With that said, our choice of a Core i7-5960X isn’t an accident. For these initial tests, we wanted to focus on differences in GPU performance. We compared the Nvidia GTX 980 Ti using the newly-released 355.60 drivers. These drivers dramatically boost Ashes of the Singularity performance in DX11 and are a must-download if you plan on playing the game or participating in its beta. AMD also distributed a new beta Catalyst build for this review, which was also used here. Our testbed consisted of an Asus X99-Deluxe motherboard, Core i7-5960X, 16GB of DDR4-2667, a Galax SSD, and the aforementioned GTX 980 Ti and R9 Fury X video cards.

We chose to test Ashes of the Singularity at both 1080p and 4K, with 4x MSAA enabled and disabled. The game was tested at its “High” default preset (note that the “High” preset initially sets 2x MSAA as default, but we changed this when testing with MSAA disabled).

Batches? We don’t need no stinkin’ batches!

As we step through the game’s performance, we should talk a bit about how Oxide breaks the performance figures down. In Ashes, performance figures are given as a total average of all frames as well as by batches. Batches, for our purposes, can be thought of as synonymous with draw calls. “Normal” batches contain a relatively light number of draw calls, while heavy batches are those frames that include a huge number of draw calls. One of the major purposes of DirectX 12 is to increase how many draw calls the system can handle simultaneously without bogging down.
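A toy sketch of that bucketing idea (the draw-call thresholds and frame trace below are invented for illustration; Oxide doesn’t publish its exact cutoffs):

```python
def classify_frames(draw_calls_per_frame, normal_max=5_000, heavy_min=15_000):
    """Bucket frames by how many draw calls each one issued."""
    buckets = {"normal": 0, "medium": 0, "heavy": 0}
    for calls in draw_calls_per_frame:
        if calls <= normal_max:
            buckets["normal"] += 1
        elif calls < heavy_min:
            buckets["medium"] += 1
        else:
            buckets["heavy"] += 1
    return buckets

# Draw-call counts from a hypothetical benchmark run
print(classify_frames([1_200, 4_800, 9_000, 22_000, 31_000]))
```

Reporting each bucket separately is what lets the benchmark show where an API bogs down: a card can look fine on normal frames while collapsing on heavy ones.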

Test results: DirectX 11

We’ll begin with DirectX 11 performance between the AMD Radeon R9 Fury X and the GeForce GTX 980 Ti. The first graph is the overall frames-per-second average between AMD and Nvidia, the next two graphs show performance broken out by batch type.


Batch performance at 1080p

Batch performance at 4K

Nvidia makes hash of AMD in DirectX 11. Looking at the graph breakdowns for Normal, Medium, and Heavy batches, we can see why – Nvidia’s performance lead in the medium and heavy batches is much greater than in normal batches. We can see this most clearly at 4K, where Nvidia leads AMD by just 7% in Normal batches, but by 84% in Heavy batches. Enabling 4x MSAA cuts the gap between AMD and Nvidia, as has often been the case.

Overall performance with MSAA enabled

Batch performance in 1080p

Batch performance in 4K

Note that while these figures are comparatively stronger for AMD on the whole, they still aren’t great. Without antialiasing enabled, Nvidia’s GTX 980 Ti is 1.42x faster than AMD in 4K and 1.78x faster in 1080p. With MSAA enabled, that gap falls to 1.27x and 1.69x respectively. The batch breakouts show these trends as well, though it’s interesting that the Fury X closes to within 13% of the GTX 980 Ti at 4K in Medium batches.

The gap between AMD and Nvidia was smaller last week, but the 355.60 driver improved Team Green’s performance by an overall 14%, and by up to 25% in some of the batch-specific tests. Oxide told us it has worked with Nvidia engineers for months to ensure the game ran optimally on DirectX 11 and 12, and these strong results bear that out.

Test Results: DirectX 12

DirectX 12 paints an entirely different picture of relative performance between AMD and Nvidia. First, here’s the breakdown at 1080p and 4K, and then in the batch runs for each of those tests.




The gap between AMD and Nvidia in DX11 doesn’t just shrink in DX12, it vanishes. AMD’s R9 Fury X, which is normally about 7% slower than the GTX 980 Ti, ties it in both 4K and 1080p. Meanwhile, the batch tests show AMD a hair less quick in normal batches, but faster at medium and high batch counts. Let’s enable MSAA and see what happens.




For all the fuss about Oxide’s supposed MSAA bug, we expected to see Nvidia’s performance tank or some other evidence of a problem. Screenshots of DX12 vs. DX11 with 4x MSAA revealed no differences in implementation, as per Dan Baker’s blog post. All that happens, in this case, is that AMD goes from tying the GTX 980 Ti to leading it by a narrow margin. In DirectX 11, Nvidia’s 4x MSAA scores were 15.6% lower at 4K and 7.7% lower at 1080p. AMD’s results were 6% and 3% lower, but there’s clearly some non-optimized code paths on AMD’s side of the fence when using that API.

In DirectX 12, Nvidia’s 4x MSAA scores were 14.5% lower at 4K and 12% lower at 1080p. AMD’s results were 12% and 8.2% lower respectively. It’s not news to observe that AMD’s GPUs often take less of a performance hit with MSAA enabled than their Nvidia counterparts, so the fact that the DX12 API is marginally slower for Nvidia with 4x MSAA enabled than the highly optimized DX11 path doesn’t explain why Nvidia came out so strongly against Ashes of the Singularity or its MSAA implementation.
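The MSAA “hit” percentages above are simple relative drops; for clarity, the calculation looks like this (the fps numbers in the example are hypothetical, not our measured results):

```python
def msaa_hit_percent(fps_without, fps_with):
    """Percentage of performance lost when enabling MSAA."""
    return (fps_without - fps_with) / fps_without * 100

# e.g. a card that falls from 40 fps to 34.2 fps with 4x MSAA enabled
print(round(msaa_hit_percent(40, 34.2), 1))
```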

DirectX 12 presents two very different challenges

At first glance, these results may not seem impressive. The magnitude of AMD’s improvement from DX11 to DX12 is undercut by Nvidia’s stellar DX11 performance. The Fury X beats or ties Nvidia in both our benchmarks, and that’s definitely significant for AMD, considering that the Fury X normally lags the GTX 980 Ti, but Microsoft didn’t sell DirectX 12 as offering incremental, evolutionary performance improvements. Is the API a wash?

We don’t think so, but demonstrating why that’s the case will require more testing with lower-end CPUs and perhaps some power consumption profiling comparing DX11 to DX12. We expect DirectX 12 to deliver higher performance than anything DirectX 11 can match in the long run. It’s not just an API – it’s the beginning of a fundamental change within the GPU and gaming industry.

Consider Nvidia. One of the fundamental differences between Nvidia and AMD is that Nvidia has a far more hands-on approach to game development. Nvidia often dedicates engineering resources and personnel to improving performance in specific titles. In many cases, this includes embedding engineers on-site, where they work with the developer directly for weeks or months. Features like multi-GPU support, for instance, require specific support from the IHV (independent hardware vendor). Because DirectX 11 is a high-level API that doesn’t map cleanly to any single GPU architecture, there’s a great deal that Nvidia can do to optimize performance from within its own drivers. That’s even before we get to GameWorks, which licenses GeForce-optimized libraries for direct integration as middleware (GameWorks, as a program, will continue and expand under DirectX 12).

DirectX 12, in contrast, gives the developer far more control over how resources are used and allocated. It offers vastly superior tools for monitoring CPU and GPU workloads, and allows for fine-tuning in ways that were simply impossible under DX11. It also puts Nvidia at a relative disadvantage. For a decade or more, Nvidia has done enormous amounts of work to improve performance in-driver. DirectX 12 makes much of that work obsolete. That doesn’t mean Nvidia won’t work with developers to improve performance or that the company can’t optimize its drivers for DX12, but the very nature of DirectX 12 precludes certain kinds of optimization and requires different techniques.

AMD, meanwhile, faces a different set of challenges. The company’s GPUs look much better under D3D 12 precisely because it doesn’t require Team Red to perform enormous, game-specific optimizations. AMD shouldn’t assume, however, that rapid uptake of Windows 10 will translate into being able to walk away from DirectX 11 performance. DirectX 12 may be ramping up, but Ashes of the Singularity and possibly Fable Legends are the only near-term DX12 launches, and neither is in finished form just yet. DX11 and even DX9 are going to remain important for years to come, and AMD needs to balance its admittedly limited pool of resources between encouraging DX12 adoption and ensuring that gamers who don’t have Windows 10 don’t end up left in the cold.

As things stand right now, AMD showcases the kind of performance that DirectX 12 can deliver over DirectX 11, and Nvidia offers more consistent performance between the two APIs. Nvidia’s strong performance in DX11, however, is undercut by its negative scaling in DirectX 12 and by the apparent absence of any MSAA bug. Given this, it’s hard not to think that Nvidia’s strenuous objections to Ashes had more to do with its own decision to focus on DX11 performance over DX12, or with its hardware’s lackluster performance when running in that API.
