Tag Archives: AMD

Nvidia unveils Pascal specifics — up to 16GB of VRAM, 1TB of bandwidth

Nvidia Pascal

Nvidia may have unveiled bits and pieces of its Pascal architecture back in March, but the company has shared some additional details at its GTC Japan technology conference. Like AMD’s Fury X, Pascal will move away from GDDR5 in favor of stacked memory, adopting the next-generation HBM2 standard, and it will be built on TSMC’s 16nm FinFET process with up to 16GB of memory. AMD and Nvidia are both expected to adopt HBM2 in 2016, but this will be Nvidia’s first product to use stacked memory, while AMD has prior experience thanks to the Fury lineup.

HBM vs. HBM2

HBM and HBM2 are based on the same core technology, but HBM2 doubles the effective speed per pin and introduces some new low-level features, as shown below. Memory density is also expected to improve, from 2Gb per DRAM die (8Gb per four-die stack) to 8Gb per die (32Gb per stack).

SK Hynix HBM DRAM

Nvidia’s quoted 16GB of memory assumes four HBM2 stacks, each built from four 8Gb dies. That’s the same basic configuration the Fury X used, though the higher-density DRAM means a hypothetical top-end Pascal would carry four times as much memory as the Fury X. We would be surprised, however, if Nvidia pushes that 16GB configuration below its top-end consumer card. In our examination of 4GB VRAM limits earlier this year, we found that the vast majority of games do not stress a 4GB VRAM buffer. Of the handful of titles that do use more than 4GB, none exceeded the GTX 980 Ti’s 6GB limit while maintaining anything approaching a playable frame rate. Consumers simply don’t have much to worry about on this front.
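
As a sanity check, the arithmetic behind that 16GB figure works out as follows (our own back-of-the-envelope math, based on the configuration described above):

```python
# Back-of-the-envelope check on the 16GB figure, using the configuration
# described above: four HBM2 stacks, each four 8Gb dies high.
GBIT_PER_DIE = 8       # HBM2-generation DRAM die density, in gigabits
DIES_PER_STACK = 4
STACKS = 4

total_gbit = GBIT_PER_DIE * DIES_PER_STACK * STACKS
print(total_gbit / 8, "GB")   # 16.0 GB; Fury X's 2Gb dies give 4GB in the same layout
```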

The other tidbit coming out of GTC Japan is that Nvidia will target 1TB/s of total memory bandwidth. That’s a huge jump — twice what Fury X offers — and it arrives in a single product cycle. Both AMD and Nvidia are claiming that HBM2 and 14/16nm process technology will give them a 2x performance-per-watt improvement.
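
Here’s the rough math behind that 1TB/s target (our estimate, assuming four HBM2 stacks with a 1024-bit interface each and roughly 2Gb/s per pin):

```python
# Rough math behind the 1TB/s target: four HBM2 stacks, a 1024-bit interface
# per stack, and HBM2's ~2Gb/s effective rate per pin (our assumptions).
PINS_PER_STACK = 1024
GBIT_PER_SECOND_PER_PIN = 2.0
STACKS = 4

gb_per_second = PINS_PER_STACK * GBIT_PER_SECOND_PER_PIN * STACKS / 8   # bits -> bytes
print(gb_per_second, "GB/s")   # 1024.0 GB/s, i.e. ~1TB/s -- double Fury X's 512GB/s
```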

AMD has typically led Nvidia when it comes to adopting new memory technologies. It was the only company to adopt GDDR4 and the first to use GDDR5 — the Radeon HD 4870 debuted with GDDR5 in June 2008, while Nvidia didn’t push the new standard on high-end cards until Fermi in 2010. AMD has argued that its expertise with HBM made implementing HBM2 easier, and some sites have reported rumors that the company has preferential access to Hynix’s HBM2 supply. Given that Hynix isn’t the only company building HBM2, however, this may or may not translate into any kind of advantage.

HBM2 production roadmap

With Teams Red and Green both moving to HBM2 next year, and both apparently targeting the same bandwidth and memory capacities, I suspect that the performance crown won’t be decided by the memory subsystem. Games inevitably evolve to take advantage of next-gen hardware, but the 1TB/s capability that Nvidia is talking up won’t be a widespread feature — especially if both companies stick to GDDR5 for entry-level and midrange products. One facet of HBM/HBM2 is that its advantages become more pronounced as more RAM is packed onto a card and as the GPU grows larger. We can bet that AMD and Nvidia will introduce ultra-high-end and high-end cards that use HBM2, but midrange cards in the 2-4GB range could stick with GDDR5 for another product cycle.

The big question is which architecture exploits that bandwidth more effectively, and whether AMD can finally deliver a new core design that leaps past the incremental improvements GCN 1.1 and 1.2 offered over the original GCN 1.0 architecture, which is now nearly three years old. Rumors abound on what kind of architecture that will be, but I’m inclined to think it’ll be an evolution of GCN rather than a wholesale replacement. Both AMD and Nvidia have moved towards evolutionary advances rather than radical architecture swaps, and there’s enough low-hanging fruit in GCN that AMD could substantially improve performance without reinventing the wheel.

Neither AMD nor Nvidia have announced a launch date, but we anticipate seeing hardware from both in late Q1 / early Q2 of 2016.

AMD talks with private equity firm Silver Lake fell through: Report

Earlier this month, we reported that AMD would reorganize its entire graphics unit, as well as rumors that Silver Lake, a private equity firm, was interested in acquiring a substantial share of the company. We had reason to think a deal like this might well have been in the cards, but reports now indicate that the talks may have fallen apart due to a failure to agree on a strategy acceptable to the private equity firm.

AMD was in talks to sell roughly 25% of itself to Silver Lake Management, but negotiations are currently on hold due to differences over strategy and overall cost, Bloomberg reports. AMD has good reason to push for as high a price as it can get, since the company’s future largely depends on securing a lifeline until the launch of Zen, its next-generation CPU architecture, which isn’t expected until 2016. AMD wasn’t willing to comment on the specifics of its situation, and neither was Silver Lake, for obvious reasons.

The fact that the talks are described as “stalled,” rather than dead, could signify that both companies still hope to head back to the table or that one is bringing pressure on the other. It’s still possible that AMD will come to an agreement with Silver Lake.

Everything hangs on AMD's Zen

It’s not surprising to us, however, that the two companies might be having difficulty coming up with a solid plan forward. As we discussed earlier this year, any attempt to split AMD’s graphics and CPU divisions is going to run into serious trouble. The two businesses are intertwined in a manner that makes it very difficult to spin one off from the other, and AMD’s ability to continue as a manufacturer of x86 chips is anything but guaranteed in the event of an external acquisition. You could, perhaps, cleave off the GPU assets entirely — but only at the cost of abandoning the CPU division.

When Rory Read resigned from AMD nearly a year ago, it looked as though he’d set the company on a path to further success. Increasingly, this appears not to be the case. Jim Keller has left, the ARM-based K12 CPU is nowhere to be found (AMD has stated that the Zen architecture is complete, but no one is talking about K12), the SeaMicro acquisition burned $281 million in cash and did absolutely nothing for the company’s bottom line, and Project Skybridge was canceled — and with it, the 20nm version of Jaguar that might have helped breathe new life into AMD’s low-power product segments.

While Kabini and Jaguar were not huge earners for AMD, the company did a solid business in low-end processors. And while 20nm was a poor fit for GPUs and big-core chips, it might have been a decent node for a low-power x86 part. Either due to limited funds or because the 20nm node couldn’t support the design, that never happened. The end result leaves AMD in a very difficult position. Zen isn’t expected to ship for revenue until Q1 2017, which means AMD has to survive the next 18 months before Zen sales start showing up in its results. Even if Sunnyvale delivers an excellent GPU refresh cycle in 2016, it faces an uphill fight.

AMD announces comprehensive graphics reorganization as investor rumors swirl

Two major pieces of AMD news have crossed the wire today, and both could be good news for the struggling chip company. First, AMD is announcing a major reorganization of its graphics division. The entire graphics team will now be headed by Raja Koduri, including all aspects of GPU architecture, hardware design, driver deployments, and developer relations. Koduri left AMD for Apple in 2009, only to return to the company in 2013. Since then, he’s served as the Corporate Vice President of Visual Computing.

Now, Koduri is being promoted to senior vice president and chief architect of AMD’s entire graphics business (dubbed the Radeon Technologies Group). In this new role, he will oversee the development of future console hardware, AMD’s FirePro division, the GPU side of APUs, and all of AMD’s graphics designs on 14/16nm. Bringing all of these elements under one roof, along with developer relations and driver development, will allow AMD to unify its approach to products that have previously been managed by different departments. This could pay significant dividends in areas like driver management and feature updates, which have previously been handled by teams reporting to different managers. Koduri is well respected in the industry, and we’ve heard that the R9 Nano, which debuts in the very near future, was a project he championed at AMD.

Raja Koduri

Based on what we’ve heard, AMD isn’t just shuffling employees on a spreadsheet — it’s looking to increase its investment in graphics products as well. While we wouldn’t expect the company to suddenly hurl huge amounts of money at the concept, this is an excellent time to make prudent additional expenditures in the GPU market. 14/16nm GPUs will come to market next year, with significant performance and power consumption improvements. The advent of HBM2 will allow for larger frame buffers and could turbo-charge AMD’s future Zen-based APUs. If AMD seizes these technological opportunities and capitalizes on the recent launch of DirectX 12, it’ll be in a much stronger competitive position 12-18 months from now.

Did Silver Lake acquire part of AMD?

There’s a rumor making the rounds today that Silver Lake Partners may have purchased a significant stake in AMD. Fudzilla reports that Silver Lake Partners, which owns significant shares in Avago, Alibaba, and Dell, may have purchased a 20% share of the company. Such a move would inject much-needed capital into AMD and likely negate the need to take on additional debt to finance continuing operations.

If the rumor proves true, it suggests that Silver Lake saw something in AMD’s future roadmap that it felt justified the investment. Such an announcement would likely buoy the company’s rather battered stock price, while the fresh cash injection could help AMD hit its production targets for 2016 and beyond. Even if Zen and AMD’s first ARM-based hardware both hit the ground running in 2016, it’ll take time for AMD to rebuild its overall market position.

As always, take financial rumors with a grain of salt.

Asynchronous compute, AMD, Nvidia, and DX12: What we know so far

Ever since DirectX 12 was announced, AMD and Nvidia have jockeyed for position over which of them would offer better support for the new API and its various features. One capability that AMD has talked up extensively is GCN’s support for asynchronous compute, which allows all GPUs based on AMD’s GCN architecture to perform graphics and compute workloads simultaneously. Last week, an Oxide Games employee reported that, contrary to general belief, Nvidia hardware couldn’t perform asynchronous compute and that the performance impact of attempting to do so was disastrous on the company’s hardware.

This announcement kicked off a flurry of research into what Nvidia hardware did and did not support, as well as anecdotal claims that people would (or already did) return their GTX 980 Tis based on Ashes of the Singularity performance. We’ve spent the last few days in conversation with various sources working on the problem, including Mahigan and CrazyElf at Overclock.net, as well as parsing through various data sets and performance reports. Nvidia has not yet responded to our request for clarification, but here’s the situation as we currently understand it.

Nvidia, AMD, and asynchronous compute

When AMD and Nvidia talk about supporting asynchronous compute, they aren’t talking about the same hardware capability. The Asynchronous Compute Engines in AMD’s GPUs (between two and eight, depending on which card you own) are capable of executing new workloads at latencies as low as a single cycle. A high-end AMD card has eight ACEs, and each ACE has eight queues. Maxwell, in contrast, has two pipelines, one of which is a high-priority graphics pipeline. The other has a queue depth of 31 — but Nvidia can’t switch contexts anywhere near as quickly as AMD can.
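
For a quick sense of scale, here’s a tally of the queue resources described above (a simple illustration of this article’s figures, not vendor documentation):

```python
# A quick tally of the queue resources described above (an illustration of the
# figures in this article, not vendor documentation).
gcn_aces = 8                      # high-end GCN card
queues_per_ace = 8
gcn_compute_queues = gcn_aces * queues_per_ace      # 64 independent compute queues

maxwell_graphics_pipelines = 1    # single high-priority graphics pipeline
maxwell_compute_queue_depth = 31  # depth of the second pipeline's queue
maxwell_queues = maxwell_graphics_pipelines + maxwell_compute_queue_depth

print(gcn_compute_queues, maxwell_queues)   # 64 vs. 32 -- and GCN switches contexts far faster
```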

Nvidia preemption slide (GDC 2015)

According to a talk given at GDC 2015, there are restrictions on Nvidia’s preemption capabilities. Additional text below the slide explains that “the GPU can only switch contexts at draw call boundaries” and “On future GPUs, we’re working to enable finer-grained preemption, but that’s still a long way off.” To explore the capabilities of Maxwell and GCN, users at Beyond3D and Overclock.net have used an asynchronous compute test that evaluates this capability on both AMD and Nvidia hardware. The benchmark has been revised multiple times over the past week, so early results aren’t comparable to the data we’ve seen in later runs.

Note that this is a test of asynchronous compute latency, not performance. It doesn’t measure overall throughput; instead, it’s designed to show whether asynchronous compute is actually occurring. Because this is a latency test, lower numbers (closer to the yellow “1” line) mean the results are closer to ideal.
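
To make that distinction concrete, here’s a minimal model of what such a latency check looks for (a hypothetical sketch, not the actual Beyond3D/Overclock.net benchmark): if graphics and compute genuinely overlap, the combined time approaches the longer of the two workloads; if they serialize, it approaches their sum.

```python
# Hypothetical model of an async compute latency check -- not the actual
# benchmark used by Beyond3D/Overclock.net, just the idea behind it.
def normalized_latency(graphics_ms: float, compute_ms: float, overlaps: bool) -> float:
    """Return combined latency relative to the ideal (the 'yellow line' at 1.0)."""
    ideal = max(graphics_ms, compute_ms)               # perfect overlap
    combined = ideal if overlaps else graphics_ms + compute_ms
    return combined / ideal

print(normalized_latency(10.0, 8.0, overlaps=True))    # 1.0 -> near-ideal, GCN-like behavior
print(normalized_latency(10.0, 8.0, overlaps=False))   # 1.8 -> serialized execution
```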

Radeon R9 290

Here’s the R9 290’s performance. The yellow line is perfection — that’s what we’d get if the GPU switched and executed instantaneously. The y-axis shows latency normalized to 1x, which is where perfect asynchronous execution would sit. The red line is what we’re most interested in: it shows GCN performing nearly ideally in the majority of cases, holding steady even as thread counts rise. Now, compare this to Nvidia’s GTX 980 Ti.

GeForce GTX 980 Ti

Attempting to execute graphics and compute concurrently on the GTX 980 Ti causes dips and spikes in performance and little in the way of gains. Right now, there are only a few thread counts where Nvidia matches ideal latency, and many cases where it doesn’t. Further investigation has indicated that Nvidia’s async pipeline appears to lean on the CPU for some of its initial steps, whereas AMD’s GCN handles the job in hardware.

Right now, the best available evidence suggests that when AMD and Nvidia talk about asynchronous compute, they are talking about two very different capabilities. “Asynchronous compute,” in fact, isn’t necessarily the best name for what’s happening here. The question is whether or not Nvidia GPUs can run graphics and compute workloads concurrently. AMD can, courtesy of its ACE units.

It’s been suggested that AMD’s approach is more like Hyper-Threading, allowing the GPU to work on disparate compute and graphics workloads simultaneously without a loss of performance, whereas Nvidia may be leaning on the CPU for some of its initial setup steps and attempting to schedule simultaneous compute + graphics workloads for ideal execution. Obviously that process isn’t working well yet. Since our initial article, Oxide has stated the following:

“We actually just chatted with Nvidia about Async Compute, indeed the driver hasn’t fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute.”

Here’s what that likely means, given Nvidia’s own presentations at GDC and the various test benchmarks that have been assembled over the past week. Maxwell does not have a GCN-style configuration of asynchronous compute engines and it cannot switch between graphics and compute workloads as quickly as GCN. According to Beyond3D user Ext3h:

“There were claims originally, that Nvidia GPUs wouldn’t even be able to execute async compute shaders in an async fashion at all, this myth was quickly debunked. What become clear, however, is that Nvidia GPUs preferred a much lighter load than AMD cards. At small loads, Nvidia GPUs would run circles around AMD cards. At high load, well, quite the opposite, up to the point where Nvidia GPUs took such a long time to process the workload that they triggered safeguards in Windows. Which caused Windows to pull the trigger and kill the driver, assuming that it got stuck.

“Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x times what Nvidia GPUs can handle. But they also need also about 4x the pressure applied before they get to play out there capabilities.”

Ext3h goes on to say that preemption in Nvidia’s case is only used when switching between graphics contexts (1x graphics + 31 compute mode) and “pure compute context,” but claims that this functionality is “utterly broken” on Nvidia cards at present. He also states that while Maxwell 2 (GTX 900 family) is capable of parallel execution, “The hardware doesn’t profit from it much though, since it has only little ‘gaps’ in the shader utilization either way. So in the end, it’s still just sequential execution for most workload, even though if you did manage to stall the pipeline in some way by constructing an unfortunate workload, you could still profit from it.”

Nvidia, meanwhile, has told Oxide that it can implement asynchronous compute, and that this capability simply isn’t fully enabled in current drivers. Like Oxide, we’re going to wait and see how the situation develops. The analysis thread at Beyond3D makes it very clear that this is an incredibly complex question, and much of what Nvidia and Maxwell may or may not be doing remains unclear.

Earlier, we mentioned that AMD’s approach to asynchronous computing superficially resembled Hyper-Threading. There’s another way in which that analogy may prove accurate: When Hyper-Threading debuted, many AMD fans asked why Team Red hadn’t copied the feature to boost performance on K7 and K8. AMD’s response at the time was that the K7 and K8 processors had much shorter pipelines and very different architectures, and were intrinsically less likely to benefit from Hyper-Threading as a result. The P4, in contrast, had a long pipeline and a relatively high stall rate. If one thread stalled, HT allowed another thread to continue executing, which boosted the chip’s overall performance.

GCN-style asynchronous computing is unlikely to boost Maxwell performance, in other words, because Maxwell isn’t really designed for these kinds of workloads. Whether Nvidia can work around that limitation (or implement something even faster) remains to be seen.

What does this mean for gamers and DX12?

There’s been a significant amount of confusion over what this difference in asynchronous compute means for gamers and DirectX 12 support. Despite what some sites have implied, DirectX 12 does not require any specific implementation of asynchronous compute. That said, it currently seems that AMD’s ACEs could give the company a leg up in future DX12 performance. Whether Nvidia can perform a different type of optimization and gain similar benefits for itself is still unknown. Regarding the usefulness of asynchronous compute (by AMD’s definition), Oxide’s Kollock notes:

“First, though we are the first D3D12 title, I wouldn’t hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn’t hold Ashes up as the premier example of this feature.”

Given that AMD hardware powers both the Xbox One and PS4 (and possibly the upcoming Nintendo NX), it’s absolutely reasonable to think that AMD’s version of asynchronous compute could be important to the future of the DX12 standard. Talk of returning already-purchased Nvidia cards in favor of AMD hardware, however, is rather extreme. Game developers optimize for both architectures, and we expect most will take the route Oxide did with Ashes — if they can’t get acceptable performance from asynchronous compute on Nvidia hardware, they simply won’t use it. Game developers are not going to throw Nvidia gamers under a bus and simply stop supporting Maxwell or Kepler GPUs.

Right now, the smart thing to do is wait and see how this plays out. I stand by Ashes of the Singularity as a solid early look at DX12 performance, but it’s one game, on early drivers, in a just-released OS. Its developers readily acknowledge that it should not be treated as the be-all, end-all of DX12 performance, and I agree with them. If you’re this concerned about how DX12 will evolve, wait another 6-12 months for more games, as well as AMD and Nvidia’s next-generation cards on 14/16nm before making a major purchase.

If AMD cards have an advantage in both hardware and upcoming title collaboration, as a recent post from AMD’s Robert Hallock stated, then we’ll find that out in the not-too-distant future. If Nvidia is able to introduce a type of asynchronous compute for its own hardware and largely match AMD’s advantage, we’ll see evidence of that, too. Either way, leaping to conclusions about which company will “win” the DX12 era is extremely premature. Those looking for additional details on the differences in asynchronous compute between AMD and Nvidia may find this post from Mahigan useful as well. If you’re fundamentally confused about what we’re talking about, this B3D post sums up the problem with a very useful analogy.

Intel will support FreeSync standard with future GPUs

Currently, there are two competing display standards that can provide smoother gameplay by synchronizing refresh rates to GPU frame production — Nvidia’s proprietary G-Sync standard, and the VESA-backed Adaptive-Sync (AMD calls this FreeSync, but it’s the same underlying technology). We’ve previously covered the two standards, and both can meaningfully improve gaming smoothness. Now, Intel has thrown its own hat into the ring and announced that it intends to support the VESA Adaptive-Sync standard over the long term.

This is a huge announcement for the long-term future of Adaptive-Sync. Nvidia’s G-Sync technology is specific to its own GeForce cards, though a G-Sync monitor still functions normally if hooked to an Intel or AMD GPU. The theoretical advantage of Adaptive-Sync / FreeSync is that it can be used with any GPU that supports the VESA standard — but since AMD has been the only company pledging to do so, the practical situation has been the same as if AMD and Nvidia had each backed their own proprietary tech.

AMD FreeSync

Intel’s support changes that. Dwindling shipments of low-end discrete GPUs in mobile and desktop have given the CPU titan an ever-larger share of the GPU market, which means that any standard Intel chooses to back has a much greater chance of becoming a de facto standard across the market. This doesn’t prevent Nvidia from continuing to market G-Sync as its own solution, but if Adaptive-Sync starts to ship standard on monitors, it won’t be a choice just between AMD and Nvidia — it’ll be AMD and Intel backing a standard that consumers can expect as a default on most displays, while Nvidia backs a proprietary solution that only functions with its own hardware.

Part of what likely makes this sting for Team Green is that its patent license agreement with Intel will expire in 2016. Back in 2011, Intel agreed to pay Nvidia $1.5 billion over the next five years. That’s worked out to roughly $66 million per quarter, and it’s high-margin cash — cash Nvidia would undoubtedly love to replace with patent agreements with other companies. There’s talk that the recent court cases against Samsung and Qualcomm over GPU technology have been driven by this, but Nvidia would likely love to sign a continuing agreement with Intel that allows the company to offer G-Sync technology on Intel GPUs. If Intel is going to support Adaptive-Sync, it’s less likely that it would take out a license for G-Sync as well.

The only fly in the ointment is the timing. According to Tech Report, no current Intel GPU hardware supports Adaptive-Sync, which means we’re looking at a post-Skylake timeframe for support. Intel might be able to squeeze the technology into Kaby Lake, with its expected 2016 debut, but if it can’t, we’ll be waiting for Cannonlake and a 2017 timeframe. Adaptive-Sync and G-Sync are most visually effective at lower frame rates, which means gaming on Intel IGPs could get noticeably smoother than we’ve seen in the past. That’s a mixed blessing for AMD, which has historically relied on superior GPU technology to compete with Intel, but it’s still an improvement over an AMD-Nvidia battle in which Nvidia holds the majority of the market share.

DirectX 12 arrives at last with Ashes of the Singularity, AMD and Nvidia go head-to-head

Ever since Microsoft announced DirectX 12, gamers have clamored for hard facts on how the new API would impact gaming. Unfortunately, hard data on this topic has been difficult to come by — until now. Oxide Games has released an early version of its upcoming RTS game Ashes of the Singularity, and allowed the press to do some independent tire-kicking.

Before we dive into the test results, let’s talk a bit about the game itself. Ashes is an RTS title powered by Oxide’s Nitrous game engine. The game’s look and feel somewhat resemble Total Annihilation, with large numbers of units on-screen simultaneously and heavy action between ground and flying units. The game has been in development for several years, and it’s the debut title for the new Nitrous engine.

Ashes of the Singularity

An RTS game is theoretically a great way to debut an API like DirectX 12. On-screen slowdowns when the action gets heavy have often plagued previous titles, and freeing up more CPU threads to attend to the rendering pipeline should be a boon for all involved.

Bear in mind, however, that this is a preview of DX12 performance — we’re examining a single title that’s still in pre-beta condition, though Oxide tells us that it’s been working very hard with both AMD and Nvidia to develop drivers that support the game effectively and ensure the rendering performance in this early test is representative of what DirectX 12 can deliver.

Nvidia really doesn’t think much of this game

Nvidia pulled no punches when it came to its opinion of Ashes of the Singularity. According to the official Nvidia Reviewer’s Guide, the benchmark is primarily useful for ascertaining if your own hardware will play the game. The company also states: “We do not believe it is a good indicator of overall DirectX 12 performance.” (emphasis original). Nvidia also told reviewers that MSAA performance was buggy in Ashes, and that MSAA should be disabled by reviewers when benchmarking the title.

Oxide has denied this characterization of the benchmark in no uncertain terms. Dan Baker, co-founder of Oxide Games, has published an in-depth blog post on Ashes of the Singularity, which states:

“There are incorrect statements regarding issues with MSAA. Specifically, that the application has a bug in it which precludes the validity of the test. We assure everyone that is absolutely not the case. Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.

“So what is going on then? Our analysis indicates that the any D3D12 problems are quite mundane. New API, new drivers. Some optimizations that that the drivers are doing in DX11 just aren’t working in DX12 yet. Oxide believes it has identified some of the issues with MSAA and is working to implement work arounds on our code. This in no way affects the validity of a DX12 to DX12 test, as the same exact work load gets sent to everyone’s GPUs. This type of optimizations is just the nature of brand new APIs with immature drivers.”

AMD and Nvidia have a long history of taking shots at each other over game optimization and benchmark choice, but most developers choose to stay out of these discussions. Oxide’s decision to buck that trend should be weighed accordingly. At ExtremeTech, we’ve had access to Ashes builds for nearly two months and have tested the game at multiple points. Testing we conducted over that period suggests Nvidia has done a great deal of work on Ashes of the Singularity over the past few weeks. DirectX 11 performance with the 355.60 driver, released on Friday, is significantly better than what we saw with 353.30.

Is Ashes a “real” benchmark?

Baker’s blog post doesn’t just refute Nvidia’s MSAA claims; it goes into detail on how the benchmark executes and how to interpret its results. The standard benchmark does execute an identical flyby pass and tests various missions and unit match-ups, but it doesn’t pre-compute the results. Every aspect of the game engine, including its AI, audio, physics, and firing solutions, is executed in real time, every single time. By default, the benchmark records frame-time data and produces a play-by-play report on performance in every subsection of the test. We only had a relatively short period of time to spend with the game, but Ashes records a great deal of information in both DX11 and DX12.

Ashes of the Singularity also includes a CPU benchmark that can be used to simulate an infinitely fast GPU — useful for measuring how GPU-bound any given segment of the game actually is.

In short, by any reasonable meaning of the phrase, Ashes is absolutely a real benchmark. We wouldn’t recommend taking these results as a guaranteed predictor of future DX12 performance between Red and Green — Windows 10 only just launched, the game is still in pre-beta, and AMD and Nvidia still have issues to iron out of their drivers. While Oxide strongly disputes that their MSAA is bugged for any meaningful definition of the word, they acknowledge that gamers may want to disable MSAA until both AMD and NV have had more time to work on their drivers. In deference to this view, our own benchmarks have been performed with MSAA both enabled and disabled.

Test setup

Because Ashes is a DirectX 12 title, it presents different performance considerations than we’ve previously seen, and unfortunately we only have time to address the most obvious comparisons between AMD and Nvidia today. As with Mantle before it, we expect the greatest performance improvements to show up on CPUs with fewer cores or weak single-threaded performance. AMD chips should benefit dramatically, as they did with Mantle, while Intel Core i3s and Core i5s should still see significant improvements.

With that said, our choice of a Core i7-5960X isn’t an accident. For these initial tests, we wanted to focus on differences in GPU performance. We tested the Nvidia GTX 980 Ti using the newly released 355.60 drivers, which dramatically boost Ashes of the Singularity performance in DX11 and are a must-download if you plan on playing the game or participating in its beta. AMD also distributed a new beta Catalyst build for this review, which was used here. Our testbed consisted of an Asus X99-Deluxe motherboard, a Core i7-5960X, 16GB of DDR4-2667, a Galax SSD, and the aforementioned GTX 980 Ti and R9 Fury X video cards.

We chose to test Ashes of the Singularity at both 1080p and 4K, with 4x MSAA enabled and disabled. The game was tested at its “High” default preset (note that the “High” preset initially sets 2x MSAA as default, but we changed this when testing with MSAA disabled).

Batches? We don’t need no stinkin’ batches!

As we step through the game’s performance, we should talk a bit about how Oxide breaks the figures down. In Ashes, performance is reported as a total average across all frames as well as by batch type. Batches, for our purposes, can be thought of as synonymous with draw calls: “normal” batches contain relatively few draw calls, medium batches more, and heavy batches a huge number. One of the major purposes of DirectX 12 is to increase how many draw calls the system can handle simultaneously without bogging down.

Test results: DirectX 11

We’ll begin with DirectX 11 performance between the AMD Radeon R9 Fury X and the GeForce GTX 980 Ti. The first graph is the overall frames-per-second average between AMD and Nvidia, the next two graphs show performance broken out by batch type.

Overall performance, DX11 (High preset)

Batch performance at 1080p

Batch performance at 4K

Nvidia makes hash of AMD in DX11. Looking at the graph breakdowns for Normal, Medium, and Heavy batches, we can see why – Nvidia’s performance lead in the medium and heavy batches is much greater than in normal batches. We can see this most clearly at 4K, where Nvidia leads AMD by just 7% in Normal batches, but by 84% in Heavy batches. Enabling 4x MSAA cuts the gap between AMD and Nvidia, as has often been the case.

Overall performance with MSAA enabled

Batch performance in 1080p

Batch performance in 4K

Note that while these figures are comparatively stronger for AMD on the whole, they still aren’t great. Without antialiasing enabled, Nvidia’s GTX 980 Ti is 1.42x faster than AMD in 4K and 1.78x faster in 1080p. With MSAA enabled, those gaps fall to 1.27x and 1.69x respectively. The batch breakouts show these trends as well, though it’s interesting that the Fury X closes to within 13% of the GTX 980 Ti at 4K in Medium batches.

The gap between AMD and Nvidia was smaller last week, but the 355.60 driver improved Team Green’s performance by 14% overall and by up to 25% in some of the batch-specific tests. Oxide told us it has worked with Nvidia engineers for months to ensure the game ran optimally on DirectX 11 and 12, and these strong results bear that out.

Test Results: DirectX 12

DirectX 12 paints an entirely different picture of relative performance between AMD and Nvidia. First, here’s the breakdown at 1080p and 4K, and then in the batch runs for each of those tests.

Overall performance, DX12 (High preset)

Batch performance at 1080p (DX12)

Batch performance at 4K (DX12)

The gap between AMD and Nvidia in DX11 doesn’t just shrink in DX12; it vanishes. AMD’s R9 Fury X, which is normally about 7% slower than the GTX 980 Ti, ties it in both 4K and 1080p. Meanwhile, the batch tests show AMD a hair slower in normal batches, but faster at medium and heavy batch counts. Let’s enable MSAA and see what happens.

Overall performance with 4x MSAA (DX12)

Batch performance at 1080p, 4x MSAA (DX12)

Batch performance at 4K, 4x MSAA (DX12)

For all the fuss about Oxide’s supposed MSAA bug, we expected to see Nvidia’s performance tank or some other evidence of a problem. Screenshots of DX12 vs. DX11 with 4x MSAA revealed no differences in implementation, as per Dan Baker’s blog post. All that happens, in this case, is that AMD goes from tying the GTX 980 Ti to leading it by a narrow margin. In DirectX 11, Nvidia’s 4x MSAA scores were 15.6% lower at 4K and 7.7% lower at 1080p. AMD’s results were 6% and 3% lower, but there are clearly some non-optimized code paths on AMD’s side of the fence when using that API.

In DirectX 12, Nvidia’s 4x MSAA scores were 14.5% lower at 4K and 12% lower at 1080p. AMD’s results were 12% and 8.2% lower, respectively. It’s not news that AMD’s GPUs often take less of a performance hit from MSAA than their Nvidia counterparts, so the fact that the DX12 path is marginally slower for Nvidia with 4x MSAA than the highly optimized DX11 path doesn’t explain why Nvidia came out so strongly against Ashes of the Singularity or its MSAA implementation.

DirectX 12 presents two very different challenges

At first glance, these results may not seem impressive. The magnitude of AMD’s improvement from DX11 to DX12 is undercut by Nvidia’s stellar DX11 performance. The Fury X beats or ties Nvidia in both of our benchmarks, which is definitely significant for AMD considering that the Fury X normally lags the GTX 980 Ti. But Microsoft didn’t sell DirectX 12 as offering incremental, evolutionary performance improvements. Is the API a wash?

We don’t think so, but demonstrating why that’s the case will require more testing with lower-end CPUs and perhaps some power consumption profiling comparing DX11 to DX12. We expect DirectX 12 to deliver higher performance than anything DirectX 11 can match in the long run. It’s not just an API – it’s the beginning of a fundamental change within the GPU and gaming industry.

Consider Nvidia. One of the fundamental differences between Nvidia and AMD is that Nvidia takes a far more hands-on approach to game development. Nvidia often dedicates engineering resources and personnel to improving performance in specific titles. In many cases, this includes embedding engineers on-site, where they work with the developer directly for weeks or months. Features like multi-GPU support, for instance, require specific support from the IHV (Independent Hardware Vendor). Because DirectX 11 is a high-level API that doesn’t map cleanly to any single GPU architecture, there’s a great deal Nvidia can do to optimize performance from within its own drivers. That’s even before we get to GameWorks, which licenses GeForce-optimized libraries for direct integration as middleware (GameWorks, as a program, will continue and expand under DirectX 12).

DirectX 12, in contrast, gives the developer far more control over how resources are used and allocated. It offers vastly superior tools for monitoring CPU and GPU workloads, and allows for fine-tuning in ways that were simply impossible under DX11. It also puts Nvidia at a relative disadvantage. For a decade or more, Nvidia has done enormous amounts of work to improve performance in-driver. DirectX 12 makes much of that work obsolete. That doesn’t mean Nvidia won’t work with developers to improve performance or that the company can’t optimize its drivers for DX12, but the very nature of DirectX 12 precludes certain kinds of optimization and requires different techniques.

AMD, meanwhile, faces a different set of challenges. The company’s GPUs look much better under D3D 12 precisely because it doesn’t require Team Red to perform enormous, game-specific optimizations. AMD shouldn’t assume, however, that rapid uptake of Windows 10 will translate into being able to walk away from DirectX 11 performance. DirectX 12 may be ramping up, but Ashes of the Singularity and possibly Fable Legends are the only near-term DX12 launches, and neither is in finished form just yet. DX11 and even DX9 are going to remain important for years to come, and AMD needs to balance its admittedly limited pool of resources between encouraging DX12 adoption and ensuring that gamers who don’t have Windows 10 don’t end up left in the cold.

As things stand right now, AMD showcases the kind of performance that DirectX 12 can deliver over DirectX 11, while Nvidia offers more consistent performance between the two APIs. Nvidia’s strong DX11 showing, however, is undercut by negative scaling in DirectX 12 and the apparent absence of any MSAA bug. Given this, it’s hard not to think that Nvidia’s strenuous objections to Ashes had more to do with its decision to focus on DX11 performance over DX12, or with its hardware’s lackluster performance when running in that API.

Deep dive: Hynix’s High Bandwidth Memory

We’ve discussed the capabilities and performance of HBM (High Bandwidth Memory) multiple times over the past six months, but a new report sheds light on the physical architecture and construction of the technology, which is widely viewed as the future of GPU memory. Nvidia will debut its Pascal architecture in 2016 with HBM2, while AMD launched the first HBM-equipped GPUs, the Radeon Fury X and Radeon Fury, earlier this summer.

The full report by Tech Insights is paywalled, but the company shared a number of slides and details with EETimes. The HBM assembly that AMD and Hynix jointly designed is genuinely new compared to other products on the market. Samsung has used TSVs (through silicon vias) for wiring DRAM together before, but no one has ever built a wide I/O design like this in a commercial product.

Interposer and DRAM

The image above shows the substrate, the interposer layer (manufactured by UMC on a 65nm process), and the stacked DRAM. The TSVs aren’t visible in this shot, but they can be seen in the image below. The report also details how Hynix manufactured the TSVs and the process it used to create them. One thing the authors note is that while they expected to see “scallops” in the images (ridges formed in the sidewall during the etching process), Hynix apparently did an excellent job avoiding the problem. Hynix, the authors conclude, “has got a great etch recipe.”

TSVs and DRAM die

The arrangement of the dies on the stack suggests that the first three DRAM dies were diced (cut from the wafer) as a group, while the top DRAM chip was cut separately, tested, and then attached to the stack. The entire four-die stack would then have been attached to the logic die. The advantage of this kind of configuration is that it offers Hynix ample opportunity to confirm that it’s building good die before attaching them in the final product.

TSVs

One piece of evidence in favor of this extensive test cycle is the sheer number of TSVs built into each DRAM. Tech Insights reports that there are nearly 2,100 TSV pads on each DRAM die (one cross-section sample is shown below). In addition to being used for data, I/O, power, and redundancy, a significant percentage are apparently used to test the TSVs themselves. This fine-grained error control allows Hynix to determine exactly which TSVs aren’t meeting expectations and substitute one of the redundant TSVs where needed.

Why the fine details matter

Ever since AMD announced it would launch HBM, there have been rumors that HBM was either exceedingly expensive, yielding badly, or both. The Tech Insights excerpt doesn’t directly address either of these claims, but it does offer up some indirect evidence. Hynix has built a testing system that allows them to test for bad die at every level. They can test the stack of three ICs, they can test the top-level DRAM before mounting it, and they can test the TSVs after mounting and have a method of switching to redundant TSVs in case a bad link is found rather than tossing out the entire die stack.

The value of being able to test the product at multiple stages can’t be overstated. Some of you may remember Rambus and its ill-fated attempt to conquer the DRAM market in the late 1990s and early 2000s. Rambus DIMMs were extremely expensive when they launched, and there were some conspiratorial whispers alleging that either Intel and Rambus were falsely inflating the price, or that the DRAM manufacturers were deliberately trying to cripple the product.

While the entire RDRAM situation was heavily political, one contact we spoke to at a memory company that was fully on-board with the RDRAM shift told us that no, there were real problems that crippled RDRAM yields. One of the most fundamental was that there was no way to test whether an individual RDRAM chip was good or not before mounting it in a series to make a RIMM module. If the module didn’t test perfectly, it had to be disassembled and swapped out, piece by piece, until a faulty IC was found. Since it was possible to have more than one faulty IC at a time, this step had to be performed using a “known good” set of chips until each RIMM was “known good.” Combined with the low yields that are typical for any ramping memory, this inability to test individual components contributed substantially to RDRAM’s high prices when it first launched.

By all accounts, Hynix hasn’t just rolled out a new solution by the skin of its teeth — it has built a scalable design that bodes well for the future of the memory standard. The interposer is built on a dirt-cheap 65nm process, and we already know HBM2 is ramping.

AMD Radeon R9 Fury review: Splitting Nvidia’s GTX 980 and 980 Ti in performance

At E3 last month, AMD announced that it would launch multiple GPUs under its new Fury brand. First up was the Fury X, a $649 card meant to compete with the GTX 980 Ti and sporting its own custom water cooler. Today, the company is launching its follow-up to the Fury X, the $549 Radeon R9 Fury. This new card uses the same base Fiji GPU as the Fury X, but with fewer cores (3584 as opposed to 4096). The modest reduction in total compute units is matched by a slight cut to texture mapping units (down to 224 from 256), but the total number of ROPs stays the same, at 64. The Radeon Fury’s clock speed has also been cut slightly, to 1GHz (down from the Radeon Fury X’s 1050MHz), but the GPU packs the same 500MHz, 4096-bit HBM interface, 275W maximum board power, and dual 8-pin PCIe connectors.
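
Those memory specs imply the same bandwidth as the Fury X; here’s a quick sketch of the arithmetic (our own calculation from the figures above):

```python
# Sketch of the memory bandwidth implied by the specs above (our arithmetic):
# a 4096-bit HBM interface at 500MHz, double-pumped, matches Fury X's 512GB/s.
BUS_WIDTH_BITS = 4096
CLOCK_HZ = 500e6
TRANSFERS_PER_CLOCK = 2            # DDR signaling

bandwidth_bytes_per_s = BUS_WIDTH_BITS / 8 * CLOCK_HZ * TRANSFERS_PER_CLOCK
print(bandwidth_bytes_per_s / 1e9, "GB/s")   # 512.0 GB/s
```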

One of the factors that sets the new Radeon R9 Fury apart from the Fury X is the size of the card. While neither the Sapphire Tri-X nor the Asus Strix R9 Fury is that much bigger than other high-end air-cooled GPUs, both are far larger than AMD’s diminutive Radeon Fury X. Granted, that GPU used a water cooler while the Strix (the card we have in-house) is air-cooled, but it’s not just the cooler that’s large — Asus mounted the Fury on a standard-length high-end PCB as well.

The resulting card is the Asus Strix R9 Fury DirectCU III OC, but don’t let the OC get your hopes up. AMD’s reference card is clocked at 1GHz standard, while the Strix clocks in at a maximum of 1020MHz out of the box. That 2% OC isn’t going to push the envelope, and like the Fury X, Fury isn’t expected to have much overclocking headroom. One thing to like about the R9 Fury Strix, particularly if you have older monitors, is that the GPU supports a wide range of ports. Unlike the Sapphire version of the card, which offers 3x DisplayPort and 1x HDMI, the Strix packs 3x DisplayPort, 1x DVI-D, and 1x HDMI.

According to Asus, the GPU cooler is designed to maintain a maximum temperature of 85C. That’s not nearly as low as AMD’s 50C target for Fury X, but for an air-cooled card, 85C is quite good. It’s particularly impressive given that AMD’s last high-end air-cooled cards, the R9 290 and R9 290X, often ran right up to their 95C thresholds. Asus is bringing the Strix R9 to market at $579, marginally higher than the $549 AMD is targeting for the R9 Fury in general. The heatsink and attached GPU are huge compared to previous cards, at 11.75 inches long and with significant cooler overhang.

We asked Asus why the Strix was so large, given that AMD was able to build both the Fury X and the upcoming Fury Nano into much smaller cards, but the company declined to comment in detail, saying only “[W]e went with what works best for the chipset and cooling options.” We’ll have to wait and see if other vendors introduce Fury cards in smaller form factors, or if that capability is reserved for the upcoming Fury Nano.

One thing we can say about the R9 Strix — the card may be long and the fans + heatsink are large, but this card delivers excellent performance for very little noise.

Fury’s positioning, tiny review window

To say that this review is coming in hot would be an understatement. We received our Asus test card on Wednesday at 5 PM for an 8 AM Friday launch. Given my other responsibilities for ET, the time I had to spend with this GPU was further compressed. It’s not clear why Asus sampled on such short notice; manufacturers typically give much longer lead times when testing new hardware. Add in some significant problems with testbed configuration (a series of unfortunate events so mind-boggling, I’m considering writing a post about them), and the end result was a badly compressed launch cycle.

Fortunately, Fury’s positioning is relatively straightforward. AMD is bringing the card in at $549, or roughly $50 more than the GeForce GTX 980. At that price point, the GPU needs to be about 10% faster than its Team Green counterpart. AMD’s Fury X reliably delivered that kind of performance delta, but it was priced to compete against the GTX 980 Ti, not the GTX 980. Fury is going after ostensibly easier prey.

Unfortunately, AMD’s rushed launch means that the 4GB HBM comparisons I’ve wanted to do and a wide-scale power consumption comparison are both on hold for now. But let’s see what we can see from a quick run around the block, shall we?

All of our tests were run on a Haswell-E system with an Asus X99-Deluxe motherboard, 16GB of DDR4-2667, and Windows 8.1 64-bit with all patches and updates installed. The latest AMD Catalyst 15.7 drivers and Nvidia GeForce 353.30 drivers were used. Our power consumption figures are somewhat higher in this review than in some previous stories — the 1200W PSU we used for testing was a standard 80 Plus unit, not the 1275W 80 Plus Platinum unit we’ve typically tested with.

BioShock Infinite:

BioShock Infinite was tested at the game’s Ultra settings with the Alternative Depth of Field option, using the built-in benchmark at both 1080p and 4K.

BioShock infinite Radeon Fury

Fury pulls ahead of the GTX 980 nicely, nearly tying things up with the Radeon R9 Fury X and GTX 980 Ti. Playable 4K is no problem for any of the high-end cards in this sample.

Company of Heroes 2:

Company of Heroes 2 is an RTS game that’s known for putting a hefty load on GPUs, particularly at the highest detail settings. Unlike most of the other games we tested, COH 2 doesn’t support multiple GPUs. We tested the game with all settings set to “High,” with V-Sync disabled.

Company of Heroes 2 - Radeon Fury

Company of Heroes 2 is a mixed bag for AMD. At 1080p, where frame rates are more playable, the Fury lags the GTX 980. At 4K, however, it’s the AMD cards that pull ahead. The margin between the Asus Strix R9 Fury and AMD’s own R9 Fury X is rather small, and the Fury X ekes out a win over the stock-clocked GTX 980 Ti.

Metro Last Light:

We tested Metro Last Light in Very High Quality with 16x anisotropic filtering and normal tessellation, in both 1080p and 4K. While it’s a few years old at this point, Metro Last Light is still a punishing game at maximum detail.

Metro Last Light - Radeon Fury

The Asus Strix sweeps the GTX 980 in both tests, nearly tying the GTX 980 Ti. The overclocked version of the 980 Ti from EVGA (covered here) still edged Fury X, but the Fury offers nearly the same level of performance.

Total War: Rome 2

Total War: Rome II is the sequel to the earlier Rome: Total War. It’s fairly demanding on modern cards, particularly at the highest detail levels. We tested at maximum detail in both 1080p and 4K.

Total War: Rome 2 Radeon Fury

In the game’s built-in benchmark, the Fury essentially ties the GTX 980 at 1080p but surpasses it in 4K, with the Fury X holding out a narrow edge above the Fury. Performance here is close across the board.

Shadow of Mordor:

Shadow of Mordor is a third-person open-world game that takes place between The Hobbit and The Lord of the Rings. Think of it as Far Cry: Ranger Edition (or possibly Grand Theft Ringwraith) and you’re on the right track. We tested the game at maximum detail with FXAA (the only AA option available).

Shadow of Mordor - Radeon Fury

In Shadow of Mordor, AMD’s Fury X doesn’t quite match the stock GTX 980 Ti at 1080p, but it ekes out a win of just under 10% at 4K. Similarly, the Asus Strix R9 Fury is roughly 10% faster than the GTX 980 at 1080p, and a full 26% faster at 4K.

Dragon Age: Inquisition

Dragon Age: Inquisition is one of the greatest role playing games of all time, with a gorgeous Frostbite 3-based engine. While it supports Mantle, we’ve actually stuck with Direct3D in this title, as the D3D implementation has proven to be superior in previous testing.

While DAI does include an in-game benchmark, we’ve used a manual test run instead. The in-game test often runs more quickly than the actual title, and is a relatively simple test compared with how the game handles combat. Our test session focuses on the final evacuation of the town of Haven, and the multiple encounters that the Inquisitor faces as the party struggles to reach the chantry doors. We tested the game at maximum detail with 4x MSAA.

Dragon Age: Inquisition - Radeon Fury

Again, we see the Asus Strix extending a lead over the GTX 980, leading the Nvidia card by roughly 9% in 1080p mode and as much as 23.5% at 4K. The GTX 980 Ti is the fastest card overall, but AMD’s solutions continue to show superior 4K scaling compared to Nvidia — the R9 Fury X matches the GTX 980 Ti in 4K even though it’s surpassed at 1080p.

Noise and power consumption:

AMD’s initial run of Fury X coolers was remarkably quiet under load, even if some of the first batch had a pitch profile we found less than pleasing. The Asus Strix R9 Fury isn’t quite as silent as the Fury X (that’s what you give up for using air as opposed to water), but, in a rare win for AMD, the Strix was logged as quieter than competing GeForce cards by both Tech Report and Anandtech. (I don’t have access to sound equipment capable of picking up decibel levels low enough for this kind of testing.)

That’s a noted turn-around for AMD, considering that Hawaii’s debut cards were infamous for their noise. Third-party designs vastly improved on the initial cards, but Fury doesn’t just compete against Nvidia on this front — it leads Team Green solidly. (TR and Anandtech differ slightly on this point; AT reports the Fury as being the quietest card, while TR logs a GTX 970 in that position). Either way, it’s a big leap forward for AMD. GPU temperatures are also excellent, with the Strix R9 typically topping out in the mid-70s Celsius.

AMD has caught some flak for building what supposedly amounted to “Fat Tonga” as opposed to an all-new GPU, but the Strix R9’s thermals prove that Sunnyvale didn’t just hook its existing GPU up to a helium tank. The Asus R9 390X uses the same cooler as the Strix R9 Fury, but TechReport shows the R9 390X running both hotter and louder for lower overall performance.

We performed our own power consumption tests at idle and under load, using Metro Last Light at 4K to stress all GPUs. Power consumption was measured on the third run-through, to ensure that the cards had heated up.

Metro Last Light - Power consumption

The Strix’s overall power consumption is about 10% better than the R9 Fury X’s, which is in line with what we’d expect given overall performance. There’s still a significant gap between the GTX 980 and the R9 Strix, though it’s unlikely to make much difference in your power bill unless you game 24/7 or live in a state where electricity is extremely expensive.

Watts per frame

Our watts-per-frame metric divides each card’s power consumption in Metro Last Light by its average frame rate. Here, we see that the Asus Strix R9 Fury maintains the same improved power-per-frame ratio as the Fury X, even if it can’t quite match Nvidia’s figures.
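
For reference, here’s how such a figure is derived (hypothetical numbers, purely for illustration, not our measured data):

```python
# How a watts-per-frame figure is computed (hypothetical numbers, purely for
# illustration): sustained power draw under load divided by the average frame
# rate from the same Metro Last Light run. Lower is better.
def watts_per_frame(load_watts: float, avg_fps: float) -> float:
    return load_watts / avg_fps

print(watts_per_frame(350.0, 50.0))   # 7.0 W per frame (illustrative values only)
```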

Conclusion:

How well the R9 Fury stacks up against the GTX 980 is going to depend on what your needs are. The Asus R9 Fury Strix doesn’t quite sweep the GTX 980, but it ties or beats it in virtually every benchmark. The Asus Strix is an excellent GPU with superior cooling, even if we were a bit surprised by the giant PCB and the extremely short launch window (we only received our card on Wednesday for an NDA lift on Friday).

If the R9 Fury and the GTX 980 were both $500 cards, we’d say the R9 Fury was absolutely the better solution, particularly if you’re gaming above 1080p. AMD has set the price at $549, however, with the Asus Strix coming in around $579. That’s not bad, per se, but keep in mind that this GPU shows its best legs at 4K. That’s problematic, because 4K + maximum detail is still too demanding for a single GPU in most current titles. That means gamers who want to play in 4K without sacrificing visual fidelity are going to be better served by multiple GPUs, and Team Green’s multi-GPU support has long been superior to Team Red’s.

There’s an argument to be made for either card, and buyers should be well served by both. If you’re already in Team Red’s camp, the R9 Fury is a great deal. It’s 6-10% slower than the Fury X but costs only about 85% as much, which makes it the more efficient of the two in terms of dollars per frame. We understand why AMD chose to delay its launch slightly — the Fury steals some of the Fury X’s thunder. Combined, the Fury and Fury X put AMD on much better competitive footing against Nvidia. They aren’t the blowout wins that AMD captured in 2013, when the R9 290 and R9 290X took out the GTX 780 and GTX Titan, but they’re good cards with far better thermals than AMD’s previous top-end single-GPU launches.

AMD isn’t done with the Fury architecture just yet — Fury Nano and an unnamed dual-GPU Fury are both scheduled for later this year.


Samsung expected to manufacture 14nm chips for Qualcomm, Apple, possibly Nvidia in 2015

Samsung foundry

When Samsung and TSMC laid out their next-generation manufacturing plans, the two chip companies decided to pursue very different goals. TSMC opted for a 20nm half-step node that would shrink die sizes but retain conventional planar silicon, while Samsung decided it would leap straight for 14nm manufacturing and introduce FinFETs directly after the 28nm node. Now, that decision to skip 20nm altogether is paying dividends for the Korean manufacturer — it’s hitting its 14nm stride while TSMC is still ramping 20nm, and expecting to sign multiple new customers (and a few old ones) because of it.

We’ve previously discussed how Apple was expected to move manufacturing back to Samsung at 14nm after using TSMC’s 20nm node for the iPhone 6 and iPhone 6 Plus, but new information suggests that companies like Qualcomm and Nvidia are ramping hardware there as well. This isn’t the first time we’ve heard rumors of Samsung fabbing for Nvidia, but it’s been several years since they last cropped up. Nonetheless, the timing makes sense: TSMC’s 20nm node ultimately offered fairly incremental gains over 28nm, and its 16nm FinFET node, which will offer a much larger improvement, won’t be available for volume production until 2016. Given the inevitable lead times between the beginning of volume production and commercial shipments, we can expect Samsung to have a 9-18 month lead over its rival (depending on the exact components and cost structure of the parts).

Samsung-FinFET

With access to Samsung’s 14nm technology, multiple manufacturers could deliver quick updates to 20nm hardware with significantly improved performance characteristics. It’s not clear whether Nvidia would tap Samsung for its Tegra line of processors (now increasingly relegated to automotive computing) or for GPU production. There have been rumors that Nvidia might skip 20nm altogether. That would surprise us, since the GPU industry tends to be a rapid adopter of virtually every node, but it’s possible that Nvidia built Maxwell on 28nm as a stopgap while it prepares a 14nm sequel for later in the year. AMD is known to be building 20nm hardware, but which fab it’s using (TSMC or GlobalFoundries) and when those parts will launch remain matters of speculation.

Speaking of AMD, there’s a good chance that this move will drive business to its erstwhile fab partner, GlobalFoundries. GF signed a deal in 2014 to deploy Samsung’s fabrication technology and to serve as the Korean manufacturer’s second-source capacity, so any deal Samsung makes with Apple, Nvidia, or Qualcomm could kick business over to GF as well. Samsung’s 14nm technology is also thought to be the reason the company dropped Qualcomm from the Galaxy S6: using its own 14nm chips in a flagship device gives it a better shot at capturing the profits from that device’s sale.

Samsung’s semiconductor business earned $2.5B in operating profits in 2014, with further gains expected throughout this year. Assuming it ships 14nm in volume, this will be the first time in more than a decade that a chip fab other than TSMC has blazed the trail on a new process node.


AMD’s next-gen CPU leak: 14nm, simultaneous multithreading, and DDR4 support

Ever since it became clear that AMD’s Carrizo would be a mobile update focused on energy efficiency rather than raw performance, enthusiasts and investors have been hungry for details about the company’s upcoming CPUs in 2016. AMD has been tight-lipped about these projects, though we heard rumors of a combined x86-ARM initiative that was up and running as of early last year. Now, a handful of early rumors have begun to leak about the eventual capabilities of these new cores.

As with all rumors, take these with a substantial grain of salt, but here’s what Sweoverclockers.com is reporting to date. We’ll rate each rumor as it’s given on the site. According to the post, these are the key claims about the new AMD Zen:

Built on 14nm: For a chip launching in 2016, this seems highly likely. Jumping straight to 14nm won’t close the gap between AMD and Intel, but the company is currently building its FX chips on legacy 32nm SOI, while Kaveri and Carrizo are both 28nm bulk silicon. The double-node jump from 28nm to 14nm should give AMD the kind of benefit a full node transition used to grant. Given the advantages of FinFET technology, we’d be surprised if the company went with anything else. The chips are also expected to be built at GlobalFoundries, which makes sense given AMD’s historic relationship with that company.

Utilize DDR4: Another highly likely rumor. By 2016, DDR4 should be starting to supplant DDR3 as the mainstream memory of choice for desktop systems. AMD might offer a hybrid DDR3/DDR4 solution, as it did during the DDR2/DDR3 transition, or it might stick solely with the new interface.

Up to 95W: Moderately likely, and moderately interesting. This suggests, if nothing else, that AMD wants to continue to compete in the enthusiast segment and possibly retake ground in the server and enterprise space. Nothing has been said about the graphics architecture baked onto the die, but opting for an up-to-95W TDP indicates that the company is giving itself headroom to fight it out with Intel once again.

Opt for simultaneous multithreading rather than cluster multithreading: With Bulldozer, AMD opted for an arrangement called cluster multithreading (CMT), in which a unified front end issues instructions to two separate integer pipelines. The idea behind the design was that AMD would gain the benefits of two full integer pipelines while saving die space and power compared to building a conventional multi-core design.

Hyper-Threading

Intel, in contrast, has long used simultaneous multithreading (SMT), which it calls Hyper-Threading, in which instructions from two different threads can be scheduled and executed on the same core within a single clock cycle. In theory, AMD’s design could have given it an advantage, since each CMT core has a full set of integer execution units of its own, whereas SMT threads share those resources, but in practice Bulldozer’s low efficiency crippled its scaling.

The rumor now is that AMD will include an SMT-style design with Zen. It’s entirely possible that the company will do this: Hyper-Threading is one example of SMT, but it’s not the only implementation; IBM, for example, uses SMT extensively in its POWER architectures. The reason I’m not willing to completely sign off on this rumor is that it has dogged AMD ever since Intel introduced Hyper-Threading more than a decade ago.

The benefits of using SMT are always dependent on the underlying CPU architecture, but Intel has demonstrated that the technology is often good for a 15-20% performance increase in exchange for a minimal die penalty. If AMD can achieve similar results, the net effect will be quite positive.
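To put that figure in concrete terms, here’s a back-of-the-envelope Python sketch. The eight-core configuration is purely an assumption for illustration, not a leaked Zen specification, and real gains depend heavily on workload.

```python
# Back-of-the-envelope estimate of aggregate throughput from SMT, applying
# a 15-20% per-core uplift (typical of Intel's Hyper-Threading results) to
# a hypothetical 8-core part. Illustrative only; not a leaked specification.
physical_cores = 8

for smt_uplift in (0.15, 0.20):
    effective_cores = physical_cores * (1 + smt_uplift)
    print(f"8 cores with a {smt_uplift:.0%} SMT uplift deliver roughly "
          f"{effective_cores:.1f} cores' worth of throughput")
```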

The final rumor floating around is that the chip won’t actually make an appearance until the latter half of 2016. That, too, is entirely possible. GlobalFoundries’ decision to shift from its own 14nm-XM process to Samsung’s 14nm designs could have impacted both ramp and available capacity, and AMD has pointedly stated that it will transition to new architectures only when it makes financial sense to do so. The company may have opted for a more leisurely transition to 14nm in 2016, with the new architecture debuting only when GF has worked the kinks out of its roadmap.

HBM-Memory

No information on performance or other chip capabilities is currently available, and the company has said nothing about the integrated GPU or possible use of technologies like HBM. The back half of 2016 would fit AMD’s timeline for possible APU integration of HBM, which means these new chips could be quite formidable if they fire on all cylinders out of the gate. During its conference call last week, AMD mostly dodged questions about rumored delays to its ARM products, noting that it had continued sampling them in-house and was pleased with the response. Presumably the company’s partners remain under NDA; there are no published independent evaluations of these products to date.
