Tag Archives: processors

ARM announces new Cortex-A35: ultra-low power, 64-bit

Over the past 18 months, we’ve seen consumer tablets and smartphones mostly jump from 32-bit ARM cores (the Cortex-A9 and A15) to their 64-bit counterparts (the Cortex-A53 and A57). At the lower end of the market, however, 32-bit continued to rule the roost. While ARM initially billed the Cortex-A53 as a replacement for the Cortex-A7, tests showed that the A53’s power curve was high enough that it wasn’t a drop-in replacement for the 32-bit chip. Now, ARM has announced a new 64-bit processor that does replace the Cortex-A7, while simultaneously improving performance and flexibility.

The new Cortex-A35 is designed as a successor to the Cortex-A5 and A7, but presumably tilts towards the A7’s power efficiency and performance ratio rather than the minimalist Cortex-A5. A5 customers likely don’t have much need for 64-bit in the near future in any case — if you work in environments where the A5’s performance is good enough, it’s probably good enough for the foreseeable future. The exact positioning and replacement cycle is shown below:

ARM’s positioning and replacement cycle for the Cortex-A35

Anandtech reports that the A35 is targeting environments where power is below 125mW (ARM claims the A35 can operate at 1GHz while drawing just 90mW — and that’s on a 28nm process). Since ARM intends for the chip to deploy at the 14/16nm node, real power and voltage levels should be even better. This allows for a greater frequency range or even lower power consumption for Internet of Things devices that don’t require much in the way of performance.

Like the A5 and A7, the Cortex-A35 is an in-order core with an eight-stage pipeline and limited dual-issue capabilities. What’s changed compared to those cores is that ARM has improved memory accesses, branch prediction, and instruction fetch to boost both power efficiency and overall performance. ARM borrowed heavily from the A53’s memory architecture to improve the A35’s final design, and boost the cache subsystem’s overall capabilities as well.

Flexible implementation

ARM has always offered flexible core implementations, but the Cortex-A35 takes that approach to new levels. The Cortex-A35 can be configured with 8KB-64KB L1 caches and an L2 cache between 128KB and 1MB. Customers who wish to do so can implement a Cortex-A35 core with an 8KB L1 and no FPU, NEON, L2 cache, hardware cryptography, or multi-core capability. Our recent story on an impending lawsuit against AMD over the core counts on Bulldozer processors argued that attempting to legally define a CPU “core” is effectively impossible, and ARM’s implementation offerings on the Cortex-A35 are part of the reason why. Two different vendors can build two different Cortex-A35s with entirely different hardware capabilities, including functions we think of as essential to a modern processor, like the FPU.

In typical configurations, ARM is telling users to expect a 6% improvement in integer performance, 16% in browsing performance, 36% in floating point, and 40% in a Geekbench-style MPI test. That’s compared to the Cortex-A7 and assumes equivalent process technology and clock speed tests. Overall, the goal is clearly to provide a chip that better suits an evolving IoT ecosystem (assuming IoT developers ever manage to create a smart product worth owning that isn’t riddled with security flaws).

By offering low-power devices that include advanced security capabilities like TrustZone, ARM is giving developers some hardware options to help with that goal. Whether or not designers will use them is another question. Devices based on the Cortex-A35 are expected to be in-market by the end of the year, and ARM has suggested that some companies might choose to use a pair of Cortex-A53 / A35 cores in big.LITTLE configurations to take advantage of the efficiency of its lower-power cores. ARM has previously stated that it expects companies to continue to pair Cortex-A72 cores with Cortex-A53 cores, but it’s possible that we’ll see a few OEMs offer A72 / A35 pairings to maximize both power savings and performance.

Apple responds to battery life concerns with its A9 SoCs

Yesterday, we covered reports from concerned iPhone 6s and 6s Plus owners, who have seen markedly different results between those devices built on Samsung’s 14nm node and those using TSMC’s 16nm. Apple has since released a statement covering these concerns in greater detail than we initially alluded to yesterday, and it’s worth considering how the company’s statements fit into the overall picture. Apple’s statement is reprinted below:

With the Apple-designed A9 chip in your iPhone 6s or iPhone 6s Plus, you are getting the most advanced smartphone chip in the world. Every chip we ship meets Apple’s highest standards for providing incredible performance and deliver great battery life, regardless of iPhone 6s capacity, color, or model.

Certain manufactured lab tests which run the processors with a continuous heavy workload until the battery depletes are not representative of real-world usage, since they spend an unrealistic amount of time at the highest CPU performance state. It’s a misleading way to measure real-world battery life. Our testing and customer data show the actual battery life of the iPhone 6s and iPhone 6s Plus, even taking into account variable component differences, vary within just 2-3% of each other.

Of benchmarks and battery life

Apple has a point when it says that benchmarks don’t often track the real-world experience of actually using a device. The primary purpose of most benchmarks is to gather performance data, and the advent of modern benchmarking has its roots firmly in the pre-smartphone era, when battery life wasn’t relevant to desktops and workstations. Even now, many battery life tests amount to “Repeat this workload until the phone dies.”

Whether you use a light or heavy workload on a phone can have a profound impact on its battery life — and, by extension, on how the phone tests in comparison to other devices. Anandtech made this point in their own investigation:

Anandtech’s light vs. heavy usage battery life comparison

Compare the iPhone 5s against the iPhone 6. The iPhone 6’s battery is 16% larger than the iPhone 5s’s, but the iPhone 6’s light usage run-time is almost 30% longer than the iPhone 5s. Clearly, the later silicon is more power efficient. Under heavy load, however, the iPhone 6’s larger battery only manages to equal the iPhone 5s’s total run-time — not exceed it. Meanwhile, the iPhone 6 Plus’s heavy run time is worse than the Galaxy Note 5’s, but more than 90 minutes better in light usage.
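
One way to see how much the workload shapes the conclusion is to normalize run-time by battery capacity. The sketch below does that arithmetic using the phones' published battery ratings (the capacity figures are not from the chart above) and the rough run-time ratios just quoted; it's an illustration of the math, not a reproduction of Anandtech's data.

```python
# Rough normalization of run-time by battery capacity, to separate
# silicon efficiency from battery size. Capacities are the published
# ratings for each phone; the run-time ratios are the approximate
# figures quoted above (light use ~30% longer, heavy use roughly equal).

CAPACITY_5S_MAH = 1560   # published iPhone 5s battery rating
CAPACITY_6_MAH = 1810    # published iPhone 6 battery rating

capacity_ratio = CAPACITY_6_MAH / CAPACITY_5S_MAH   # ~1.16, i.e. ~16% larger

light_runtime_ratio = 1.30   # iPhone 6 lasts ~30% longer under light use
heavy_runtime_ratio = 1.00   # iPhone 6 merely matches the 5s under heavy load

# Run-time per unit of battery, relative to the iPhone 5s
light_efficiency = light_runtime_ratio / capacity_ratio   # ~1.12
heavy_efficiency = heavy_runtime_ratio / capacity_ratio   # ~0.86

print(f"Battery is {capacity_ratio - 1:.0%} larger")
print(f"Light-use run-time per mAh vs. the 5s: {light_efficiency:.2f}x")
print(f"Heavy-use run-time per mAh vs. the 5s: {heavy_efficiency:.2f}x")
```

Normalized this way, the newer silicon looks clearly better under light use and no better (or slightly worse) per mAh under sustained load, which is exactly why a single rundown test can tell whichever story the workload favors.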

This is why it’s impossible to dismiss Apple’s response as “You’re holding it wrong,” despite the tone-deaf way the company communicated its statement. If a battery test doesn’t accurately capture the way people use the phone, it’s a bad benchmark. It may accurately measure power consumption between two devices in a stated workload, but the entire point of such workloads is to actually capture real-world conditions.

Thus far, the battery tests that have been floated have involved looping a JavaScript test and Geekbench’s fixed-load test, which apparently stresses the iPhone 6 Plus at a fairly constant 30%. Neither of these are particularly representative of real-world conditions. In fact, in the one test we’ve seen where real-world loading was performed (a video playback test for 60 minutes), both of the iPhones lost the same amount of battery life. This implies that in at least some conditions, power consumption between the two devices is basically identical.

Heat and variability

There are two potential factors that could be causing Samsung devices to exhibit poor performance under load as compared to TSMC equivalents. The first, which we alluded to in our initial article, is heat. Transistors that are packed together more tightly naturally concentrate more heat into smaller areas. There’s a clear and known relationship between heat and power consumption, and while the exact relationship varies from chip to chip and node to node, it’s well-known that temperature has a significant impact.

Image by Anandtech forum user idontcare
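
As a rough illustration of that relationship, leakage power is commonly modeled as growing more or less exponentially with die temperature. The toy model below assumes leakage doubles every 15°C or so; both the baseline wattage and the doubling interval are invented for illustration, not measured A9 figures.

```python
# Toy model: leakage power rising roughly exponentially with die temperature.
# The baseline leakage and the "doubles every 15 °C" interval are assumptions
# for illustration only; real values vary by chip and process node.

def leakage_watts(base_watts, temp_c, ref_temp_c=25.0, doubling_interval_c=15.0):
    """Estimated leakage at temp_c, given base_watts of leakage at ref_temp_c."""
    return base_watts * 2 ** ((temp_c - ref_temp_c) / doubling_interval_c)

BASE_LEAKAGE_W = 0.3  # hypothetical leakage at 25 °C

for temp in (25, 40, 55, 70):
    print(f"{temp:>3} °C -> {leakage_watts(BASE_LEAKAGE_W, temp):.2f} W of leakage")
```

The feedback matters: a hotter chip leaks more, and the extra leakage makes it hotter still.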

The second factor that comes into play here is variability. It’s important to understand that while we talk about Apple building an A9 processor in the same way that we might discuss Ford building an engine, there are some critical differences between the two. When TSMC, Intel, or Samsung builds a wafer of chips, they don’t automatically “know” what kind of chips they have. Each company will test their silicon to determine how good (or bad) the wafer is. Good chips are those that can run at the target voltage and clock speeds with desired power consumption levels. Great chips are those that can run at dramatically lower power consumption, or hit higher clock speeds, while bad chips are those that consume too much power or simply can’t reach target frequencies.

Each company has different methods of recovering useful dies from poor samples, whether that means disabling some of the cache, one of the cores, or using the chip in a desktop system where battery power isn’t such a concern. The important thing to understand is that variability has been getting steadily worse with every product generation. To understand why, consider a hypothetical scenario in which a “good” transistor contains between 100-200 atoms of a material, a “great” transistor contains between 140-160 atoms, and a bad transistor (that won’t meet desired specifications) has either less than 100 or more than 200. In this example, these numbers correspond to an older process node — say, 45nm.

Now, imagine this same situation, but with very different numbers. In our second example, a good transistor contains between 20 and 40 atoms of a doping material, a great transistor has between 28 and 32 atoms, and a bad transistor is any transistor with fewer than 20 or more than 40. It’s much, much harder to control the distribution of 20 atoms than it is to control the distribution of 100 atoms. Remember that since 14nm chips have far more transistors than 45nm chips, it’s not just a question of tighter control — you have to be more perfect to keep fail rates under control. This is why modern chips are sometimes designed with built-in logic redundancy — if one component of a chip doesn’t pass muster, you’ve got duplicate units ready to go.
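
A quick Monte Carlo run makes the point concrete. The sketch below reuses the made-up atom windows from the example above and assumes the count per transistor follows Poisson-like counting statistics (spread proportional to the square root of the average), approximated with a normal distribution; none of these numbers describe a real process.

```python
import math
import random

# Monte Carlo sketch of the hypothetical above. Random dopant placement
# roughly follows counting (Poisson) statistics, so the spread in atoms per
# transistor scales with the square root of the average count; a normal
# distribution approximates that here. The windows are the made-up numbers
# from the example above, not real process figures.

def fail_rate(mean_atoms, low, high, samples=500_000, seed=42):
    rng = random.Random(seed)
    sigma = math.sqrt(mean_atoms)  # Poisson-like spread
    bad = sum(1 for _ in range(samples)
              if not (low <= rng.gauss(mean_atoms, sigma) <= high))
    return bad / samples

old_node = fail_rate(mean_atoms=150, low=100, high=200)  # the "45nm" example
new_node = fail_rate(mean_atoms=30, low=20, high=40)     # the "14nm" example

print(f"Out-of-spec rate, 100-200 atom window: {old_node:.4%}")
print(f"Out-of-spec rate, 20-40 atom window:   {new_node:.2%}")

# With billions of transistors per die, even small per-transistor rates
# add up, which is where redundancy and binning come in.
print(f"Expected out-of-spec transistors on a 2-billion-transistor die: "
      f"{new_node * 2_000_000_000:,.0f}")
```

Even though the window is the same relative size in both cases (roughly ±33% of the mean), the smaller device falls out of spec orders of magnitude more often, because the absolute fluctuations shrink more slowly than the feature does.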

Here’s what this means, in aggregate: While we are certain that Apple still strictly targets certain ranges for its parts, we’d expect to see greater variation in run-time and battery life between TSMC and Samsung hardware, because even a company as legendarily strict as Apple has to accept the laws of physics.

What does this mean for TSMC vs. Samsung?

Thus far, Apple’s official position is that there is no difference between TSMC and Samsung devices. We suspect that if the company breaks from this stance, it will be because of heat differences between the two devices, rather than performance metrics. There are subtle ways to adjust performance to cut down on skin temperature, and it may be possible to create power rules for the Samsung devices that are different than those used for TSMC.

The one thing we’ll stick to is that this variation is almost certainly why Apple was forced to dual-source its hardware in the first place. What will be interesting is seeing whether or not this issue continues with later iterations of the phone. Samsung and TSMC are both consistently improving yield on 16/14nm, which means we’ll see those improvements reflected in devices — even if Apple never announces that its later products have better power consumption or lower temperatures than earlier ones.

Microfluidic cooling yields huge performance benefits in FPGA processors

As microprocessors have grown in size and complexity, it’s become increasingly difficult to increase performance without skyrocketing power consumption and heat. Intel’s CPU clock speeds have remained mostly flat for years, while AMD’s FX-9590 and its R9 Nano GPU both illustrate dramatic power consumption differences as clock speeds change. One of the principal barriers to increasing CPU clocks is that it’s extremely difficult to move heat out of the chip. New research into microfluidic cooling could help solve this problem, at least in some cases.

Microfluidic cooling has existed for years; we covered IBM’s Aquasar cooling system back in 2012, which uses microfluidic channels — tiny microchannels etched into a metal block — to cool the SuperMUC supercomputer. Now, a new research paper on the topic has described a method of cooling modern FPGAs by etching cooling channels directly into the silicon itself. Previous systems, like Aquasar, still relied on a metal transfer plate between the coolant flow and the CPU itself.

Here’s why that’s so significant. Modern microprocessors generate tremendous amounts of heat, but they don’t generate it evenly across the entire die. If you’re performing floating-point calculations using AVX2, it’ll be the FPU that heats up. If you’re performing integer calculations, or thrashing the cache subsystems, it generates more heat in the ALUs and L2/L3 caches, respectively. This creates localized hot spots on the die, and CPUs aren’t very good at spreading that heat out across the entire surface area of the chip. This is why Intel specifies lower turbo clocks if you’re performing AVX2-heavy calculations.

By etching channels directly on top of a 28nm Altera FPGA, the research team was able to bring cooling much closer to the CPU cores and eliminate the intervening gap that makes water-cooling less effective than it would otherwise be. According to the Georgia Institute of Technology, the research team focused on 28nm Altera FPGAs. After removing their existing heatsink and thermal paste, the group etched 100-micron silicon cylinders into the die, creating cooling passages. The entire system was then sealed using silicon and connected to water tubes.

“We believe we have eliminated one of the major barriers to building high-performance systems that are more compact and energy efficient,” said Muhannad Bakir, an associate professor and ON Semiconductor Junior Professor in the Georgia Tech School of Electrical and Computer Engineering. “We have eliminated the heat sink atop the silicon die by moving liquid cooling just a few hundred microns away from the transistors. We believe that reliably integrating microfluidic cooling directly on the silicon will be a disruptive technology for a new generation of electronics.”

Could such a system work for PCs?

The team claims that using these microfluidic channels with water at 20°C cut the on-die temperature of their FPGA to just 24°C, compared with 60°C for an air-cooled design. That’s a significant achievement, particularly given the flow rate (147 milliliters per minute). Clearly this approach can yield huge dividends — but whether or not it could ever scale to consumer hardware is a very different question.
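
A back-of-the-envelope estimate shows why even that modest flow of water is a serious heat mover. The coolant temperature rise below is an assumption (the paper's exact figure isn't quoted here); the rest is just the heat capacity of water.

```python
# Back-of-the-envelope heat removal for the reported 147 mL/min flow rate.
# The inlet-to-outlet temperature rise is assumed for illustration;
# the other numbers are standard properties of water.

FLOW_ML_PER_MIN = 147.0
WATER_DENSITY_G_PER_ML = 1.0
WATER_SPECIFIC_HEAT_J_PER_G_K = 4.186
ASSUMED_COOLANT_RISE_K = 5.0   # hypothetical warming of the water across the die

mass_flow_g_per_s = FLOW_ML_PER_MIN * WATER_DENSITY_G_PER_ML / 60.0
heat_removed_w = mass_flow_g_per_s * WATER_SPECIFIC_HEAT_J_PER_G_K * ASSUMED_COOLANT_RISE_K

print(f"Mass flow: {mass_flow_g_per_s:.2f} g/s")
print(f"Heat carried away with a {ASSUMED_COOLANT_RISE_K:.0f} K coolant rise: {heat_removed_w:.0f} W")
```

Tens of watts removed by a couple of grams of water per second helps explain how the die can sit only a few degrees above the coolant temperature.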

As the feature image shows, the connect points for the hardware look decidedly fragile and easily dislodged or broken. The amount of effort required to etch a design like this into an Intel or AMD CPU would be non-trivial, and the companies would have to completely change their approach to CPU heat shields and cooling technology. Still, technologies like this could find application in HPC clusters or any market where computing power is at an absolute premium. Removing that much additional heat from a CPU die would allow for substantially higher clocks, even with modern power consumption scaling.

Smaller, Faster, Cheaper, Over: The Future of Computer Chips

At the inaugural International Solid-State Circuits Conference held on the campus of the University of Pennsylvania in Philadelphia in 1960, a young computer engineer named Douglas Engelbart introduced the electronics industry to the remarkably simple but groundbreaking concept of “scaling.”

Dr. Engelbart, who would later help develop the computer mouse and other personal computing technologies, theorized that as electronic circuits were made smaller, their components would get faster, require less power and become cheaper to produce — all at an accelerating pace.

Sitting in the audience that day was Gordon Moore, who went on to help found the Intel Corporation, the world’s largest chip maker. In 1965, Dr. Moore quantified the scaling principle and laid out what would have the impact of a computer-age Magna Carta. He predicted that the number of transistors that could be etched on a chip would double annually for at least a decade, leading to astronomical increases in computer power.

A wafer of Nehalem processors, which Intel introduced in 2008. Credit: Intel

His prediction appeared in Electronics magazine in April 1965 and was later called Moore’s Law. It was never a law of physics, but rather an observation about the economics of a young industry that ended up holding true for a half-century.

One transistor, about as wide as a cotton fiber, cost roughly $8 in today’s dollars in the early 1960s; Intel was founded in 1968. Today, billions of transistors can be squeezed onto a chip the size of a fingernail, and transistor costs have fallen to a tiny fraction of a cent.

That improvement — the simple premise that computer chips would do more and more and cost less and less — helped Silicon Valley bring startling advances to the world, from the personal computer to the smartphone to the vast network of interconnected computers that power the Internet.

In recent years, however, the acceleration predicted by Moore’s Law has slipped. Chip speeds stopped increasing almost a decade ago, the time between new generations is stretching out, and the cost of individual transistors has plateaued.

Technologists now believe that new generations of chips will come more slowly, perhaps every two and a half to three years. And by the middle of the next decade, they fear, there could be a reckoning, when the laws of physics dictate that transistors, by then composed of just a handful of molecules, will not function reliably. Then Moore’s Law will come to an end, unless a new technological breakthrough occurs.

To put the condition of Moore’s Law in anthropomorphic terms, “It’s graying, it’s aging,” said Henry Samueli, chief technology officer for Broadcom, a maker of communications chips. “It’s not dead, but you’re going to have to sign Moore’s Law up for AARP.”

In 1975, Dr. Moore revised the doubling rate to two-year intervals. Still, he remains impressed by the longevity of his forecast: “The original prediction was to look at 10 years, which I thought was a stretch,” he said recently at a San Francisco event held to commemorate the 50th anniversary of Moore’s Law.

But the ominous question is what will happen if that magic combination of improving speeds, collapsing electricity demand and lower prices cannot be sustained.

The impact will be felt far beyond the computer industry, said Robert P. Colwell, a former Intel electrical engineer who helped lead the design of the Pentium microprocessor when he worked as a computer architect at the chip maker from 1990 to 2000.

“Look at automobiles, for example,” Dr. Colwell said. “What has driven their innovations over the past 30 years? Moore’s Law.” Most automotive industry innovations in engine controllers, antilock brakes, navigation, entertainment and security systems have come from increasingly low-cost semiconductors, he said.

These fears run contrary to the central narrative of an eternally youthful Silicon Valley. For more than three decades the industry has argued that computing will get faster, achieve higher capacity and become cheaper at an accelerating rate. It has been described both as “Internet time” and even as the Singularity, a point at which computing power surpasses human intelligence, an assertion that is held with near religious conviction among many in Silicon Valley.

When you’re thinking that big, bumping into the limits of physics could be a most humbling experience.

“I think the most fundamental issue is that we are way past the point in the evolution of computers where people auto-buy the next latest and greatest computer chip, with full confidence that it would be better than what they’ve got,” Dr. Colwell said.

The Limits of Physics

Chips are made from metal wires and semiconductor-based transistors — tiny electronic switches that control the flow of electricity. The most advanced transistors and wires are smaller than the wavelength of light, and the most advanced electronic switches are smaller than a biological virus.

Chips are produced in a manufacturing process called photolithography. Since it was invented in the late 1950s, photolithography has constantly evolved. Today, ultraviolet laser light is projected through glass plates that are coated with a portion of a circuit pattern expressed in a metal mask that looks like a street map.

Each map makes it possible to illuminate a pattern on the surface of the chip in order to deposit or etch away metal and semiconducting materials, leaving an ultrathin sandwich of wires, transistors and other components.

The masks are used to expose hundreds of exact copies of each chip, which are in turn laid out on polished wafers of silicon about a foot in diameter.

Machines called steppers, which currently cost about $50 million each, move the mask across the wafer, repeatedly exposing each circuit pattern to the surface of the wafer, alternately depositing and etching away metal and semiconducting components.

A finished computer chip may require as many as 50 exposure steps, and the mask must be aligned with astonishing accuracy. Each step raises the possibility of infinitesimally small errors.

“I’ve worked on many parts of the semiconductor process,” said Alan R. Stivers, a physicist whose career at Intel began in 1979 and who helped introduce a dozen new semiconductor generations before retiring in 2007. “By far, lithography is the hardest.”

To build devices that are smaller than the wavelength of light, chip makers have added a range of tricks like “immersion” lithography, which uses water to bend light waves sharply and enhance resolution. They also have used a technique called “multiple pattern” lithography, which employs separate mask steps to sharpen the edges and further thin the metal wires and other chip components.

As the size of components and wires has shrunk to just a handful of molecules, engineers have turned to computer simulations that require tremendous computational power. “You are playing tricks on the physics,” said Walden C. Rhines, chief executive of Mentor Graphics, a Wilsonville, Ore., design automation software firm.

If that scaling first described by Dr. Engelbart ends, how can big chip companies avoid the Moore’s Law endgame? For one, they could turn to software or new chip designs that extract more computing power from the same number of transistors.

And there is hope that the same creativity that has extended Moore’s Law for so long could keep chip technology advancing.

If silicon is, in the words of David M. Brooks, a Harvard University computer scientist, “the canvas we paint on,” engineers can do more than just shrink the canvas.

Silicon could also give way to exotic materials for making faster and smaller transistors and new kinds of memory storage as well as optical rather than electronic communications links, said Alex Lidow, a physicist who is chief executive of Efficient Power Conversion Corporation, a maker of special-purpose chips in El Segundo, Calif.

There are a number of breakthrough candidates, like quantum computing, which — if it became practical — could vastly speed processing time, and spintronics, which in the far future could move computing to atomic-scale components.

Recently, there has been optimism in a new manufacturing technique, known as extreme ultraviolet, or EUV, lithography. If it works, EUV, which provides light waves roughly a tenth the length of the shortest of the light waves that make up the visible spectrum, will permit even smaller wires and features, while at the same time simplifying the chip-making process.

But the technology still has not been proved in commercial production.

Earlier this year ASML, a Dutch stepper manufacturer partly owned by Intel, said it had received a large order for EUV steppers from a United States customer that most people in the industry believe to be Intel. That could mean Intel has a jump on the rest of the chip-making industry.

Intel executives, unlike major competitors such as Samsung and Taiwan Semiconductor Manufacturing Company, or TSMC, insist the company will be able to continue to make ever-cheaper chips for the foreseeable future. And they dispute the notion that the price of transistors has reached a plateau.

Yet while Intel remains confident that it can continue to resist the changing reality of the rest of the industry, it has not been able to entirely defy physics.

“Intel doesn’t know what to do about the impending end of Moore’s Law,” said Dr. Colwell.

In July, Intel said it would push back the introduction of 10-nanometer technology (a human hair, by comparison, is about 75,000 nanometers wide) to 2017. The delay is a break with the company’s tradition of introducing a generation of chips with smaller wires and transistors one year, followed by adding new design features the next.

“The last two technology transitions have signaled that our cadence is closer to two and a half years than two years,” Brian Krzanich, Intel’s chief executive, said in a conference call with analysts.

No More ‘Free Ride’

The glass-is-half-full view of these problems is that the slowdown in chip development will lead to more competition and creativity. Many semiconductor makers do not have the state-of-the-art factories now being designed by four chip manufacturers, GlobalFoundries, Intel, Samsung and TSMC.

The delays might allow the trailing chip makers to compete in markets that don’t require the most bleeding-edge performance, said David B. Yoffie, a professor at Harvard Business School.

And even if shrinking transistor size doesn’t make chips faster and cheaper, it will lower the power they require.

Ultra-low-power computer chips that will begin to appear at the end of this decade will in some cases not even require batteries — they will be powered by solar energy, vibration, radio waves or even sweat. Many of them will be sophisticated new kinds of sensors, wirelessly woven into centralized computing systems in the computing cloud.

What products might those chips lead to? No one knows yet, but product designers will be forced to think differently about what they’re building, rather than play a waiting game for chips to get more powerful. Thanks to Moore’s Law, computers have gotten smaller and smaller but have essentially followed the same concept of chips, hardware and software in a closed box.

“In the past, designers were lazy,” said Tony Fadell, an electrical engineer who headed the team that designed the original iPod, and led the hardware design of the iPhone before founding Nest Labs, a maker of smart home devices like thermostats and smoke alarms.

Carver Mead, the physicist who actually coined the term Moore’s Law, agrees. “We’ve basically had a free ride,” he said. “It’s really nuts, but that’s what paid off.”

Indeed, a graying Moore’s Law could be alive and well for at least another decade. And if it is not, humans will just have to get more creative.

Intel to invest $50 million in quantum computer research

Intel CEO Brian Krzanich released an open letter today, pledging to dedicate $50 million to long-term research of quantum computing. The CPU giant is partnering with TU Delft, the largest and oldest Dutch public technical university, and will work with QuTech, TU Delft’s quantum research institute. Intel is also pledging to dedicate its own resources and engineers to solving the problems of quantum computing.

It might seem odd to see Intel pumping so much money into quantum computing research, given that D-Wave’s systems have been tested and largely verified to be quantum computers. D-Wave’s devices, however, have some significant limitations. The number of qubits has grown fairly quickly, but the total number of connections between the qubits hasn’t scaled at the same rate — and it’s the connections between qubits that dictate the complexity and nature of the problems the computer can actually solve. D-Wave systems are sparsely connected, which vastly simplifies routing and construction but also limits the real-world use cases of the computer.
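
To see why connectivity rather than raw qubit count is the bottleneck, compare how many couplers full connectivity would require against a sparse layout. The six-couplers-per-qubit figure below is in the spirit of D-Wave's published Chimera topology, but treat it as an illustrative assumption rather than a spec.

```python
# Couplers required for full connectivity vs. a sparse layout.
# The sparse case assumes roughly six couplers per qubit, loosely modeled
# on D-Wave's Chimera topology; treat the figure as illustrative.

def complete_graph_couplers(n_qubits: int) -> int:
    """Every qubit coupled to every other qubit."""
    return n_qubits * (n_qubits - 1) // 2

def sparse_couplers(n_qubits: int, degree: int = 6) -> int:
    """Each qubit coupled to roughly `degree` neighbors."""
    return n_qubits * degree // 2

for n in (128, 512, 1000):
    print(f"{n:>5} qubits: {complete_graph_couplers(n):>7,} couplers fully connected, "
          f"~{sparse_couplers(n):,} in a sparse layout")
```

That gap is why densely connected problems have to be embedded across several physical qubits on sparse hardware, which shrinks the effective problem size the machine can take on.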

D-Wave’s devices are one type of quantum computer, called an annealer, but annealers aren’t the only type of quantum computer that might theoretically be constructed, nor universally the best for every kind of potential task. The challenges of building these devices, however, are considerable. Because quantum computation is extremely easy to disrupt, D-Wave chills its hardware to near absolute zero with a dilution refrigerator. Intel hasn’t stated which kind of devices it wants to investigate, but room-temperature quantum computing isn’t possible (at least, not as far as we know).

These types of computers, then, aren’t the kind of hardware that slots into a smartphone or that you’re likely to have sitting on your desk. In some ways, a functional quantum computer would resemble the hardware of the 1950s and 60s — huge installations with enormous power needs, fixed locations, and high operating costs. The reason that Intel and other manufacturers are so interested in building them anyway is because quantum computers can be used to solve certain problems that are so fiendishly difficult, it would require billions or trillions of years to accurately answer them using traditional transistors and cutting-edge equipment.

Intel’s quantum computing infographic

Even if you think Moore’s law will pick up steam again at some point, the time scales involved make conventional transistors ill-suited to the task. As the Intel-provided infographic above points out, there are a number of other specialized applications for quantum computing as well, such as theoretically unbreakable cryptography (with the side effect that today’s widely used public-key cryptographic schemes could be broken by full-scale quantum computing).

As early quantum computers come online, we’re beginning to get a basic sense of how quickly they can operate and what types of problems they solve best. Ars Technica recently covered updates to ongoing efforts to benchmark D-Wave systems, which illustrate how understanding how a quantum computer works, and what kinds of answers it can provide, significantly changes the way we benchmark and test such systems. Ongoing research into the practical systems we can build today will guide further work on the blue-sky projects of tomorrow. As Krzanich notes, “Fifty years ago, when Gordon Moore first published his famous paper, postulating that the number of transistors on a chip would double every year (later amending to every two years), nobody thought we’d ever be putting more than 8 billion of them on a single piece of silicon. This would have been unimaginable in 1965, and yet, 50 years later, we at Intel do this every day.”

The physics of cryogenic cooling make it unlikely that we’ll have quantum smartphones 50 years from now — but that doesn’t mean quantum hardware won’t be pushing the frontiers of human knowledge and our understanding of the universe.

Qualcomm’s new mobile chip will learn how to identify malicious apps

In recent months it seems like machine learning has been primarily used to make nightmarish eye-riddled hellscapes and misidentify Star Trek planets as waffle irons. But we’re not just teaching machines to identify patterns for our own amusement — they could also make our lives easier. Qualcomm’s new Smart Protect technology could be one such example. The chip maker today detailed the new feature, available on its upcoming Snapdragon 820 processor: a hardware-based anti-malware solution that Qualcomm says will monitor the behavior of apps on a device, detecting and classifying any that are deemed suspicious or anomalous.

Currently most anti-malware apps available on mobile devices rely on a list of known threats, meaning that malicious software can be fairly easily tweaked to bypass their security measures. Rather than relying on these lists to identify nefarious software, Smart Protect will monitor what’s actually happening on your smartphone, tablet, or other mobile device, making it possible to warn users of unexpected activity. Asaf Ashkenazi, director of Qualcomm’s product management, says users will get “nearly instantaneous notifications of detected privacy violations and malicious activity,” and because the technology is baked into the hardware itself, these reports will be possible offline and without draining your phone’s battery excessively.

The feature is set to become available on Qualcomm’s Snapdragon 820 processors when they launch next year. The company says it’s already working with security firms, including Avast, AVG, and Lookout, using an API to tie Smart Protect into their commercially available apps, meaning users will be able to take advantage of its capabilities.

Qualcomm’s new Hexagon 680 DSP: Fast, efficient, shipping with Snapdragon 820

The annual engineering and technical conference known as Hot Chips kicked off yesterday, and Qualcomm was out in front to detail its new DSP, the Hexagon 680. Digital Signal Processors (DSPs) aren’t something we’ve discussed much at ExtremeTech, and Qualcomm is putting a major marketing push behind their DSP technology for the first time. How does the chip work, what makes it an integral part of Snapdragon 820, and how does it advance heterogeneous computing?

DSPs are specialized processors dedicated to digital signal processing. Like GPUs, DSPs are designed to exploit parallelism. Like CPUs, they often make use of SIMD (single instruction, multiple data) and VLIW processing to boost throughput and total performance per watt. Also like GPUs, DSPs are designed to perform a very specific subset of tasks. CPUs can handle these tasks (and sometimes do), but DSPs offer better performance than general processors, and more flexibility than a traditional ASIC. This relationship is captured in the slide below:

Qualcomm’s Hexagon 680 DSP

Qualcomm’s Hexagon 680 is designed to accelerate certain workloads at performance efficiencies well above anything a modern CPU can offer. The Hexagon 680 is a VLIW (Very Long Instruction Word) processor, meaning it’s designed to extract maximum parallelism per clock cycle and to spread workloads across a wide set of execution units.

The Hexagon 680’s threading model

The 680 DSP offers four parallel scalar threads, each with 4-way VLIW support and a shared L1/L2. Each of these scalar groups is clocked at 500MHz for a maximum throughput of 2GHz-equivalent worth of processing. On the vector side of the equation, the 680 has 32 1024-bit vector registers. Each instruction can address up to four of these per cycle, for a maximum output of 4096 bits per cycle per instruction. It also includes support for Qualcomm’s new Hexagon Vector Instructions, or HVX. The HVX registers can be controlled by any two of the scalar registers.

Here’s what this means in aggregate: The Hexagon 680 is designed to allow for extensive threading and to share data across the L1 and L2 caches. There’s no penalty to using the HVX units and the scalar units simultaneously, provided that the workload is designed for it. The vector processors don’t have access to L1, but treat L2 as their first level of memory. L1 and L2 are kept coherent and data can be streamed into L2 from DDR memory at up to 1.2Gpixels/s. This supports some of the advanced capabilities of the Hexagon 680 (we’ll talk about these below).
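
Taking Qualcomm's stated specs at face value, the peak numbers work out as below. This is simple arithmetic on the figures above, with one assumption: that the HVX vector units run at the same 500MHz as the scalar groups.

```python
# Peak-throughput arithmetic from the Hexagon 680 figures quoted above.
# These are theoretical ceilings, not measurements, and they assume the
# HVX vector units run at the same 500 MHz as the scalar groups.

SCALAR_THREADS = 4
SCALAR_CLOCK_GHZ = 0.5            # each scalar group runs at 500 MHz
VECTOR_REGS_PER_CYCLE = 4         # HVX registers addressable per instruction
VECTOR_WIDTH_BITS = 1024

aggregate_scalar_ghz = SCALAR_THREADS * SCALAR_CLOCK_GHZ
vector_bits_per_cycle = VECTOR_REGS_PER_CYCLE * VECTOR_WIDTH_BITS
peak_vector_gb_per_s = vector_bits_per_cycle / 8 * SCALAR_CLOCK_GHZ  # GB/s at 500 MHz

print(f"Aggregate scalar throughput: {aggregate_scalar_ghz:.1f} GHz-equivalent")
print(f"Vector bits touched per cycle: {vector_bits_per_cycle}")
print(f"Peak vector data rate: {peak_vector_gb_per_s:.0f} GB/s")
```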

According to Qualcomm, the performance advantages of these new features are enormous. While this data is provided by the company and should be taken with a grain of salt, there’s nothing outlandish here. These kinds of accelerations are typical when moving to a high-end dedicated chip as opposed to executing code on a general-purpose CPU.

DSP benchmarks

Qualcomm believes that the programming model for the Hexagon 680 is similar enough to CPU models to allow programmers to use the hardware effectively, but with significant overall improvements.

DSP vs. CPU programming models

Power consumption should also be much reduced, thanks to the simpler nature of the VLIW model and the use of L2 for vector processing rather than both the L1 and L2. The company also notes that by designing its DSP to run at low frequencies, it can cut leakage current and reduce overall power consumption.

Applications and heterogeneous computing

The best application processor on Earth isn’t worth much without applications to run on it, but the Hexagon 680 DSP delivers on this front as well. Qualcomm claims that the new chip is fully heterogeneous, meaning it can share data between CPU, GPU, and the DSP. Qualcomm is also a founding member of AMD’s HSA consortium, and while it isn’t calling its heterogeneous compute model by that name, we expect the two to be similar on a conceptual level. The DSP inside the Snapdragon 820 can be used to render AR or VR, tapped for better video playback and encoding, or used by the camera for extensive improvements in low-light photography. Alternately, HVX can be used to enhance detail in standard photos, as shown below.

Enhance. Enhance. Enhance.

Qualcomm has stated that the Hexagon 680 can perform low light enhancement 3x faster than a Krait SoC, while using 1/10 as much power. Programmers will be able to use the DSP and write applications to run on it, which could give the Snapdragon 820 platform a substantial leg up over the competition. DSPs have shipped on SoCs for a long time, but few companies spend as much time talking up their solutions as part of a heterogeneous compute platform as Qualcomm has.

In the past, a component like the DSP would be invisible, buried under interest in the CPU and GPU. Qualcomm’s decision to talk about the chip is a sign of the times. As visual processing, augmented reality, and virtual reality take the stage, more and more consumers expect advanced capabilities from their smartphones. For lower-tech users, that means high quality photos and video, while gamers and enthusiasts want cutting-edge performance and better battery life. The Hexagon 680 DSP is meant to speak to all these needs, with power efficiency that will beat even the upcoming Kryo CPU, flexibility and heterogeneous compute capability to whet the appetites of programmers and application developers, and performance that appeals to enthusiasts, gamers, and the general public.

After these disclosures, the Kryo is the last piece of the puzzle still to drop into place. Hopefully we’ll have details on the CPU core sooner rather than later.

Intel will support FreeSync standard with future GPUs

Currently, there are two competing display standards that can provide smoother gameplay and refresh rates synchronized to GPU frame production — Nvidia’s proprietary G-Sync standard, and the VESA-backed Adaptive-Sync (AMD calls this FreeSync, but it’s exactly the same technology). We’ve previously covered the two standards, and both can meaningfully improve gaming and game smoothness. Now, Intel has thrown its own hat into the ring and announced that it intends to support the VESA Adaptive-Sync standard over the long term.

This is a huge announcement for the long-term future of the Adaptive-Sync. Nvidia’s G-Sync technology is specific to their own GeForce cards, though a G-Sync monitor still functions normally if hooked to an Intel or AMD GPU. The theoretical advantage of Adaptive-Sync / FreeSync is that it can be used with any GPU that supports the VESA standard — but since AMD has been the only company pledging to do so, the practical situation has been the same as if AMD and Nvidia had each backed their own proprietary tech.

Intel’s support changes that. Dwindling shipments of low-end discrete GPUs in mobile and desktop have given the CPU titan an ever-larger share of the GPU market, which means that any standard Intel chooses to back has a much greater chance of becoming a de facto standard across the market. This doesn’t prevent Nvidia from continuing to market G-Sync as its own solution, but if Adaptive-Sync starts to ship standard on monitors, it won’t be a choice just between AMD and Nvidia — it’ll be AMD and Intel backing a standard that consumers can expect as default on most displays, while Nvidia backs a proprietary solution that only functions with its own hardware.

Part of what likely makes this sting for Team Green is that its patent license agreement with Intel will expire in 2016. Back in 2011, Intel agreed to pay Nvidia $1.5 billion over the next five years. That’s worked out to roughly $66 million per quarter, and it’s high-margin cash — cash Nvidia would undoubtedly love to replace with patent agreements with other companies. There’s talk that the recent court cases against Samsung and Qualcomm over GPU technology have been driven by this, but Nvidia would likely love to sign a continuing agreement with Intel to allow the company to offer G-Sync technology on Intel GPUs. If Intel is going to support Adaptive-Sync, it’s less likely that they’d take out a license for G-Sync as well.

The only fly in the ointment is the timing. According to Tech Report, no current Intel GPU hardware supports Adaptive-Sync, which means we’re looking at a post-Skylake timeframe for support. Intel might be able to squeeze the technology into Kaby Lake, with its expected 2016 debut date, but if it can’t we’ll be waiting for Cannonlake and a 2017 timeframe. Adaptive-Sync and G-Sync are most visually effective at lower frame rates, which means gaming on Intel IGPs could get noticeably smoother than we’ve seen in the past. That’s a mixed blessing for AMD, which has historically relied on superior GPU technology to compete with Intel, but it’s still an improvement over an AMD – Nvidia battle where NV holds the majority of the market share.

To inspire software and hardware developers, Intel gets bold and very weird

At its annual developer conference, the chipmaker lays out the future and asks the tech community to help make it reality.

It was about 10 seconds into the robotic spider dance that you had to remind yourself you were watching a presentation by the world’s largest chipmaker, Intel.

CEO Brian Krzanich had just finished his hour-and-a-half keynote address Tuesday at the annual Intel Developer Forum here by discussing not the company’s bread and butter — its processing chips that power the brains of modern-day computers — but wacky and outlandish proofs of concept. The series of technical demonstrations included a vending machine that could remember your face and keep track of the food you like, a full-length mirror that could change the color of your clothing in real time, and a smartphone, built in collaboration with Google, that can see and 3D-map a room.

Oh, and spiders. There were a lot of them, all capable of being controlled in orchestral fashion with a single hand’s gesture. Krzanich realized the implication of impending arachnid Armageddon, and introduced the eight-legged companions by showing a video clip by comedian Jimmy Fallon, poking fun at the obvious terror.

In all, the demonstrations were meant to send a message: Intel has a vision for the future, and it wants to be the company that provides the tools for getting us there. That’s a stark contrast to what Intel used to talk about at these events: The latest chips, its newest production facilities and the newest computers being powered by it all.

The reason is that processors just aren’t as exciting as they used to be.

The Santa Clara, California, chipmaker grew to a workforce of more than 105,000 people with nearly $56 billion in sales last year largely on the worldwide popularity of the PC. But that was in the old days. Now, the PC is trying to stave off flat or falling sales that have dogged companies in the past two years. The tech industry, meanwhile, has instead focused on mobile phones and the many so-called “smart” devices, such as sensors and other hardware that Apple’s iPhone and Google’s Android have ushered in.

What is Moore’s Law?

If you’ve been around the internet for longer than Jaden Smith, you’re probably familiar with Moore’s Law. It’s often misquoted, often misunderstood, but its “law” status is rarely questioned. The most general possible way to state Moore’s Law is this: computing power tends to approximately double every two years. It gained fame because people like laws that let them predict the future of one of the world’s biggest industries, but the very physical basis for this principle means it is slightly different — and less reliable — than many people believe.

Though he did not give it that name, Moore’s Law was first proposed in a magazine article by Intel co-founder Gordon E. Moore. What it actually says is that the number of transistors that can be packed into a given unit of space will roughly double every two years. That prediction has remained impressively true, a fact that’s allowed everything from pocket-sized smartphones to Crysis 3, and the continuing computerization of the economy.

Yet, stated as a prediction about human abilities in physical manufacturing, and divorced from rather airy ideas like “computing power,” it becomes clear why Moore’s Law won’t necessarily always hold true. Remember that when Moore made his original prediction, he predicted a doubling every year, but he quickly amended this to every two years. Physical limitations on the manufacturing of these chips could easily push that number back to five years or more, effectively invalidating Moore’s Law forever, and revealing it to be nothing more than Moore’s Very Good But Ultimately Limited Prediction (MVGBULP).
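
Because the growth is exponential, stretching the cadence changes the long-run picture dramatically. The sketch below compares a two-year doubling period against a hypothetical five-year one, starting from an arbitrary baseline:

```python
# How much the doubling cadence matters over a couple of decades.
# The starting count and horizon are arbitrary; the point is the ratio.

def projected_transistors(start_count: float, years: float, doubling_period: float) -> float:
    """Transistor count after `years`, doubling every `doubling_period` years."""
    return start_count * 2 ** (years / doubling_period)

START = 1e9  # arbitrary one-billion-transistor baseline

for years in (10, 20):
    two_year = projected_transistors(START, years, 2)
    five_year = projected_transistors(START, years, 5)
    print(f"After {years} years: {two_year:.1e} (2-year cadence) vs. "
          f"{five_year:.1e} (5-year cadence) -> {two_year / five_year:.0f}x apart")
```

A decade at the slower cadence delivers roughly what four years used to.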

Today, all consumer processors are made out of silicon — the second most abundant element in the Earth’s crust, after oxygen. But silicon is not a perfect conductor, and limits to the mobility of the electrons it carries impose a hard limit on how densely you can pack silicon transistors. Not only does power consumption become a huge issue, but an effect called quantum tunneling can cause problems for keeping electrons contained once features shrink below a certain thickness threshold.

Outside of research facilities, silicon transistors don’t currently get smaller than 14 nanometers — and while some 10 nanometer chip designs might someday reach the market, it’s seen as a foregone conclusion that to keep to Moore’s Law over a long period of time, we’ll have to come up with newer and better materials to be the basis of next-generation computers.

One oft-cited example is graphene, or the rolled-up tubes of graphene called carbon nanotubes. Graphene is “atomically thin,” often called two-dimensional, and so it allows a huge increase in density on the physical side of things. On the other hand, graphene does not have a useful bandgap — the energy difference we need to navigate to bump electrons back and forth between the conducting and non-conducting bands. That’s how silicon transistors switch on and off, which is the entire basis for their method of computation.

If this problem can’t be offset in some way, a graphene computer would have to pioneer a whole new logical method for computing. One graphene computer chip from IBM proved to be incredibly fast, 10,000 times faster than a silicon chip — but it was not a general-purpose processor. Since graphene can’t be easily switched on and off in mass quantities, we can’t simply swap in graphene for silicon and keep on with modern chip architectures.

Sebastian Anthony holding a wafer of graphene chips at IBM Research

Other materials may offer more practical reductions in size and electrical resistance, and actually allow Moore’s Law to continue unbroken, but only if they hit the market quickly enough. Silicon-germanium, or just germanium alone, has been talked about for some time, but has yet to really materialize in any sort of affordable form. It was recently discovered that a material called titanium trisulfide can provide many of the same physical advantages as graphene, and do so with an achievable bandgap — such a super-material might be what’s needed, but graphene-like problems with manufacturing then rear their ugly heads.

Quantum computing could be another answer, but research is still so preliminary that it’s doubtful. Some believe quantum computers will offer such a huge and immediate upgrade over modern processors that computer encryption will come tumbling down. However, quantum computing won’t necessarily come in the form of a programmable digital computer right away; early quantum computers won’t be able to run Windows, even if they are more than fast enough in a theoretical sense. Of all the possible “solutions” to looming problems with Moore’s Law, quantum computing is probably the least realistic. It has a lot of potential for specific applications, but quantum PCs are still too far out to be worth considering.

Moore himself admitted that his Law “can’t continue forever” in a 2005 interview. It’s the nature of exponential functions, he said — they eventually hit a wall, and while that makes perfect sense in the purely hypothetical world of mathematics, it tends not to work out as well in the real world. It could be that Moore’s Law will hold up when viewed on the century scale, zoomed out to diminish the importance of any small fluctuations between new technologies. But the fact remains that right now, we’re entering a lull as we wait for the next great processing tech to arrive.
