At the launch of the new iPhone 11, Apple gave some details on its new A13 SoC, which has been optimized for machine learning. The NPU is said to run six times faster than its predecessor, and the CPU, GPU and NPU are designed to work together on machine learning tasks.
Compared to the Apple A12, which is currently found in the latest iPhones, the A13 shows many similarities at first glance, even though every adjusting screw has been turned. Sri Santhanam, vice president of silicon engineering at Apple, called it the "fastest smartphone processor" on the market.
Like the A12, the A13 is manufactured by TSMC in a 7 nm process, but in its second generation. The SoC now contains 8.5 billion transistors, a major step up from the 6.9 billion of the A12, and, like its predecessor, has a 6-core architecture with a cluster of four low-power cores and two high-performance cores. Although details of their microarchitecture are not yet known, the benchmark figures discussed below make it clear that their IPC (instructions per clock cycle) exceeds that of all previously known in-house designs from the competition.
The A13 thus stays with six cores, while Qualcomm, for example, has implemented an octa-core architecture in the Snapdragon 865, as has Samsung in its Exynos chips. According to Apple, the two high-performance cores are 20% faster than their A12 counterparts and around 30% more energy efficient. The four low-power cores are also said to be 20% faster and as much as 40% more energy efficient. The same goes for the GPU (+20% computing power, 40% lower energy consumption). Finally, the NPU has eight cores like its predecessor and is said to deliver 20% more computing power at 15% lower power consumption.
Apple put these figures out without specifying any operating point or workload. Although Apple certainly employs some of the best chip designers in Silicon Valley, even they cannot work magic. Let's look at the individual elements.
First, there are improvements on the manufacturing side. TSMC has two improved 7 nm processes that Apple could have used. Apparently, the N7P process was chosen rather than the so-called N7+ process. The basic 7 nm process used for the A12 is called N7. TSMC now offers the N7+ process to its first customers; it uses EUV (extreme ultraviolet) lithography for some of the chip layers. The Taiwanese company claims that this yields higher density (about 20% more logic on the same area) and better energy efficiency (about 10%).
TSMC also has a 7 nm "performance-enhanced" process called N7P. It does not use EUV and is simply an optimized version of the 7 nm process used for the A12. TSMC states that it allows either 10% lower energy consumption at the same clock frequency or a 7% higher clock frequency at the same power consumption (Figure 1).
N7+ is therefore TSMC's superior manufacturing process, but there is much to suggest that Apple, for whatever reason, chose or had to choose N7P. Perhaps the risk of using EUV was still too high.
As the second adjusting screw, the clock frequency was raised from 2.49 to 2.66 GHz. This is an increase of about 6.8% and suggests that the potential of N7P (+7%) was almost fully used up within the same energy budget.
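The clock comparison above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are chosen for illustration, and the 7% value is the N7P headroom claimed by TSMC as quoted in the text, not a measured figure.

```python
# Peak clock of the high-performance cores as quoted in the text.
# Both values use the same unit, so only their ratio matters.
a12_clock = 2.49
a13_clock = 2.66

# Relative clock increase in percent (about 6.8).
clock_gain_pct = (a13_clock / a12_clock - 1) * 100

# TSMC's claimed N7P frequency headroom at equal power (from the text).
n7p_headroom_pct = 7.0

# Share of that headroom the clock increase would consume.
headroom_used = clock_gain_pct / n7p_headroom_pct

print(f"clock gain: {clock_gain_pct:.1f}%")
print(f"share of N7P headroom used: {headroom_used:.0%}")
```

Under these assumptions, the clock increase alone accounts for nearly all of the frequency gain that N7P is claimed to offer at constant power.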
The A12 raised Apple's transistor count to 6.9 billion, an increase of 60% over the 4.3 billion of the A11. Yet at about 83 mm², its die area was smaller than that of the A11 (about 88 mm²) and far from the largest chips Apple has ever built into an iPhone. In fact, in terms of area it was the smallest iPhone processor in nine years. Older Apple SoCs were much larger; the A5 and the A10 each measured more than 120 mm². For the A13 with its 8.5 billion transistors (+23%), the die area would logically have grown by 23% to about 102 mm² at the same density.
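The area estimate above follows from a simple density calculation, sketched here with the figures from the text. The function name is hypothetical, and the key assumption (unchanged transistor density between A12 and A13) is the article's, not a confirmed fact.

```python
def area_at_same_density(ref_transistors, ref_area_mm2, new_transistors):
    """Scale die area linearly with transistor count (constant density)."""
    density = ref_transistors / ref_area_mm2  # transistors per mm^2
    return new_transistors / density

# Figures quoted in the text.
a12_transistors = 6.9e9
a12_area_mm2 = 83.0
a13_transistors = 8.5e9

# Growth in transistor count from A12 to A13, in percent.
growth_pct = (a13_transistors / a12_transistors - 1) * 100

# Estimated A13 die area if density stayed at the A12 level.
a13_area_mm2 = area_at_same_density(a12_transistors, a12_area_mm2, a13_transistors)

print(f"transistor growth: {growth_pct:.0f}%")
print(f"estimated A13 area: {a13_area_mm2:.0f} mm^2")
```

The estimate is only as good as the constant-density assumption; a denser layout on the refined process would give a smaller die.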
The first benchmark measurements of the Apple A13 with the well-known Geekbench showed a 12.5% higher score in single-threaded mode and only 1.78% in multithreaded mode (Figure 2). It remains to be seen whether these values improve with the final version of iOS 13. The six cores probably cannot all run in parallel at maximum clock frequency for thermal reasons. Adjusted for clock speed, the improvements due to changes in the microarchitecture therefore amount to "only" 5.7% (12.5% minus the 6.8% clock increase), which is well below what Arm, for example, achieves year after year with its Cortex-A processors. However, it is fair to say that the absolute IPC of Apple's in-house designs has for generations outperformed Arm's cores as well as the in-house designs of competitors Huawei, Samsung and Qualcomm.
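The clock-adjusted figure can be derived in two ways, sketched below with the numbers from the text. Subtracting percentage points reproduces the 5.7% quoted above; dividing the speedup by the clock ratio is the stricter way to isolate per-clock gains and gives a slightly lower value. The variable names are illustrative only.

```python
# Figures quoted in the text.
single_thread_gain_pct = 12.5        # Geekbench single-threaded gain
a12_clock, a13_clock = 2.49, 2.66    # same unit, only the ratio matters

clock_ratio = a13_clock / a12_clock
clock_gain_pct = (clock_ratio - 1) * 100

# Method 1: subtract percentage points (the approximation used in the text).
ipc_gain_subtract_pct = single_thread_gain_pct - clock_gain_pct

# Method 2: divide the overall speedup by the clock ratio.
speedup = 1 + single_thread_gain_pct / 100
ipc_gain_ratio_pct = (speedup / clock_ratio - 1) * 100

print(f"by subtraction: {ipc_gain_subtract_pct:.1f}%")
print(f"by ratio:       {ipc_gain_ratio_pct:.1f}%")
```

Either way, the per-clock improvement lands in the mid single digits, so the conclusion drawn in the text does not depend on the method.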
The next question is how Apple spent the extra transistor budget. Clearly on machine learning and image processing. Last year, Apple had already improved the neural engine of the A12 far more than expected. The NPU of the A11 can perform 600 billion 8-bit operations per second, and Apple made the A12 about eight times faster at 5,000 billion (5 trillion) 8-bit operations per second. If the A13's NPU really worked six times faster, as announced by Apple, it would reach 30,000 billion 8-bit operations per second.
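The throughput chain above can be written out as follows. This is a sketch under the text's own assumptions: 600 billion operations per second for the A11, a factor of eight that the article rounds to roughly 5 trillion for the A12, and a hypothetical factor of six on top of that for the A13.

```python
BILLION = 1e9

# A11 NPU throughput quoted in the text (8-bit operations per second).
a11_ops = 600 * BILLION

# "About eight times faster" gives 4,800 billion, which the text
# rounds to roughly 5,000 billion (5 trillion) for the A12.
a12_ops_exact = a11_ops * 8
a12_ops_rounded = 5000 * BILLION

# Hypothetical A13 figure if the NPU really were six times faster.
a13_ops = a12_ops_rounded * 6

for name, ops in [("A11", a11_ops), ("A12", a12_ops_rounded), ("A13 (if 6x)", a13_ops)]:
    print(f"{name}: {ops / BILLION:,.0f} billion ops/s")
```

The 30,000 billion figure therefore stands or falls with the claimed factor of six, which the rest of the article calls into question.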
There is also a new component in the chip, internally referred to as the "AMX" or "matrix" co-processor, which handles certain mathematically demanding tasks. It can help with computer vision and augmented reality, which Apple is making one of the essential features of its mobile devices.
If you ask me, that sounds like the NPU. Is it just a new name for the same unit, or better branding for its advanced features? Or is it something quite different: another kind of math co-processor, or perhaps a set of SIMD instruction set extensions of the kind we have seen for years on desktop processors (SSE and AVX), which allow matrix multiplication, the main task in machine learning models, to be carried out six times faster than on the predecessor?
Without a (hacked) iPhone 11 in hand, many questions about the A13 SoC remain unanswered. What is certain, however, is that the high-performance cores offer unprecedented single-thread performance, which benefits the many applications that still do not scale well across multiple cores. What the NPU is capable of will become apparent when, as Apple promises, several individual photos captured in parallel are combined into one "optimal" image.