Kirin 9010
Huawei recently released its new flagship phone, the Pura 70. While the phone itself has numerous innovations, such as its industry-leading retractable lens technology, the all-new Kirin 9010 SoC is an interesting indicator of Huawei’s chip-design capability.
First of all, the 9010 basically retains the same core configuration as the 9000S, with 1 large core, 3 medium cores and 4 small cores. The large and medium Taishan cores are all SMT capable, so the chip shows up in HarmonyOS as a 12-core CPU.
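As a quick check on the 12-core figure, the count of logical CPUs can be sketched as below. The 2-way SMT factor is an assumption inferred from the numbers in the text (8 physical cores reported as 12), and the names `CLUSTERS` and `logical_cpus` are purely illustrative.

```python
# Core layout as described in the text. The threads_per_core values are an
# assumption: 2-way SMT on the large/medium Taishan cores, none on small cores.
CLUSTERS = [
    {"name": "large", "cores": 1, "threads_per_core": 2},
    {"name": "medium", "cores": 3, "threads_per_core": 2},
    {"name": "small", "cores": 4, "threads_per_core": 1},
]


def logical_cpus(clusters):
    """Number of CPUs the OS would enumerate (one per hardware thread)."""
    return sum(c["cores"] * c["threads_per_core"] for c in clusters)


physical = sum(c["cores"] for c in CLUSTERS)
logical = logical_cpus(CLUSTERS)
print(f"physical cores: {physical}, logical CPUs: {logical}")
```

Under that assumption the 4 SMT-capable cores contribute 8 threads and the 4 small cores contribute 4, which matches the 12 CPUs the OS reports.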
While the medium and small cores operate at slightly higher frequencies, the big change is in the large core, which got larger while its frequency dropped by 12%.
When the Pura 70 Ultra first came out, Geekbench and other tests showed that its CPU performance improved by about 10% in spite of the lower frequency.
So how did the Kirin 9010 surpass Qualcomm’s Snapdragon 8+ in compute performance, despite no major improvement in the semiconductor process? We can see it just by looking at the IPC (instructions per cycle) numbers, where the 9010 was about 25% higher than the 9000S.
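The figures quoted so far are roughly consistent with each other. A minimal sketch, assuming single-thread performance scales as IPC times clock frequency (a simplification that ignores memory effects), with the 25% IPC gain and 12% frequency drop taken from the text and `relative_performance` being a hypothetical helper:

```python
def relative_performance(ipc_gain, freq_change):
    """Performance ratio new/old, assuming performance ~ IPC x frequency.

    Both arguments are fractional changes, e.g. 0.25 for +25%, -0.12 for -12%.
    """
    return (1.0 + ipc_gain) * (1.0 + freq_change)


# Figures quoted in the text for the large core, 9010 vs 9000S.
ratio = relative_performance(ipc_gain=0.25, freq_change=-0.12)
print(f"estimated large-core speedup: {(ratio - 1.0) * 100:.1f}%")
```

A 25% IPC gain at a 12% lower clock works out to 1.25 x 0.88 = 1.10, in line with the roughly 10% improvement seen in the benchmarks.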
Normalized to about the same frequency, the large Taishan core on the 9010 delivered 25% higher integer IPC and 20% higher floating-point IPC than the large core on the 9000S. In fact, it did better than the ARM Cortex-X2 used in the Snapdragon 8+ and is halfway to the ARM Cortex-X3 used in the Snapdragon 8 Gen 2. Since each of ARM’s own large cores has shown about a 10% improvement over the previous generation, the improvement in the large core from the 9000S to the 9010 is especially noticeable. So, what exactly did Huawei do?
First of all, it moved from a 6-wide decoder to an 8-wide decoder. That’s quite significant, since the Cortex-X1 still uses a 5-wide decoder and only the A16’s Everest core had reached an 8-wide decoder. The wider decoder definitely contributed to the improved performance.
Aside from being wider, the new large Taishan core simply has larger integer and floating-point execution resources. Integer entries are up from 80 to 105 and the number of ALUs increased from 4 to 6, with a slightly smaller re-order buffer.
The number of float entries increased from 56 to 96, while the FP & SIMD PRF (physical register file) increased from 160 to 222 entries.
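To put those structure sizes in perspective, here is a small sketch that converts the before/after figures quoted above into percentage growth. The dictionary keys are informal labels for the numbers in the text, not official names for the hardware structures.

```python
# (9000S, 9010) values for the large Taishan core, as quoted in the text.
STRUCTURES = {
    "integer entries": (80, 105),
    "integer ALUs": (4, 6),
    "float entries": (56, 96),
    "FP & SIMD PRF entries": (160, 222),
}


def growth_pct(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100.0


growth = {name: growth_pct(b, a) for name, (b, a) in STRUCTURES.items()}
for name, pct in growth.items():
    print(f"{name}: {pct:+.1f}%")
```

The floating-point side grew the most in relative terms (about 71% more entries), while the integer side gained about 31% more entries and 50% more ALUs.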
The load and store schedulers were also modified, with some increases and some decreases. The big change seems to be the much larger load queue, at 163. This would indicate that Huawei found certain components were not fully utilized and shrank them to make way for others.
Overall, the larger and more complex design would indicate that SMIC has a more mature process and can more consistently make chips with more complex features while maintaining acceptable yield.
Another interesting part is the larger L1D TLB, which increased from 128 to 256 pages. The L3 cache also seems to have increased significantly.
All of this has led to improved branch prediction with fewer misses. Note in the chart below that the branch miss rate for the Mate 60 Pro is 3.55%, while the Pura 70 Ultra is at 3.19% and the iPhone Pro Max at 2.44%.
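The miss rates are easier to compare as relative reductions. A minimal sketch using only the three percentages quoted above (the helper name `relative_reduction_pct` is hypothetical):

```python
# Branch miss rates in percent, as quoted in the text.
MISS_RATE_PCT = {
    "Mate 60 Pro": 3.55,
    "Pura 70 Ultra": 3.19,
    "iPhone Pro Max": 2.44,
}


def relative_reduction_pct(old, new):
    """How much lower `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100.0


gen_gain = relative_reduction_pct(
    MISS_RATE_PCT["Mate 60 Pro"], MISS_RATE_PCT["Pura 70 Ultra"]
)
remaining_gap = relative_reduction_pct(
    MISS_RATE_PCT["Pura 70 Ultra"], MISS_RATE_PCT["iPhone Pro Max"]
)
print(f"Pura 70 Ultra vs Mate 60 Pro: {gen_gain:.1f}% fewer misses")
print(f"iPhone Pro Max vs Pura 70 Ultra: {remaining_gap:.1f}% fewer misses")
```

In other words, the new core mispredicts about a tenth less often than its predecessor, while the iPhone still mispredicts roughly a quarter less often than the Pura 70 Ultra.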
There are clearly still improvements that Huawei could make for the next iteration of the flagship Kirin chip for the Mate 70. While I expect it to use an improved process, I think an additional 5% of CPU performance can be squeezed out of regular core and architecture design improvements. At some point, they are likely to shift to 10-wide decoders. But even aside from that, larger L1 and L2 caches as well as more integer/FP execution units could be added. ARM and Apple continually make improvements to their core architectures. Huawei seems to be 1 to 2 years behind here. There is no reason they can’t keep iterating quickly.
Aside from that, GPU performance was relatively similar based on 3DMark.
This would indicate there were no real changes to the Maleoon 910. Most people speculate that the next SoC is likely to feature a significantly improved Maleoon 920. That’s where the next SoC might improve the most.
Overall, the CPU/large core improvement gave the 9010 better CPU performance as well as lower power consumption. Note here that the Pura 70 had a better power consumption curve than the Snapdragon 8 Gen 1, but a worse one than the 8+.
It did have a higher overall score at the expense of more power consumption. If the rumours are to be believed, the next Huawei SoC is likely to have a CPU performance-to-power curve similar to the 8 Gen 2. Considering that flagship phones from the second half of last year were still being launched with that chip, the Mate 70’s SoC would only be 1 year behind Qualcomm in performance despite having access to a significantly worse process node.
That is something to celebrate for Huawei and SMIC. Now, of course, CPU/GPU performance isn’t the only important part of a phone. What really matters is the phone’s endurance, app response time, overall speed and more. Huawei’s tight control over the entire phone allows HarmonyOS to run lighter than comparable Android builds. As such, it was actually shown to have the second-best app response time of any tested flagship phone while having competitive endurance (both better than the iPhone 15 Pro Max).
That is really pretty good performance for not having access to TSMC fabs.