ARM processors for servers

Strategy and Roadmap should relate to future, unannounced processors at 5 nm and beyond. They will probably mention that the Altra Max is already with customers and will go on sale, but that shouldn't be the main point.
It is quite likely that they will align the roadmap with ARM's future N and V cores.

8:00 PST is 16:00 in Lisbon.
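
A quick sanity check of that conversion, as a minimal sketch using fixed UTC offsets. The dates below are hypothetical (the post only gives the 8:00 start time). The point is that the 8-hour gap holds whenever both zones are in the same daylight-saving state, so for a summer event the Pacific side is on PDT rather than PST.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets: PST = UTC-8, PDT = UTC-7,
# WET (Lisbon winter) = UTC+0, WEST (Lisbon summer) = UTC+1.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
WET = timezone(timedelta(hours=0), "WET")
WEST = timezone(timedelta(hours=1), "WEST")


def to_lisbon(start: datetime, lisbon_tz: timezone) -> datetime:
    """Convert an aware datetime to the given Lisbon offset."""
    return start.astimezone(lisbon_tz)


# Hypothetical dates, chosen only to represent a winter and a summer case.
winter = to_lisbon(datetime(2021, 1, 19, 8, 0, tzinfo=PST), WET)
summer = to_lisbon(datetime(2021, 5, 19, 8, 0, tzinfo=PDT), WEST)

print(winter.strftime("%H:%M %Z"))
print(summer.strftime("%H:%M %Z"))
```

Both cases land on 16:00 in Lisbon. Mixing a literal UTC-8 offset with Lisbon summer time would give 17:00 instead.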
 

AWS Joins Arm to Support Arm-HPC Hackathon this Summer

AWS is calling all grad students and post-docs who want to gain experience advancing the adoption of the Arm architecture in HPC to join a world-wide community effort led by the Arm HPC User’s Group (A-HUG). AWS is supporting this event along with our friends at Arm.

The event will take the form of a hackathon this summer and is aimed at getting open-source HPC codes to build and run fast on Arm-based processors, specifically AWS Graviton2.

To make it a bit more exciting, A-HUG will be awarding an Apple M1 MacBook to each member of the team (max. 4 people) that contribute the most back to the Arm HPC community.

https://www.hpcwire.com/off-the-wire/aws-joins-arm-to-support-arm-hpc-hackathon-this-summer/

Arm and Julich Sign Multi-Year HPC Collaboration Agreement

Arm and the Julich Supercomputing Centre (JSC) today signed a multi-year cooperation agreement.
The collaboration focuses on the analysis and optimization of strategic HPC applications on Arm-based HPC systems, including Arm accelerated platforms (for example, Arm+GPU). The joint team carries on performance analysis and code engineering, taking advantage of specific features of Arm-based hardware to further advance application performance. The code requirements identified in this effort will help in the design of future HPC technologies and systems.
https://www.hpcwire.com/off-the-wire/arm-and-julich-sign-multi-year-hpc-collaboration-agreement/
 
I didn't watch it, but here is Ampere's presentation:

Interesting that the big announcement is exactly the opposite of what I expected. :D
They announced that they will design their own cores, not based on ARM's:
QHFsjCj.jpg


And they are going to focus on the cloud:
2zKp9ff.jpg


kylYPkP.jpg


For Ampere to relinquish the reliance on Arm’s next-gen cores, and instead to rely on their own design and actually go forward with that switch in the next-gen product, shows a sign of great confidence in their custom microarchitecture design – and at the same time one could interpret it as a sign of no confidence in Arm’s Neoverse IP and roadmap. This comes at a great juxtaposition to what others are doing in the industry: Marvell has stopped development of their own ThunderX CPU IP in favour of adopting Arm Neoverse cores. On the other hand, not specifically related to the cloud and server market, Qualcomm earlier this year have acquired Nuvia, and their rationale and explanation was similar to Ampere’s in that they’re claiming that the new in-house design capabilities offered performance that otherwise wouldn’t have been possible with Arm’s Cortex CPU IP.

https://www.servethehome.com/ampere-planning-custom-arm-cores-at-5nm-and-beyond/
https://www.anandtech.com/show/16684/ampere-roadmap-full-custom-cores

I suspect the strategy is the same as Marvell's, and that nobody will be left selling ARM servers in the general-purpose market.
 
It really is somewhat surprising; let's see how it differs from the N2 or V1.

The "focus", on the other hand, is no surprise: the money is definitely not in the general-purpose market.

Talking Chip With Ampere Computing CEO Renee James

TPM: Well, what server CPU designers and manufacturers do is extremely hard to do. And that’s why when there are screw ups, I don’t go crazy with the criticism because I don’t think people really appreciate how improbable it is that this works at all.

Renee James: I shouldn’t say this to you, but every single time one comes back, and it boots – I mean, we do full of full-scale simulation now, so we know we’ve already booted Linux on the 5 nanometer part – but every time it turns on and everything works the way it’s supposed to, just go, thank you.
https://www.nextplatform.com/2021/05/19/talking-chip-with-ampere-computing-ceo-renee-james/
 

OCI Jumps into Arm with Instances and Aggressive Developer Program

Oracle Cloud Infrastructure (OCI) today launched a multi-prong Arm initiative including instances (VM and bare metal) based on Ampere’s Altra microprocessor, and a three-tier Arm developer program seeking, among other things, to woo the huge base of mobile and IoT Arm-based developers to the cloud application space.
https://www.hpcwire.com/2021/05/25/...h-instances-and-aggressive-developer-program/
 
I have been trying out a KVM virtual machine with 4 vCPUs running on top of an Ampere Altra. "lshw" reports 2 GHz, and there is a SKU with 64 cores @ 2 GHz, but "lshw" may not be reading that information correctly.
In terms of use, from what I have tried, it behaves like any other computer in most cases. The only problem I have found so far was with software that depends on Perl, since some Perl packages are missing from the ARM repository but exist in the x86 one.
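
One way to cross-check what "lshw" reports from inside the guest is to read the kernel's cpufreq sysfs entry directly. This is a minimal sketch, assuming a Linux guest. The file may simply not be exposed inside a VM, in which case the function returns None, which would itself hint that any reported clock is not coming from the real hardware.

```python
import platform
from pathlib import Path
from typing import Optional

# Standard Linux cpufreq sysfs location; it may be missing inside a VM.
CPUFREQ_FILE = Path("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq")


def max_freq_ghz(path: Path = CPUFREQ_FILE) -> Optional[float]:
    """Return the advertised maximum frequency in GHz, or None if not exposed."""
    try:
        khz = int(path.read_text().strip())  # the kernel reports this value in kHz
    except (OSError, ValueError):
        return None
    return khz / 1_000_000


print("machine:", platform.machine())  # 'aarch64' on 64-bit Arm Linux
print("max GHz:", max_freq_ghz())
```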

97PAGBk.png


Wy6Gt7V.png


NRAdp2H.png


I ran Geekbench 5.4, which has a preview build for Linux/ARM and is quick to run:
KpmPOwS.png


Score:
EchE24s.png


https://browser.geekbench.com/v5/cpu/8271514

Comparison with a Xeon E3-1220 v6 (4 cores, no HT):
A0EYGf9.png


https://browser.geekbench.com/v5/cpu/compare/6177788?baseline=8271514

Comparison with a Ryzen 3 4300U (4 cores, no SMT):
xqg2w1X.png


https://browser.geekbench.com/v5/cpu/compare/7527713?baseline=8271514
 
Where to Encode: A Performance Analysis of x86 and Arm-based Amazon EC2 Instances
Public clouds, such as Amazon EC2, provide a large portfolio of services and instances optimized for specific purposes and budgets. The majority of Amazon’s instances use x86 processors, such as Intel Xeon or AMD EPYC. However, following the recent trends in computer architecture, Amazon introduced Arm-based instances that promise up to 40% better cost performance ratio than comparable x86 instances for specific workloads. We evaluate in this paper the video encoding performance of x86 and Arm instances of four instance families using the latest FFmpeg version and two video codecs. We examine the impact of the encoding parameters, such as different presets and bitrates, on the time and cost for encoding. Our experiments reveal that Arm instances show high time and cost saving potential of up to 33.63% for specific bitrates and presets, especially for the x264 codec. However, the x86 instances are more general and achieve low encoding times, regardless of the codec.
https://arxiv.org/pdf/2106.06242.pdf
 

NVIDIA, Partners Extending Arm Ecosystem from Exascale to the Edge


NVIDIA Hands Arm New Tools

These are some of many Arm-based HPC initiatives that can take advantage of the NVIDIA HPC software development kit, a comprehensive suite of compilers, libraries and tools that simplify application development and porting to the Arm architecture. The SDK acts as a foundation for an accelerated Arm HPC ecosystem.
In addition, NVIDIA plans to support Scalable Vector Extensions in Arm’s Neoverse platform. SVE first debuted in Fujitsu’s A64FX that powers Fugaku,

Accelerated Arm Kit Coming in July

We’re also making it easier to create, evaluate and benchmark HPC and AI applications on accelerated Arm systems with the NVIDIA Arm HPC Developer Kit. It’s a platform available from NVIDIA and GIGABYTE in the form of software loaded on a server powered by an Ampere Altra Arm-based CPU, NVIDIA A100 Tensor Core GPUs and NVIDIA BlueField-2 DPUs for accelerated networking.
https://www.hpcwire.com/off-the-wir...ding-arm-ecosystem-from-exascale-to-the-edge/
 
The "ARM China" story keeps getting better. It is on par with the best soap operas. Now they have given a presentation as an independent company.

For context:

The Semiconductor Heist Of The Century | Arm China Has Gone Completely Rogue, Operating As An Independent Company With Inhouse IP/R&D


As part of the emphasis on the Chinese market, SoftBank succumbed to pressure and formed a joint venture. In the new joint venture, Arm Holdings, the SoftBank subsidiary sold a 51% stake of the company to a consortium of Chinese investors for paltry $775M. This venture has the exclusive right to license Arm’s IP within China. Within 2 years, the venture went rogue. Recently, they gave a presentation to the industry about rebranding, developing their own IP, and striking their own independently operated path.

This firm is called “安谋科技”, and is not part of Arm Holdings.

This is the tech heist of the century.
Removing Allen Wu has proven to be very difficult. Despite a 7-1 vote by the Arm China board, the company seal was still held by Allen Wu. In China, the seal is a stamp which authorizes the person in possession to bind a company and its representatives with rights and obligations. Retrieving this seal and the business license would be a multiyear drawn-out legal process. Furthermore, it would mean at least some investors besides Arm must be along for the ride. The Chinese court system would need to agree with ousting an executive in favor of one that was hand selected by western influencers.
Despite formally being fired, Allen Wu has remained in power. He ousted executives that were loyal to Arm. He has even hired security paid for by Arm China that reports to him. This security has kept Arm out of the Arm China offices. Allen Wu has aggressively taken over the firm and is operating it how he sees fit.
One interesting tidbit is that Allen Wu sued Arm China in order to declare his dismissal illegal. He essentially sued himself as he represented both sides in that specific court case.
wow :D

ARM has stopped the flow of information, but it has done no good:
Arm has retaliated by halting the transfer of any new IP. The latest CPU IP Arm China has is the Cortex A77. Major critical technologies such as the Neoverse server CPUs that make the backbone of Amazon Graviton and Ampere Computing have not been sent over the wall. In addition to these server CPU designs, many new developments in CPU, GPU, NPU, and fabrics have remained out of reach. The most important of these besides the server line itself, is the Armv9 instruction set. This is the new instruction set architecture that will power the next decade of high-performance Arm designs. Simultaneously, Arm has tried to appeal to the government stating that this is bad for the Chinese semiconductor industry.

fnTHpJ0.jpg


sayKgL3.png


This leads us to the present day, where Arm China held an event at which they formally declared their independence. They proclaimed that 安谋科技 is China’s largest CPU IP supplier. It was born from Arm, but is an independently operate, Chinese owned company.

The event comprised of cheering on 安谋科技 business. Some of the fanfare was emphasizing that 安谋科技 had a cumulative 20B shipments since formation. It has over 90 partners, 29 of which have achieved mass production of chips using the Arm IP. These shipments range from mobile, network infrastructure, 5G and IoT. They were developed by the company’s 400+ person R&D team that is based entirely in China.

Besides standing out and calling themselves an independent entity, they also announced new IP which was independently developed. It is called the XPU line. The IP blocks include NPUs, SPUs, ISPs, and VPUs, but they made it clear they will extend beyond this.

The NPU, neural processing unit, is especially interesting because Arm itself has also developed a range of AI geared IP. Some of that IP has not yet reached the doors of Arm China. 安谋科技 is forging ahead to have an inhouse source of IP and no longer rely on Arm. This is just the beginning, and who knows where 安谋科技 goes from here. Perhaps they even begin working on their own CPU core, GPU, and server designs.

Most of this IP is targeted at mobile or IoT type use cases. The SPU, security processing unit, is specifically geared to creating secure enclaves and being a management engine. The ISP, image signal processor, is meant to take inputs from cameras and process them into a digital form. It applies various techniques and operations to enhance the raw images. The ISP is geared to work with the NPU to analyze images and videos in order to identify people, objects, and events. These IP blocks are critical for emerging applications which will deploy billions of cameras in China over the next decade. Lastly, there is the video processing unit which is meant to encode and decode videos in common formats such as H264, VP9, and soon AV1.

Arm China, 安谋科技, is asserting their independence. It is the most publicized instance of a joint venture in China going rogue, but also the most dangerous one. Over the decades IP has been taken and replicated in China, but this may be the most brazen attempt yet.

Arm has been shaken to its core with the 2nd largest market snatched from underneath it. While they are the largest individual owner in this firm, they have no control or power over it. 安谋科技 has set out on its own path and begun to develop its own IP. The base of Arm’s old IP is not the end of their line. There are many questions swirling about what this means for a potential Nvidia takeover or IPO, but it is clear that SoftBank’s short sighted profit driven behavior has caused a massive conundrum.

https://semianalysis.substack.com/p/the-semiconductor-heist-of-the-century
 
I don't know why, but I was fully expecting this; it was only a matter of time.

The real question now is the impact of this on ARM Holdings, and which Chinese companies will be next on the "import blacklist". Although most of them won't even care about that.
 
This is on the level of a good Mexican soap opera.
There are a few more details in the story. For example, the CEO of "ARM China", that "Allen Wu" who was dismissed by the board, holds....... American citizenship. :D

Two important points in this story:
  1. This venture has the exclusive right to license Arm’s IP within China.
    If I understand correctly, a Chinese company that wants to build something with ARM's IP has to license it from this "ARM China".
    Arm has retaliated by halting the transfer of any new IP. The latest CPU IP Arm China has is the Cortex A77.
    However, if they want to build, for example, a CPU using the A510, how do they do it? They can pay "ARM China", but it doesn't have the information for that IP. :D

  2. NVIDIA is also left in an "interesting" position. NVIDIA has every interest in the Chinese authorities approving the sale of ARM, but at the same time it will not want the Chinese market to end up outside its control.

Finally, "ARM China" even has its own site already, listing the IP it is creating:
EHKb5vH.png


https://translate.google.com/translate?sl=zh-CN&tl=en&u=https://www.armchina.com/ZhouYi.html

Here we have a fine "negócio da China" (the Portuguese idiom for a very lucrative deal). :D
 
ES (engineering sample) versions of the Ampere Altra Max with 128 cores @ 3 GHz are already out there. :)
gKE0qZW.jpg


Next to the Epyc and Xeon:
IYSirFk.jpg


qDjGvBF.jpg


The specs:
  • 128 Armv8.2+ 64-bit CPU cores up to 3.0GHz maximum
  • 64KB L1 I-cache, 64KB L1 D-cache per core
  • 1MB L2 cache per core
  • 16MB System Level Cache (SLC)
  • 2x full-width (128b) SIMD
  • Coherent mesh-based interconnect – Distributed snoop filtering
  • 8x 72-bit DDR4-3200 channels
  • ECC, Symbol-based ECC, and DDR4 RAS features
  • Up to 16 DIMMs and 4 TB/socket
  • Full interrupt virtualization (GICv3)
  • Full I/O virtualization (SMMUv3)
  • Enterprise server-class RAS
  • Up to 128 lanes of PCIe Gen4 per CPU
    • 4 x16 PCIe + 4 x16 PCIe/CCIX with Extended Speed Mode (ESM) support for data transfers at 20/25 GT/s
    • 32 controllers to support up to 32 x4 links
    • 128 PCIe lanes in 1P configuration
    • 192 PCIe lanes in 2P configuration
  • Coherent multi-socket support
  • 4 x16 CCIX lanes
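
To put the memory line of that spec list in perspective, here is a back-of-the-envelope sketch of the theoretical peak bandwidth. It assumes the usual 64 data bits per 72-bit channel (the remaining 8 bits carry ECC) and decimal gigabytes. Real sustained bandwidth will be lower.

```python
def ddr_peak_gb_s(channels: int, mega_transfers: int, data_bits: int = 64) -> float:
    """Theoretical peak in GB/s (decimal); ECC bits carry no payload."""
    bytes_per_transfer = data_bits // 8
    return channels * mega_transfers * bytes_per_transfer / 1000


# 8 channels of DDR4-3200, as listed in the specs above.
peak = ddr_peak_gb_s(channels=8, mega_transfers=3200)
per_core = peak / 128  # shared across all 128 cores

print(f"{peak:.1f} GB/s per socket, {per_core:.2f} GB/s per core")
```

That works out to 204.8 GB/s per socket, or 1.6 GB/s per core if all 128 cores pull from memory at once.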
SKUs:

• AC-212825002 (128 cores, 250 W)
• AC-212823002 (128 cores, 230 W)
• AC-212819002 (128 cores, 190 W)
• AC-211224002 (112 cores, 240 W)
• AC-211221002 (112 cores, 210 W)
• AC-211218002 (112 cores, 180 W)
• AC-209623502 (96 cores, 235 W)
• AC-209622002 (96 cores, 220 W)
• AC-209619002 (96 cores, 190 W)
• AC-209617002 (96 cores, 170 W)

https://www.servethehome.com/ampere-altra-max-m128-30-128-core-arm-cpu-in-the-wild/

They look good. :D

EDIT: 128 cores at a 250 W TDP works out to less than 2 W per core, at 3 GHz......
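
The same division for every SKU in the list above, as a small sketch. Keep in mind that TDP covers the whole package (mesh, memory controllers, PCIe), so this is an upper bound on what each core actually gets.

```python
# (cores, TDP in watts), copied from the SKU list quoted above.
SKUS = {
    "AC-212825002": (128, 250),
    "AC-212823002": (128, 230),
    "AC-212819002": (128, 190),
    "AC-211224002": (112, 240),
    "AC-211221002": (112, 210),
    "AC-211218002": (112, 180),
    "AC-209623502": (96, 235),
    "AC-209622002": (96, 220),
    "AC-209619002": (96, 190),
    "AC-209617002": (96, 170),
}


def watts_per_core(cores: int, tdp_w: int) -> float:
    """Package TDP divided evenly across the cores."""
    return tdp_w / cores


for sku, (cores, tdp) in SKUS.items():
    print(f"{sku}: {cores} cores, {tdp} W -> {watts_per_core(cores, tdp):.2f} W/core")
```

The 128-core, 250 W part comes out at about 1.95 W per core.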
 
The 80-core version is on TSMC N7. I don't think they would change that for a "variant".
The country shown on a processor is normally where final assembly is done. In this case, it must be done in South Korea.
 
Right, I was trying to remember any South Korean OSAT; I don't think there is one, at least not among the top players.
They are either Taiwanese or Chinese, and as a rule their bases of operations tend to be there as well.
 
ADLINK has launched an Ampere Altra in COM format.

ADLINK Launches COM-HPC Server Modules with 80-Core Ampere Altra Arm-based SoCs for Embedded Applications

COM-HPC Ampere Altra Key Features:
  • Arm Neoverse N1-based architecture
  • Scalable, from 32 to 80 Arm v8.2 64-bit cores (60 to 175 watts)
  • 768GB DDR4 with 6 individual memory channels for demanding workloads
  • 64x PCIe Gen4 lanes
  • edk2 bootloader with TianoCore / UEFI
  • Arm SystemReady SR: ready to install stock aarch64 Ubuntu 20.04, Yocto Linux
  • Gigabit Ethernet support: 4x 10GbE and 1x GbE
  • SOAFEE-compliant
https://www.hpcwire.com/off-the-wir...tra-arm-based-socs-for-embedded-applications/

COM-HPC-ALT_CPU-F.jpg

https://www.adlinktech.com/Products/Computer_on_Modules/COM-HPC/COM-HPC_Ampere_Altra?lang=en
 
Netflix has a presentation in which they try to serve 400 Gbit/s of video streaming traffic from a single server, using FreeBSD, several processors, and 2 Mellanox network cards (Mellanox now belongs to NVIDIA).
The first curiosity is that they use 18 WD SN720 NVMe SSDs, which are "only" PCIe Gen3 rather than Gen4 and are by no means top of the range. The SSDs are not a bottleneck.
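
A rough sketch of why Gen3 drives are enough. It assumes the worst case where every byte served comes off the SSDs, spread evenly, and that each drive sits on a Gen3 x4 link (an assumption; the post does not state the link width).

```python
def per_drive_gb_s(target_gbit_s: float, drives: int) -> float:
    """GB/s each drive must deliver if the load is spread evenly."""
    return target_gbit_s / 8 / drives


def pcie_gen3_link_gb_s(lanes: int = 4) -> float:
    """Raw link ceiling: PCIe 3.0 runs 8 GT/s per lane with 128b/130b encoding."""
    return 8 * (128 / 130) / 8 * lanes


need = per_drive_gb_s(400, 18)
link = pcie_gen3_link_gb_s()

print(f"needed per SSD: {need:.2f} GB/s, Gen3 x4 ceiling: {link:.2f} GB/s")
```

Each drive needs to sustain roughly 2.78 GB/s, against a link ceiling of roughly 3.94 GB/s, so the per-drive load fits within Gen3 x4 even before any caching in RAM is considered.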

One of the processors is the Ampere Altra Q80-30: 80 cores @ 3 GHz. Judging by the presentation, in terms of software and debugging it is still a very immature platform:
V9sOP1d.png


Tdpz44S.png


The other processors are the AMD Epyc 7502P (Rome), with 32 cores @ 2.5 GHz, and the Intel Xeon 8352V, with 36 cores @ 2.1 GHz.

Benchmarks when the data is encrypted by the CPUs:
22usA1n.png


Benchmarks when the data is encrypted by the network cards:
onNhDbV.png


In the latter there are no Xeon results, because a BIOS option is not yet available on that platform.

The presentation is quite interesting because, since the Epyc is not a monolithic processor, they have to apply all kinds of NUMA tweaks to reach better performance, something that is not needed on the Ampere platform:
EdZELTo.png


r84oP70.png
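
To see how many NUMA domains a given box exposes, the kernel's sysfs tree can be read directly. This is a minimal sketch, assuming Linux and its standard `/sys/devices/system/node` layout; it is not how the Netflix team does it (they run FreeBSD). A part that presents itself as a single domain shows one node, while a chiplet-based part configured with several domains per socket shows more.

```python
from pathlib import Path
from typing import Dict, List

NODE_ROOT = Path("/sys/devices/system/node")  # standard Linux sysfs location


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel CPU list such as '0-3,8-11' into individual CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: Path = NODE_ROOT) -> Dict[int, List[int]]:
    """Map NUMA node id -> CPU ids; empty dict if no node info is exposed."""
    topo: Dict[int, List[int]] = {}
    for node_dir in sorted(root.glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            topo[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return topo


for node, cpus in numa_topology().items():
    print(f"node{node}: {len(cpus)} CPUs")
```

With more than one node, work such as network interrupt handling and disk reads generally benefits from being kept on the node that owns the memory and the PCIe device involved, which is the kind of tuning the slides describe.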


They reportedly also have a prototype already, to begin testing whether they can reach 800 Gbit/s from a single server:
f9ZkxwV.png


Something with PCIe Gen5 already?
Link to the presentation PDF -> https://people.freebsd.org/~gallatin/talks/euro2021.pdf

It is quite interesting because they try to push the limits of today's processors, including the top of the range of what exists today on ARM. :)
 
Ampere Altra Max M128-30 Linux Performance Preview
The past month we have started our testing of Ampere's Altra Max M128-30, the company's new 128 core server processor, and in this article today are our initial benchmarks of this promising chip for high core count servers in both 1P and 2P configurations tested.
Ampere Altra Max processors continue to use the Neoverse N1 cores, support eight channels of DDR4-3200 ECC memory, 128 lanes of PCIe Gen4, and other features in common with Altra. Like the Q80-33, the M128-30 does maintain the same 250 watt TDP rating.
Across a wide range of tests, the Ampere Altra Max M128-30 2P is proving to be very competitive with the AMD EPYC 7763 2P in workloads that scale well and make full use of ARMv8 capabilities. Given the Ampere Altra Q80-33 2P was already outperforming the Xeon Platinum 8380 2P Ice Lake server, it's to no surprise Ampere's lead there has widened even more substantially with now being able to offer 128 cores per socket.
https://www.phoronix.com/scan.php?page=article&item=ampere-altramax-benchmarks&num=1
 