AMD RDNA 3 Graphics Cards (Radeon 7000 series)

Dark Kaeser · 29 April 2022

LOGBOOK THREAD -> https://forum.zwame.pt/threads/amd-rdna-3-radeon-rx-7000-diario-de-bordo.1077211/

AMD RDNA 3 GPU Architecture Deep Dive: The Ryzen Moment for GPUs

RDNA 3 and GPU Chiplets

Navi 31 consists of two core pieces, the Graphics Compute Die (GCD) and the Memory Cache Dies (MCDs).

The GCD houses all the Compute Units (CUs) along with other core functionality like video codec hardware, display interfaces, and the PCIe connection. The Navi 31 GCD has up to 96 CUs, which is where the typical graphics processing occurs. But it also has an Infinity Fabric along the top and bottom edges (linked via some sort of bus to the rest of the chip) that then connects to the MCDs.

As the name implies, the Memory Cache Dies (MCDs) primarily contain the large L3 cache blocks (Infinity Cache), plus the physical GDDR6 memory interface. They also need to contain Infinity Fabric links to connect to the GCD, which you can see in the die shot along the center-facing edge of the MCDs.

The result is what AMD calls the high performance fanout interconnect. The image above doesn't quite explain things clearly, but the larger interface on the left is the organic substrate interconnect used on Zen CPUs. To the right is the high performance fanout bridge used on Navi 31, "approximately to scale."

You can clearly see the 25 wires used for the CPUs, while the 50 wires used on the GPU equivalent are packed into a much smaller area, so you can't even see the individual wires. It's about 1/8 the height and width for the same purpose, meaning about 1/64 the total area. That, in turn, dramatically cuts power requirements, and AMD says all of the Infinity Fanout links combined deliver 3.5 TB/s of effective bandwidth while only accounting for less than 5% of the total GPU power consumption.

RDNA 3 Architecture Upgrades

That takes care of the chiplet aspect of the design, so now let's go into the architecture changes to the various parts of the GPU. These can be broadly divided into four areas: general changes to the chip design, enhancements to the GPU shaders (Stream Processors), updates to improve ray tracing performance, and improvements to the matrix operation hardware.

Another point AMD makes is that it has improved silicon utilization by approximately 20%. In other words, there were functional units on RDNA 2 GPUs where parts of the chip were frequently sitting idle even when the card was under full load. Unfortunately, we don't have a good way to measure this directly, so we'll take AMD's word on this, but ultimately this should result in higher performance.

Compute Unit Enhancements

Outside of the chiplet stuff, many of the biggest changes occur within the Compute Units (CUs) and Workgroup Processors (WGPs). These include updates to the L0/L1/L2 cache sizes, more SIMD32 registers for FP32 and matrix workloads, and wider and faster interfaces between some elements.

AMD's Mike Mantor presented the above and the following slides, which are dense! He basically talked non-stop for the better part of an hour, trying to cover everything that's been done with the RDNA 3 architecture, and that wasn't nearly enough time. The above slide covers the big-picture overview, but let's step through some of the details.

RDNA 3 comes with an enhanced Compute Unit pair — the dual CUs that became the main building block for RDNA chips. A cursory look at the above might not look that different from RDNA 2, but then notice that the first block for the scheduler and Vector GPRs (general purpose registers) says "Float / INT / Matrix SIMD32" followed by a second block that says "Float / Matrix SIMD32." That second block is new for RDNA 3, and it basically means double the floating point throughput.

You can choose to look at things in one of two ways: Either each CU now has 128 Stream Processors (SPs, or GPU shaders), and you get 12,288 total shader ALUs (Arithmetic Logic Units), or you can view it as 64 "full" SPs that just happen to have double the FP32 throughput compared to the previous generation RDNA 2 CUs.

This is sort of funny because some places are saying that Navi 31 has 6,144 shaders, and others are saying 12,288 shaders, so I specifically asked AMD's Mike Mantor — the Chief GPU Architect and the main guy behind the RDNA 3 design — whether it was 6,144 or 12,288. He pulled out a calculator, punched in some numbers, and said, "Yeah, it should be 12,288."
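Both shader counts fall out of the same multiplication, depending on whether the second SIMD32 block is counted. A minimal sketch using only the figures quoted above (the function and variable names are illustrative, not AMD terminology):

```python
# Figure from the article: the Navi 31 GCD has up to 96 Compute Units.
NAVI31_CUS = 96

def shader_count(compute_units: int, alus_per_cu: int) -> int:
    """Total shader ALUs for a given per-CU count."""
    return compute_units * alus_per_cu

full_sps = shader_count(NAVI31_CUS, 64)    # 64 "full" SPs per CU
fp32_alus = shader_count(NAVI31_CUS, 128)  # counting both SIMD32 blocks

print(full_sps, fp32_alus)  # 6144 12288
```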

Along with the extra 32-bit floating-point compute, AMD also doubled the matrix (AI) throughput as the AI Matrix Accelerators appear to at least partially share some of the execution resources. New to the AI units is BF16 (brain-float 16-bit) support, as well as INT4 WMMA Dot4 instructions (Wave Matrix Multiply Accumulate), and as with the FP32 throughput, there's an overall 2.7x increase in matrix operation speed.

That 2.7x appears to come from the overall 17.4% increase in clock-for-clock performance, plus 20% more CUs and double the SIMD32 units per CU. (But don't quote me on that, as AMD didn't specifically break down all of the gains.)
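As a rough check of that breakdown, the three factors can simply be multiplied together. This is only a sketch that assumes the gains stack multiplicatively, which AMD did not confirm:

```python
# Factors quoted in the paragraph above; treating them as independent
# multipliers is an assumption, not something AMD stated.
clock_for_clock_gain = 1.174  # 17.4% better clock-for-clock performance
cu_count_gain = 1.20          # 20% more CUs
simd_gain = 2.0               # double the SIMD32 units per CU

combined = clock_for_clock_gain * cu_count_gain * simd_gain
print(f"{combined:.2f}x")  # 2.82x
```

The product lands a little above the quoted 2.7x, which fits the caveat that AMD did not break down all of the gains.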

Bigger and Faster Caches and Interconnects

The caches, and the interfaces between the caches and the rest of the system, have all received upgrades. For example, the L0 cache is now 32KB (double RDNA 2), and the L1 caches are 256KB (double RDNA 2 again), while the L2 cache increased to 6MB (1.5x larger than RDNA 2).
The link between the main processing units and the L1 cache is now 1.5x wider, with 6144 bytes per clock throughput. Likewise, the link between the L1 and L2 cache is also 1.5x wider (3072 bytes per clock).

The L3 cache, also called the Infinity Cache, did shrink relative to Navi 21. It's now 96MB vs. 128MB. However, the L3 to L2 link is now 2.25x wider (2304 bytes per clock), so the total throughput is much higher. In fact, AMD gives a figure of 5.3 TB/s — 2304 B/clk at a speed of 2.3 GHz. The RX 6950 XT only had a 1024 B/clk link to its Infinity Cache (maximum), and RDNA 3 delivers up to 2.7x the peak interface bandwidth.
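The 5.3 TB/s figure is just link width multiplied by clock speed. A small sketch of that arithmetic, using the numbers from the paragraph above (the helper name is made up for illustration):

```python
def link_bandwidth_tbs(bytes_per_clock: int, clock_ghz: float) -> float:
    """Peak link bandwidth in TB/s (1 TB = 1e12 bytes)."""
    return bytes_per_clock * clock_ghz * 1e9 / 1e12

rdna3_l3_bw = link_bandwidth_tbs(2304, 2.3)  # Navi 31 L3-to-L2 link
width_ratio = 2304 / 1024                    # vs. the RX 6950 XT link width

print(f"{rdna3_l3_bw:.1f} TB/s, {width_ratio:.2f}x wider")  # 5.3 TB/s, 2.25x wider
```

The width alone accounts for 2.25x. The rest of the quoted 2.7x presumably comes from clock speed, but the article does not give the RDNA 2 clock used for that comparison, so it is not computed here.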

Note that these figures are only for the fully configured Navi 31 solution in the 7900 XTX. The 7900 XT has five MCDs, dropping down to a 320-bit GDDR6 interface and 1920 B/clk links to the combined 80MB of Infinity Cache. We will likely see lower-tier RDNA 3 parts that further cut back on interface width and performance, naturally.
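The 7900 XT numbers are consistent with each MCD contributing an equal share of bus width, link width, and cache. The sketch below derives the per-MCD share from the five-MCD figures and scales it back up. The six-MCD count for the full part and the assumption of linear scaling are mine, and the class and function names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryConfig:
    mcds: int
    bus_width_bits: int
    l3_link_bytes_per_clk: int
    infinity_cache_mb: int

# Per-MCD share, derived from the 7900 XT figures quoted above (five MCDs).
BUS_BITS_PER_MCD = 320 // 5      # 64-bit GDDR6 interface per MCD
LINK_BYTES_PER_MCD = 1920 // 5   # 384 B/clk per MCD
CACHE_MB_PER_MCD = 80 // 5       # 16 MB of Infinity Cache per MCD

def config_for(mcds: int) -> MemoryConfig:
    """Scale the per-MCD figures linearly with the number of active MCDs."""
    return MemoryConfig(
        mcds=mcds,
        bus_width_bits=mcds * BUS_BITS_PER_MCD,
        l3_link_bytes_per_clk=mcds * LINK_BYTES_PER_MCD,
        infinity_cache_mb=mcds * CACHE_MB_PER_MCD,
    )

rx_7900_xt = config_for(5)
rx_7900_xtx = config_for(6)  # assumption: six MCDs on the full Navi 31
print(rx_7900_xt)
print(rx_7900_xtx)
```

With six MCDs the sketch reproduces the 2304 B/clk link and 96MB of Infinity Cache given for the full chip, along with a 384-bit memory bus.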

AMD 2nd Generation Ray Tracing

Ray tracing on the RDNA 2 architecture always felt like an afterthought — something tacked on to meet the required feature checklist for DirectX 12 Ultimate. AMD's RDNA 2 GPUs lack dedicated BVH traversal hardware, opting to do some of that work via other shared units, and that's at least partially to blame for their weak performance.

RDNA 2 Ray Accelerators could do up to four ray/box intersections per clock, or one ray/triangle intersection. By way of contrast, Intel's Arc Alchemist can do up to 12 ray/box intersections per RTU per clock, while Nvidia doesn't provide a specific number but has up to two ray/triangle intersections per RT core on Ampere and up to four ray/triangle intersections per clock on Ada Lovelace.

It's not clear if RDNA 3 actually improves those figures directly or if AMD has focused on other enhancements to reduce the number of ray/box intersections performed. Perhaps both. What we do know is that RDNA 3 will have improved BVH (Bounding Volume Hierarchy) traversal that will increase ray tracing performance.

RDNA 3 also has 1.5x larger VGPRs, which means 1.5x as many rays in flight. There are other stack optimizations to reduce the number of instructions needed for BVH traversal, and specialized box sorting algorithms (closest first, largest first, closest midpoint) can be used to extract improved efficiency.
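To make "ray/box intersection" and "closest first" sorting concrete, here is a generic software sketch: the textbook slab test against axis-aligned boxes, followed by ordering the hit boxes by entry distance. It is not a description of AMD's hardware, and all names and the sample boxes are invented for illustration:

```python
import math

def ray_box_entry(origin, direction, box_min, box_max):
    """Slab test: return the entry distance t >= 0 if the ray hits the box, else None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near

def closest_first(origin, direction, boxes):
    """Return the names of the boxes the ray hits, nearest entry point first."""
    hits = []
    for name, (box_min, box_max) in boxes.items():
        t = ray_box_entry(origin, direction, box_min, box_max)
        if t is not None:
            hits.append((t, name))
    return [name for _, name in sorted(hits)]

# Hypothetical child boxes of a BVH node, tested against a ray along +x.
children = {
    "far": ((5.0, -1.0, -1.0), (6.0, 1.0, 1.0)),
    "near": ((2.0, -1.0, -1.0), (3.0, 1.0, 1.0)),
    "miss": ((2.0, 2.0, 2.0), (3.0, 3.0, 3.0)),
}
order = closest_first((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), children)
print(order)  # ['near', 'far']
```

Visiting the nearest box first tends to find the closest hit sooner, so boxes that lie further away can often be skipped.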

Overall, thanks to the new features, higher frequency, and increased number of Ray Accelerators, AMD says RDNA 3 should deliver up to a 1.8x performance uplift for ray tracing compared to RDNA 2. That should narrow the gap between AMD and Nvidia Ampere. Still, Nvidia also seems to have doubled down on its ray tracing hardware for Ada Lovelace, so we wouldn't count on AMD delivering equivalent performance to RTX 40-series GPUs.

Other Architectural Improvements

https://www.tomshardware.com/news/amd-rdna-3-gpu-architecture-deep-dive-the-ryzen-moment-for-gpus

"RDNA3" Instruction Set Architecture
https://developer.amd.com/wp-content/resources/RDNA3_Shader_ISA_December2022.pdf

Dark Kaeser · 29 April 2022

Well, the launch of only 3 discrete graphics cards, which some rumours had already put forward a few weeks ago, now seems to be confirmed.

https://twitter.com/i/web/status/1519198751369310208

This tweet complements what he had already posted a while back.

Meanwhile, the "confirmation" comes via an AMD patch for the LLVM compiler that adds 4 PCI IDs for the gfx11 generation (the internal designation the open source driver team uses for RDNA3; the current cards are gfx10).

[Screenshot: D124537, AMDGPU clang definition of gfx11 subtarget]

https://reviews.llvm.org/D124537

As always, someone took the trouble to go through the lines of code:

https://twitter.com/i/web/status/1519722530826866690

gfx1103 is not a dGPU, it is an APU.

Roberto1973 · 6 May 2022

https://twitter.com/i/web/status/1521052789996400640

Dark Kaeser · 6 May 2022

Right, it contradicts what had so far been assumed to be the final product:
- Navi31 - 60 WGP = 15360 SP
https://www.techpowerup.com/gpu-specs/amd-navi-31.g998
- Navi32 - 40 WGP = 10240 SP
https://www.techpowerup.com/gpu-specs/amd-navi-32.g1000

The information in the TPU links has since been changed, and right now it is contradictory: the specifications list the numbers I gave above, while the product information (at the bottom) already reflects the more recent rumours from the previous post.
Navi33 remains unchanged.

Another rumour, related to the previous point, is that the "theoretical compute capability" is back at ~74 TFLOPS, the figure Greymon55 himself had put forward last year.

And with this corrected "core count", that implies ~3 GHz...
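The ~3 GHz figure can be reproduced from the usual peak FP32 formula (ALUs x 2 FLOPs per clock for a fused multiply-add x clock). This sketch assumes the corrected core count is the 12,288 quoted in the article in the first post, since the post itself does not state the number; the function names are illustrative:

```python
def peak_fp32_tflops(alus: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput, counting an FMA as 2 FLOPs."""
    return alus * flops_per_clock * clock_ghz / 1000.0

def clock_for_target(alus: int, target_tflops: float, flops_per_clock: int = 2) -> float:
    """Clock in GHz needed to reach a target peak FP32 figure."""
    return target_tflops * 1000.0 / (alus * flops_per_clock)

corrected = clock_for_target(12288, 74)  # assumed corrected core count
previous = clock_for_target(15360, 74)   # earlier 60 WGP = 15360 SP rumour

print(f"{corrected:.2f} GHz vs {previous:.2f} GHz")  # 3.01 GHz vs 2.41 GHz
```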

Miguel_Pereira · 6 May 2022

Even at 74 TFLOPS, that is a nice jump. The 6900 XT has 23 TFLOPS. That is triple.

I know you can't compare things that directly, but it is a huge jump. Of course, if Nvidia really does come out with 100 TFLOPS as rumoured, the halo product war is already won by the other colour.

Dark Kaeser · 6 May 2022

Take it for what it's worth, but right now there are rumours to suit every taste, and it would not be the first time AMD has given certain people information that contradicts what it gave to others, so that the mismatch lets it "find" the source.

That is why I opened the thread with only that information, even though I had already seen and "noted" some of the rumours going around.

I am not sure who published the "core count" corrections, since this user also published the same numbers:

https://twitter.com/i/web/status/1521092228575281152

But a few days earlier, when the open source patches came out, he also found this in the drivers:

https://twitter.com/i/web/status/1520256804298629124

And this may be an explanation for the FP32 compute numbers...

machu · 7 May 2022

Miguel_Pereira said:
Even at 74 TFLOPS, that is a nice jump. The 6900 XT has 23 TFLOPS. That is triple.

I know you can't compare things that directly, but it is a huge jump. Of course, if Nvidia really does come out with 100 TFLOPS as rumoured, the halo product war is already won by the other colour.

The 3080 has 30 TFLOPS vs. 21 TFLOPS for the 6800 XT, and the 6800 XT beats the 3080 in performance
https://versus.com/br/amd-radeon-rx-6800-xt-vs-nvidia-geforce-rtx-3080

Some suggest that AMD's 74 TFLOPS could be enough to beat the Nvidia 4090 with its 100 TFLOPS

muddymind · 7 May 2022

The 3080 is on par with the 6800 XT below 4K and clearly ahead at 4K.

In any case, TFLOPS are not at all comparable between architectures (not even between RDNA1, RDNA2 and RDNA3), so this is a purely academic exercise of little interest.

I think that in this next generation, if AMD finally decides that new technologies (RT) matter, then for the first time TDP will actually be a relevant factor in the choice, given the absurd levels being talked about in the rumours. I am already at the limit of what I consider acceptable with my RTX 3090, and I refuse to go above where I already am.

machu · 7 May 2022

muddymind said:
In any case, TFLOPS are not at all comparable between architectures (not even between RDNA1, RDNA2 and RDNA3), so this is a purely academic exercise of little interest.

True, and you reinforce what I said... what I wanted to highlight is that teraflops are not the only factor... the 6800 XT, with 10 fewer teraflops, holds its own against the 3080... that is 33% fewer teraflops than the 3080... 33% is no joke, and it still holds its own.

Suppose the same 33% ratio held in the new generation (and nothing says it could not be more)... then the 7900 would only need 66 teraflops to keep up with Nvidia at 100 teraflops.
With 74 teraflops indicated for the 7900, it could indeed surpass the 4090 in performance.
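The ratio argument can be written out as plain arithmetic. This is only a sketch of the reasoning in the post, under its own assumption that a TFLOPS deficit carries over between generations, which the reply above disputes. The figures are the ones quoted in the thread:

```python
rtx_3080_tflops = 30.0
rx_6800_xt_tflops = 21.0
rumoured_nvidia_tflops = 100.0
rumoured_amd_tflops = 74.0

# Deficit implied by the two quoted figures: 1 - 21/30, i.e. about 30%.
deficit_from_figures = 1 - rx_6800_xt_tflops / rtx_3080_tflops
# Deficit as rounded in the post.
deficit_as_posted = 0.33

needed_from_figures = rumoured_nvidia_tflops * (1 - deficit_from_figures)
needed_as_posted = rumoured_nvidia_tflops * (1 - deficit_as_posted)

print(f"{deficit_from_figures:.0%}: {needed_from_figures:.0f} TFLOPS needed")  # 30%: 70 TFLOPS needed
print(f"{deficit_as_posted:.0%}: {needed_as_posted:.0f} TFLOPS needed")        # 33%: 67 TFLOPS needed
```

Either way the break-even point sits below the rumoured 74 TFLOPS, so the conclusion in the post holds under its own assumption.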

Ansatsu · 16 May 2022

Kopite said today that he is disappointed with RDNA3, but gave no further details.

https://twitter.com/i/web/status/1526135149246976001

Dark Kaeser · 16 May 2022

We are in the Silly Season phase, between the transfers in and out of the big three football clubs and the hardware rumours...

https://twitter.com/i/web/status/1526149291659735040

Ansatsu · 16 May 2022

Dark Kaeser said:
We are in the Silly Season phase, between the transfers in and out of the big three football clubs and the hardware rumours...

The "disappointed" could be related to many things. For example, he may have been expecting AMD to catch up with Nvidia in ray tracing and has already noticed that it won't happen this time, or it could be something else entirely.
It does not necessarily mean it is directly related to RDNA3's rasterization power.

But my impression is that Kopite is one of the few leakers who really is an insider or has contact with insiders. Most of the others seem more the type to pick up a few things here and there, sprinkle some speculation on top and call it done, but there is nothing insider about them.

strafejump · 16 May 2022

I don't think anyone expects to see AMD catch Nvidia in RT. Ever since it started with chiplets, AMD's goal has been to optimize wafer usage and save as many transistors as possible. With RT, it will probably be something like DLSS vs. FSR2: good enough without having to resort to hardware dedicated exclusively to the task.

Ansatsu · 16 May 2022

strafejump said:
With RT, it will probably be something like DLSS vs. FSR2: good enough without having to resort to hardware dedicated exclusively to the task.

With RT there is not much of a way around it: you can use the regular shaders for the RT calculations, but it is much slower.

Even with plenty of power from dedicated hardware, the performance impact is quite high.

Roberto1973 · 18 May 2022

Roberto1973 · 18 de Maio de 2022

https://twitter.com/i/web/status/1523724732553641985

AMD Navi 31 GPU now rumored to feature 384-bit memory bus, 24GB GDDR6 memory

https://videocardz.com/newz/amd-nav...-feature-384-bit-memory-bus-24gb-gddr6-memory

AMD RDNA 3 ‘GFX11’ GPUs Get First Mesa Support, RADV Graphics Tech Adds Task Shaders

AMD adds significant open-source Radeon graphics driver code into Mesa 22.2 for Q3 2022, Enables Initial RDNA 3 'GFX11' GPU Support

igordafonseca · 19 May 2022

At launch, I assume the high-end versions will be the first to come out, right?
Is there any launch roadmap?

Dark Kaeser · 19 May 2022

Just wait a few days...

AMD Advancing the High-Performance Computing Experience

Monday, May 23, 2022 2:00 PM (GMT +8)

Join us for the digital AMD CEO keynote at Computex 2022 as Dr. Lisa Su shares the AMD vision to advance the PC experience through next generation mobile and desktop PC innovations. Combining cutting-edge CPUs, GPUs and software, AMD and its ecosystem partners will show breakthrough performance and leadership experiences for gamers, enthusiasts and creators.

https://www.amd.com/en/events/computex

...and see what comes out of it.

BigBot · 22 May 2022

igordafonseca said:
At launch, I assume the high-end versions will be the first to come out, right?
Is there any launch roadmap?

Navi 33 (7700 XT) first, already in Q4, right after the Lovelace launch.

Roberto1973 · 24 May 2022

I always thought they would talk about RDNA3 at the AMD keynote 2022.

nullus · 24 May 2022

Roberto1973 said:
I always thought they would talk about RDNA3 at the AMD keynote 2022.

Too early, and since Nvidia should launch a few months earlier, they don't want to give them information about what they are going to launch. Besides, they can still change clocks, TDPs and segmentation afterwards.

They talked about what interested them most from the perspective of a silicon manufacturer. Laptops.
