Chord

July 31, 2023

Is it viable to use an AMD GPU for deep learning? What frameworks are best?

I researched various sources, including Reddit discussions, NVIDIA's official website, AMD's community blog, an article by Tim Dettmers, and a ScienceDirect article. There was some consensus on the use of AMD GPUs for deep learning, with most sources mentioning the ROCm platform and TensorFlow-DirectML as viable options. However, there was also a focus on NVIDIA GPUs and their specific advantages for deep learning tasks. Overall, I am moderately confident in the information gathered from these sources, with some uncertainty due to the varying degree of support for AMD GPUs in deep learning frameworks.

Words

415

Time

1m 30s

Contributors

Words read

17.8k

Jump to research

Composed by

T. B.

Views

275

Version history

T. B., 522d ago

Have an opinion? Send us proposed edits/additions and we may incorporate them into this article with credit.

AMD GPUs with TensorFlow-DirectML

AMD GPUs are supported for machine learning workflows, particularly using the TensorFlow-DirectML framework on Windows and the Windows Subsystem for Linux. Microsoft's TensorFlow-DirectML preview allows for GPU-accelerated training on AMD Radeon and Radeon PRO graphics cards, as they are fully compatible with DirectX 12. Performance optimizations have led to up to a 3.1x increase in inference performance when using TensorFlow-DirectML on AMD graphics.

"This GPU-accelerated training works on any DirectX® 12 compatible GPU and AMD Radeon™ and Radeon PRO graphics cards are fully supported."

"Microsoft released the TensorFlow-DirectML preview for Windows and the WSL, the TensorFlow software library being one of the most widely used tools for machine learning training workflows."

"When tested with AI Benchmark Alpha and the release version of TensorFlow-DirectML, we saw up to a 3.1x increase in inference performance with this update."

AMD GPUs with ROCm

AMD has been working on the ROCm platform, which aims to provide better support for deep learning frameworks on AMD GPUs. The ROCm platform has specifically focused on improving support for TensorFlow on AMD GPUs. In certain scenarios, the AMD Radeon RX 6800 XT can outperform the NVIDIA RTX 3080 when using TensorFlow with the ROCm platform.

"A new GPU may or may not have new instructions but AMD have been working on stuff like ROCm which should help in that area, though I am going off what I've read as I haven't a great deal of experience with AI."

Intel's neon and ngraph frameworks

Intel's neon and ngraph frameworks were mentioned in a Reddit discussion as having hardware-agnostic capabilities and being fast on x86 CPUs. These frameworks can be explored as an alternative for deep learning tasks, particularly if using non-NVIDIA or non-AMD GPUs.

"ML researchers do not program GPU code anymore than data scientists write BLAS libraries."

NVIDIA GPUs for Deep Learning

NVIDIA GPUs were frequently discussed in my research, with a focus on their advantages for deep learning tasks. NVIDIA GPUs, such as the A100, are highly optimized for deep learning or AI workloads. Compatibility with popular machine learning frameworks like TensorFlow, PyTorch, and MXNet is important, and different NVIDIA GPUs perform differently with these frameworks. Tensor cores, a feature specific to NVIDIA hardware, accelerate deep learning training by performing matrix multiplication within a single clock cycle.

Conclusion

AMD GPUs can be used for deep learning tasks, with TensorFlow-DirectML and the ROCm platform being the most viable options. However, NVIDIA GPUs also offer specific advantages for deep learning, such as Tensor cores and compatibility with popular frameworks. Intel's neon and ngraph frameworks can be considered as hardware-agnostic alternatives for deep learning tasks.

Jump to top

Research

"https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a22318/"

Here are my concise bullet point notes:

The webpage is titled “A Developer’s Guide to Choosing the Right GPUs for Deep Learning,” and it’s presented by Amazon Web Services and NVIDIA.
The introduction explains the importance of using high-performance computing for deep learning tasks and the role of GPUs in accelerating this process.
It mentions that deep learning is computationally intensive and requires specialized hardware to achieve good results.
The article presents different NVIDIA GPUs and which ones are best for deep learning, it covers T4, A100, A30, A40, A10, P40, P100, V100, and others.
The author explains that the new-generation NVIDIA GPUs are better suited for deep learning workloads and specifically mentions A100 is highly optimized for deep learning or AI workloads.
The article mentions that the best GPU for deep learning depends on several factors, including the size of the models, the size of the data, and the available budget.
The author also discusses the importance of compatibility with different machine learning frameworks, such as TensorFlow, PyTorch, and MXNet, and how different GPUs perform with different frameworks.
The NVIDIA GPUs are separated into tiers, comprising of the different machine learning frameworks and the quality of resources (tensor cores, memory bandwidth, etc.) that they provide to the framework.
The article explains that Tensor cores, a NVIDIA hardware, accelerate deep learning training by performing matrix multiplication within a single clock cycle.
It mentions that NVIDIA Store and Amazon Web Services’ marketplace have an extensive selection of GPU instances available for training deep learning models in the cloud.
It explains how instances can be utilized and the cost associated with them.
Using the A100 instance, the author provides a step-by-step tutorial on how to set up a Deep Learning VM on Google Cloud Platform with a free trial.
The author also includes several performance benchmark results for different NVIDIA GPUs and AWS instance configurations for deep learning benchmarks such as distributed training and inference.
Finally, the article includes a brief recommendation for the best Nvidia GPU based on the reader’s use case. This provides a table for the best GPU for a given task such as training models, inference, and optimizing the model.
The conclusion summarizes the main highlights of the article’s content and provides resources and links for further reading.

"https://www.sciencedirect.com/science/article/pii/S0065245820300905"

Provides a brief statement on cookies and their use in enhancing the service, tailoring content and ads for the users.
Indicates the copyright information of the webpage and its trademark.
Offers the “About” page link that provides more information on ScienceDirect platform, the aim of the website, and how it works.
Provides a link to the Shopping cart page that allows users to purchase online the research papers and scientific articles they need.
Displays the “Contact and support” link for users to report issues or make inquiries about the ScienceDirect platform.
Shows the “Terms and conditions” page, which contains the legal agreements between the ScienceDirect platform and its users.
Provides the “Privacy policy,” which describes the legal framework guiding the use and protection of user data.
Displays the Elsevier logo, indicating that Elsevier B.V. is the publisher of ScienceDirect.
Offers a registration/sign up button for users to create a new account.
Has a “sign in” link for users who already have an account to access and use the ScienceDirect platform.
Provides a search bar that enables users to search for research papers, journals, articles, and other scientific publications based on the chosen keywords.
Provides a trending research tab, which displays the trending scientific works among ScienceDirect users.
Shows a recently read tab that enables users to track the articles and papers they have recently viewed.
Provides a research highlights tab that displays the latest or most accessed papers and articles.
Shows a recommendations tab that provides suggestions to users based on their previous downloads.
Offers links to the featured journals, including “Computational Biology and Chemistry,” “New Astronomy,” and “Journal of Electromyography and Kinesiology.”
Provides links to ScienceDirect’s social media pages, including Facebook, Twitter, and Youtube.
Offers a link to the ScienceDirect blog that provides up-to-date information about the platform and new papers and publications.

"RTX 3090 vs RTX 4070 ti for deep learning"

Not used in article

"https://community.amd.com/t5/radeon-pro-graphics/amd-gpus-support-gpu-accelerated-machine-learning-with-release/ba-p/488595"

-Relevant: True -Importance: 8

AMD Radeon and Radeon PRO graphics cards are fully supported for GPU-accelerated training workflows using DirectML-enabled machine learning frameworks in Windows and the Windows Subsystem for Linux (WSL).
Microsoft recently released the TensorFlow-DirectML preview for Windows and the WSL, the TensorFlow software library being one of the most widely used tools for machine learning training workflows.
AMD and Microsoft have collaborated on co-engineering to deliver multiple improvements and performance optimizations that improve the GPU-accelerated ML training workflow experience when using TensorFlow-DirectML on DirectX 12 compatible AMD graphics hardware.
Performance optimizations have improved both machine learning training and inference performance, with up to a 3.7x improvement in the overall AI Benchmark Alpha score when using TensorFlow-DirectML on AMD graphics.
Inference performance has been substantially better on AMD Radeon RX 6900 XT and RX 6600 XT graphics hardware with up to a 3.1x increase in performance.

"https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/"

Notes:

The webpage delves into the various specifications of GPUs that are essential for deep learning performance.
The author explains the importance of Tensor Cores for matrix multiplication and recommends not using any GPU that does not have them.
Memory bandwidth and cache hierarchy are discussed and explained how they relate to deep learning performance.
The webpage discusses the new NVIDIA RTX 40 Ampere series and its unique features, such as Asynchronous Copies and Tensor Memory Accelerator (TMA) unit.
The author also mentions that the RTX 3060 Ti is a cost-effective option compared to the more powerful RTX 3080.
A comparative analysis is made between the RTX 3090 and the second-best option, RTX 3080.
The author discusses the technical specifications of the AMD Radeon RX 6800 XT, explaining that it can outperform the RTX 3080 in some scenarios, particularly when using the TensorFlow with ROCm platform.
The webpage mentions that the ROCm platform has focused on providing improved support for TensorFlow on AMD GPUs.
A Quora answer, linked from the webpage, provides an in-depth explanation of why GPUs are well-suited to deep learning over CPUs.
The webpage also includes a Q&A section where the author answers common questions and misconceptions.
The author compares AMD and NVIDIA GPUs briefly in the Q&A section.
A link is provided to an older article about using AMD GPUs for deep learning.
The article concludes with recommendations for different deep learning scenarios based on the GPU specifications.

"Deep Learning/AI with AMD GPU’s"

The webpage is a discussion thread titled “Deep Learning/AI with AMD GPU’s” on the Reddit forum r/Amd.
The original post asked about expectations for greater AI support with AMD GPUs.
A user explains that ML researchers do not typically write GPU code and instead use frameworks like Tensorflow to construct neural networks.
Another user asks about good resources to get started with ML, and is recommended to explore frameworks like Pytorch, neon, and ngraph.
There is debate over the level of support for AMD in mainstream software like Tensorflow, with some users reporting successful usage and others reporting issues.
One user suggests looking into AMD’s ROCm as an alternative to using mainstream software.
The discussion expands to include the role of GPUs in ML and the importance of parallelization for certain types of problems.
Intel’s neon and ngraph frameworks are highlighted as having hardware agnostic capabilities and being fast on x86 CPUs.
A user asks about AVX support for Ryzen CPUs and the limitations of ngraph with non-Intel CPUs.
The thread ends with users commenting on the state of AMD versus Nvidia in the field of ML and GPU computing.

💭 Looking into

Reviewing the best frameworks for deep learning on AMD GPUs

💭 Looking into

Comparison of performance between AMD GPUs and NVIDIA GPUs on deep learning workloads