February 25, 2023

Affordable GPUs for NLP and training large language models


Composed by K. B.

NVIDIA DGX SuperPOD

It seems that NVIDIA DGX SuperPOD is a production-ready, enterprise-grade solution for simplifying the development and deployment of large language models [3]. The system automates the complexity of LLM training with data processing libraries that ingest, curate, organize and clean data, and it uses technologies for data, tensor and pipeline parallelism so that the training of large language models can be distributed efficiently across thousands of GPUs [3]. It also enables LLM inference workloads to scale across multiple GPUs and nodes with real-time performance [3], and organizations can join the early access program for the NVIDIA NeMo Megatron accelerated framework for training large language models [3].

In terms of cost-effectiveness compared to other solutions, a Google Cloud Platform preemptible instance can be ~80% cheaper than a regular instance [7], while deploying bare-metal can cost around $65/month for a 16 GB VM [7]. AWS spot pricing also offers savings: an R5 large instance goes for around 2.1 cents per hour, or $15.12 a month running 24x7 [7], while GPU.land offers Tesla V100s at $0.99/hr, a third of the price of GCP/AWS [7]. To optimize costs further, customers can benefit from automatic savings based on monthly usage, along with committed use discounts on Compute Engine resources like machine types or GPUs that offer up to 57% savings on workloads [9], as well as free cost management tools such as budgets and quota limits [9].
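A back-of-the-envelope comparison of the hourly rates quoted above (a minimal Python sketch; the rates come from the cited sources, while the 720-hour month, which matches the $15.12 figure, is an assumption):

```python
HOURS_PER_MONTH = 720  # 30 days x 24 h, matching the $15.12/month figure in [7]

# Hourly rates quoted in the sources above
options = {
    "AWS spot, R5 large (2 cores / 16 GB)": 0.021,       # [7]
    "GPU.land Tesla V100": 0.99,                         # [7]
    "GCP/AWS V100 (approx. 3x GPU.land's price)": 2.97,  # [7]
}

for name, hourly in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${hourly:.3f}/hr -> ${hourly * HOURS_PER_MONTH:,.2f}/month")

# For reference: a bare-metal 16 GB VM on GCP is ~$65/month [7]
```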
NVIDIA Triton Inference Server

NVIDIA Triton Inference Server seems to be a suitable option for serving NLP and large language models. It is included in the NVIDIA AI Enterprise software suite, which is optimized, certified and supported by NVIDIA [3], and it is available from the NVIDIA NGC catalog, a hub for GPU-optimized AI software [3]. It enables LLM inference workloads to scale across multiple GPUs and nodes with real-time performance [3]. On the cost side, Google Cloud's AI Platform offers prediction at an estimated cost of around $17/month for roughly 16 node-hours [9], which can be discounted further with committed use discounts on Compute Engine resources like machine types or GPUs [9].
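For illustration, a model served by Triton can be queried over HTTP with the tritonclient Python package. This is a minimal sketch assuming a hypothetical model named "my_llm" that takes token IDs and returns logits; the model name, tensor names and shapes are placeholders, not a real deployment:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server running locally on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request tensor; "input_ids" and the [1, 128] shape are assumptions.
input_ids = httpclient.InferInput("input_ids", [1, 128], "INT64")
input_ids.set_data_from_numpy(np.zeros((1, 128), dtype=np.int64))

# Ask for the (hypothetical) "logits" output and run inference.
result = client.infer(
    model_name="my_llm",
    inputs=[input_ids],
    outputs=[httpclient.InferRequestedOutput("logits")],
)
print(result.as_numpy("logits").shape)
```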
NVIDIA LaunchPad

NVIDIA LaunchPad seems to be a great option for those looking for an affordable way to try GPUs for NLP and training large language models. It provides a production-ready, enterprise-grade solution to simplify the development and deployment of large language models [3], and enterprises can experience developing and deploying large language models at no charge in curated labs [3]. Additionally, Google Cloud Platform pricing is quite competitive: new customers get $300 in free credits to run, test, and deploy workloads [9], and it saves money over other providers through automatic savings based on monthly usage [9].
NVIDIA NeMo Megatron and Megatron 530B

It seems NVIDIA NeMo Megatron and Megatron 530B are options for those looking to train large language models for NLP. NeMo Megatron is included in the NVIDIA AI Enterprise software suite, which is optimized, certified and supported by NVIDIA [3], and users can apply to join the early access program for the accelerated framework [4]. Additionally, pricing can be found in the NVIDIA NGC catalog [9], and suitable hardware is available from GPU.land, which rents Tesla V100s at $0.99/hr [7]. Lastly, organizations can save up to 57% on workloads with committed use discounts on Compute Engine resources like machine types or GPUs [9].
AWS spot GPU instances

It seems that AWS spot GPU instances might be a good option for training large language models. They cost around 30% of their list price, rarely get interrupted, and can be used through SageMaker [6]. For comparison, a GCP preemptible instance can be ~80% cheaper than a regular instance and comes with a secured HTTPS API endpoint [2], while GPU.land offers a non-interruptible instance that can run 24/7 [6]. Under AWS's pay-as-you-go pricing structure, costs scale only with usage, and customers are charged only for the services they use [9].
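As one illustration, a spot GPU instance can be requested directly with boto3. This is a minimal sketch under assumed values; the AMI ID, instance type and price cap are placeholders, not recommendations:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a one-time spot instance instead of on-demand capacity.
response = ec2.run_instances(
    ImageId="ami-00000000000000000",  # hypothetical; use a real DL AMI for your region
    InstanceType="g4dn.xlarge",       # an entry-level GPU instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "MaxPrice": "0.20",       # cap in USD/hour (assumption)
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```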
All answers

  • NVIDIA DGX SuperPOD
  • NVIDIA Triton Inference Server
  • NVIDIA LaunchPad
  • NVIDIA NeMo Megatron and Megatron 530B
  • AWS spot GPU instances
  • Pre-emptible TPU on GCP
  • Google Colab
  • Exafunction NLP
  • AWS GPU instances
  • GPU.land
  • TensorDock
  • Jarvislabs.ai
  • AWS Lambda
  • Kubernetes Cluster
  • AMD Radeon GPU
  • GPU

    Renting time on Amazon EC2 or Google's GCP [1] [4] is a great way to access powerful GPUs without a large upfront cost. Building a PC with a decently chunky NVIDIA GPU [1] could also be more cost-effective than buying an eGPU box.

    NLP

    Exafunction NLP [2] is 6x cheaper than OpenAI per token and supports custom and fine-tuned models. AWS GPU instances [2] are expensive, but needed for low latency responses.

    training large language models

    Cloud Computing Options

    The cheapest cloud GPU services include AWS spot GPU instances [6], Paperspace [6], Vast.ai [6], Genesis Cloud [6], GCP AI Platform [6], GPU.land [6], Google Colab [6], DataCrunch.io [6], TensorDock [6] and Jarvislabs.ai [6]. Preemptible Cloud TPU v4 Pods on GCP cost $0.97 per chip-hour versus $3.22 on demand [10], and committed use discounts can save up to 57% on workloads compared to regular pricing [9].

    Building a PC

    Buying a GPU can be cost-effective over renting in the cloud once you factor in electricity costs and the cost of building a PC [7]. AMD Radeon GPUs can be used with ROCm but have a tiny userbase [7]. A 16 GB VM on GCP costs $65/month [8].
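    A rough break-even calculation makes the buy-vs-rent tradeoff concrete. This minimal sketch assumes a $500 GPU (the GTX 1080 Ti figure from the research below), a 250 W draw, $0.15/kWh electricity and GPU.land's $0.99/hr rental rate; the power and electricity numbers are assumptions, not measurements:

```python
GPU_PRICE = 500.0    # e.g. a GTX 1080 Ti (see research below)
POWER_KW = 0.25      # assumed 250 W under load
ELECTRICITY = 0.15   # assumed $/kWh
CLOUD_RATE = 0.99    # GPU.land Tesla V100, $/hr (see research below)

# Owning costs only electricity per hour; renting costs the cloud rate.
own_per_hour = POWER_KW * ELECTRICITY
breakeven_hours = GPU_PRICE / (CLOUD_RATE - own_per_hour)
print(f"Owning costs ${own_per_hour:.3f}/hr to run; "
      f"break-even after ~{breakeven_hours:,.0f} GPU-hours "
      f"(~{breakeven_hours / (8 * 22):,.1f} months at 8 h/day, 22 days/month)")
```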


    Research

    Source: "Pricing | Cloud TPU | Google Cloud" (from web, cloud.google.com)

    • Cloud TPU v4 Pods
      • Available in us-central2-b
      • Uses the same v4 pricing system
      • Charges accrue while TPU node is in a READY state
      • Price per chip-hour is $3.22 (on demand), $2.03 (1Y CUD reservation), $1.45 (3Y CUD reservation), and $0.97 (preemptible)
    • Cloud TPU v2 & v3
      • Single device TPU types are billed in one-second increments and available at either an on-demand or preemptible price
      • TPU Pod types provide access to multiple TPU devices that are connected on a dedicated high-speed network
      • To use TPU Pod types, you must request quota using evaluation quota, 1 year or 3 year commitment
      • 1 TPU VM has 4 chips and 8 cores, billed in VM-hours
      • Free access via TRC (TPU Research Cloud)
    • Optimize your cost
      • Cloud TPU v4 provides up to 35% savings on Transformer-based models and up to 50% on ResNet compared to A100 on Azure
    • Estimate your cost
      • Estimate the cost of using Cloud TPU with the Compute Engine pricing calculator
    • Take the next step
      • Use the Cloud TPU sign up form to purchase quota and/or learn more about Cloud TPU
      • Check the regions and zones in which Cloud TPU is available
      • Read the Cloud TPU documentation
      • Get started with Cloud TPU
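    The v4 prices above imply the following effective discounts against the on-demand rate (a quick sanity-check calculation using only the figures listed in this source):

```python
ON_DEMAND = 3.22  # $/chip-hour, Cloud TPU v4

for label, price in [("1Y CUD", 2.03), ("3Y CUD", 1.45), ("preemptible", 0.97)]:
    print(f"{label}: ${price}/chip-hour, {1 - price / ON_DEMAND:.0%} off on demand")
# -> 1Y CUD: 37% off, 3Y CUD: 55% off, preemptible: 70% off
```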

    Source: "[P] Deploying ML models on a budget" (from reddit, r/MachineLearning)

    • Google Cloud Platform preemptible instance
      • ~80% cheaper than a regular instance
      • secured HTTPS API endpoint
      • autostarts when it shuts down (at least once every 24 hours) with only a few minutes of downtime
    • AWS Lambda
      • Low memory (max 2-4GB)
      • quick timeouts
      • costs scale with time
    • Kubernetes Cluster
      • Managed cluster costs >$100
      • cost for the resources used
    • Deploying bare-metal
      • 16 GB VM on Google Cloud Platform costs $65/month
    • GPU.land
      • Tesla v100 for $0.99/hr - 1/3rd the price of GCP/AWS
      • non-interruptible instance
    • AWS Spot Pricing
      • R5 large instance type is 2 cores and 16 GB and goes for 2.1 cents per hour, or $15.12 a month running 24x7
    • Heroku
      • CPU instance is free

    Source: "Pricing Overview | Google Cloud" (from web, cloud.google.com)

    • Start running workloads for free
      • Create an account to evaluate how Google Cloud products perform in real-world scenarios.
      • New customers get $300 in free credits to run, test, and deploy workloads.
      • All customers can use 20+ products for free, up to monthly usage limits.
    • Only pay for what you use
      • With Google Cloud’s pay-as-you-go pricing structure, you only pay for the services you use.
      • No up-front fees. No termination charges.
      • Pricing varies by product and usage—view detailed price list.
    • Save up to 57% on workloads
      • Google Cloud saves you money over other providers through automatic savings based on monthly usage.
      • Pre-pay for resources at discounted rates—save up to 57% with committed use discounts on Compute Engine resources like machine types or GPUs.
    • Stay in control of your spending
      • Control spending with budgets, alerts, quota limits, and other free cost management tools.
      • Optimize costs with actionable, AI-powered intelligent recommendations and custom dashboards that display cost trends and forecasts.
      • 24/7 billing support available.
    • Estimate your costs
      • Understand how your costs can fluctuate based on location, workloads, and other variables with the pricing calculator.
      • Get a custom quote by connecting with a sales representative.
    • Work with a trusted partner
      • Find a partner to help find the best solution.
    • Start using Google Cloud
      • Try it free or go to the console.
    • Continue browsing
      • See all products available.

    Source: "Is it better to use cloud computing or to buy m..." (from reddit, r/MLQuestions)

    • Amazon EC2 P3 Instances
      • Costs around ~$0.42 per hour with really good hardware
      • Can use Google Colab for learning ML and research, but not for training huge models with high expectations
    • Buying a GPU
      • Second-hand last-generation GPU can cost around $100-200
      • GTX 1080ti can cost around $500 (though prices have jumped up a lot)
      • Need to factor in cost of electricity and building a PC
    • AMD Radeon GPU
      • Can use ROCM (their version of CUDA) for Tensorflow, but it has a tiny userbase so may take more time hunting down obscure bugs and is not recommended for now

    Source: "Cheapest Cloud GPU service" (from reddit, r/learnmachinelearning)

    • AWS spot gpu instances
      • Cost 30% of their list price, and rarely get interrupted
      • Can be used through Sagemaker
    • Paperspace
      • Gives users free notebook
    • Vast.ai
      • Distributed cloud computing market where individuals rent out their GPUs and set their own price
      • Far cheaper
    • Genesis cloud
      • Has pretty competitive pricing
    • Google Cloud Platform (GCP) AI Platform
      • Costs around $17/month for roughly 16 node-hours of prediction
      • Rest API also works like charm
      • Storage costs around $25 for 1 TiB/month
    • GPU.land
      • Tesla V100s at $0.99/hr
      • Instances boot in 2 mins and can be pre-configured for Deep Learning
    • Google Colab
      • Free for 12 hours per session
    • DataCrunch.io
      • Very cost efficient options
    • TensorDock
      • Cloud GPUs are among the most affordable on the market
      • V100s from $0.57/hr
      • Instances boot in as little as 45 seconds and can be pre-configured to run ML training workloads on Jupyter Notebook/Lab
    • Jarvislabs.ai
      • Starts at $0.19/hr for an RTX 5000, and $0.99/hr for an A100
      • Takes less than a few minutes to get started

    💭  Looking into

    What are the advantages and disadvantages of using NVIDIA DGX SuperPOD for training large language models?

    💭  Looking into

    What are the most cost-effective methods for training large language models?

    💭  Looking into

    What are the available instance types and their associated costs?

    💭  Looking into

    What are the size and speed limitations of Google Colab?

    💭  Looking into

    What are the recommended components for building a gaming PC for NLP and training language models?


    Source: "NVIDIA Brings Large Language AI Models to Enter..." (from web, nvidianews.nvidia.com)

    • NVIDIA DGX SuperPOD
      • Provides a production-ready, enterprise-grade solution to simplify the development and deployment of large language models.
      • Enables enterprises to overcome the challenges of training sophisticated natural language processing models.
      • Automates the complexity of LLM training with data processing libraries that ingest, curate, organize and clean data.
      • Uses advanced technologies for data, tensor and pipeline parallelization, enabling the training of large language models to be distributed efficiently across thousands of GPUs.
      • Can run on two NVIDIA DGX systems to shorten the processing time from over a minute on a CPU server to half a second, making it possible to deploy LLMs for real-time applications.
    • NVIDIA Triton Inference Server
      • Provides LLM inference workloads to scale across multiple GPUs and nodes with real-time performance.
      • Included in the NVIDIA AI Enterprise software suite, which is optimized, certified and supported by NVIDIA.
      • Available from the NVIDIA NGC catalog, a hub for GPU-optimized AI software that includes frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository.
    • NVIDIA LaunchPad
      • Enterprises can experience developing and deploying large language models at no charge in curated labs.
    • NVIDIA NeMo Megatron and Megatron 530B
      • Enables enterprises to build their own domain-specific chatbots, personal assistants and other AI applications that understand language with unprecedented levels of subtlety and nuance.
      • Megatron 530B is the world’s largest customizable language model.
      • Organizations can apply to join the early access program for the NVIDIA NeMo Megatron accelerated framework for training large language models.

    Source: "Recommendations: External GPU for NLP code?" (from reddit, r/LanguageTechnology)

    • Renting time on Amazon EC2 or GCP from Google
      • Might be cheaper and easier to set up
      • Flexibility of cloud instances is great
      • Can learn important cloud skills
      • Price to rent mid-level/SOTA hardware will probably stay steady
    • Building a PC with a decently chunky NVIDIA GPU
      • Could be more cost-effective than buying an eGPU box
      • Prioritize a GPU with a large amount of RAM over pure compute power
      • For best system stability, upgrade to Ubuntu 19.04 or 19.10
    • Using Google Colab
      • Might be too limiting for some use cases
      • Can run BERT base uncased with 256 input length and 16 batch size
    • Pre-emptible TPU on GCP
      • Costs about $1.5 / hour
      • Can use the $300 credit they give every new user

    Source: "[2104.04473] Efficient Large-Scale Language Mod..." (from web, arxiv.org)

    • Tensor parallelism
      • Allows for training of models with trillions of parameters
      • Can scale to thousands of GPUs
      • Can achieve a per-GPU throughput of 52% of theoretical peak
    • Pipeline parallelism
      • Can improve throughput by 10+% with memory footprint comparable to existing approaches
      • Can be composed with tensor and data parallelism to scale to thousands of GPUs and models with trillions of parameters
    • Data parallelism
      • Can be composed with tensor and pipeline parallelism to scale to thousands of GPUs and models with trillions of parameters
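    As a toy illustration of how these three degrees compose (not Megatron-LM's actual code; the cluster size and degrees below are hypothetical), the product of the tensor-, pipeline- and data-parallel degrees must equal the total GPU count:

```python
def data_parallel_degree(world_size: int, tensor: int, pipeline: int) -> int:
    """Data-parallel degree implied by a cluster size and model-parallel degrees."""
    model_parallel = tensor * pipeline
    assert world_size % model_parallel == 0, "degrees must divide the GPU count"
    return world_size // model_parallel

# Hypothetical layout: 3072 GPUs with 8-way tensor and 12-way pipeline
# parallelism leaves 32-way data parallelism (8 * 12 * 32 == 3072).
print(data_parallel_degree(3072, tensor=8, pipeline=12))  # -> 32
```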

    Source: "[P] Exafunction NLP – Run models like GPT-J and..." (from reddit, r/MachineLearning)

    • Exafunction NLP
      • 6x cheaper than OpenAI per token
      • HTTPS API that can be tried out for free
      • Supports custom and fine-tuned models
      • 100 - 1000x cheaper than Hugging Face for smaller models like BERT
    • AWS GPU instances
      • Expensive
      • Needed for low latency responses

    💭  Looking into

    What is the best affordable GPU for NLP and training large language models?