February 25, 2023
Affordable GPUs for NLP and training large language models
Research
Source: "Pricing | Cloud TPU | Google Cloud" (from web, cloud.google.com)
-
Cloud TPU v4 Pods
- Available in us-central2-b
- Uses the same v4 pricing system
- Charges accrue while TPU node is in a READY state
- Price per chip-hour is $3.22 (on demand), $2.03 (1Y CUD reservation), $1.45 (3Y CUD reservation), and $0.97 (preemptible); a cost sketch follows below
-
Cloud TPU v2 & v3
- Single device TPU types are billed in one-second increments and available at either an on-demand or preemptible price
- TPU Pod types provide access to multiple TPU devices that are connected on a dedicated high-speed network
- To use TPU Pod types, you must request quota (evaluation quota, or a 1-year or 3-year commitment)
- 1 TPU VM has 4 chips and 8 cores, billed in VM-hours
- Free access via TRC (TPU Research Cloud)
-
Optimize your cost
- Cloud TPU v4 provides up to 35% savings on Transformer-based models and up to 50% on ResNet compared to A100 on Azure
-
Estimate your cost
- Estimate the cost of using Cloud TPU with the Compute Engine pricing calculator
-
Take the next step
- Use the Cloud TPU sign up form to purchase quota and/or learn more about Cloud TPU
- Check the regions and zones in which Cloud TPU is available
- Read the Cloud TPU documentation
- Get started with Cloud TPU
Source: "[P] Deploying ML models on a budget" (from reddit, r/MachineLearning)
-
Google Cloud Platform preemptible instance
- ~80% cheaper than a regular instance
- secured HTTPS API endpoint
- Preempted at least once every 24 hours, but can restart automatically with only a few minutes of downtime; a minimal serving sketch follows below
-
AWS Lambda
- Low memory (max 2-4GB)
- quick timeouts
- costs scale with time
-
Kubernetes Cluster
- Managed cluster alone costs >$100/month
- Plus the cost of the resources used
-
Deploying bare-metal
- 16 GB VM on Google Cloud Platform costs $65/month
-
GPU.land
- Tesla V100 for $0.99/hr, about one-third the price of GCP/AWS
- non-interruptible instance
-
AWS Spot Pricing
- r5.large spot instance (2 cores, 16 GB RAM) goes for about 2.1 cents per hour, or $15.12 a month running 24x7
-
Heroku
- CPU instance is free
Source: "Pricing Overview | Google Cloud" (from web, cloud.google.com)
-
Start running workloads for free
- Create an account to evaluate how Google Cloud products perform in real-world scenarios.
- New customers get $300 in free credits to run, test, and deploy workloads.
- All customers can use 20+ products for free, up to monthly usage limits.
-
Only pay for what you use
- With Google Cloud’s pay-as-you-go pricing structure, you only pay for the services you use.
- No up-front fees. No termination charges.
- Pricing varies by product and usage—view detailed price list.
-
Save up to 57% on workloads
- Google Cloud saves you money over other providers through automatic savings based on monthly usage.
- Pre-pay for resources at discounted rates—save up to 57% with committed use discounts on Compute Engine resources like machine types or GPUs.
-
Stay in control of your spending
- Control spending with budgets, alerts, quota limits, and other free cost management tools.
- Optimize costs with actionable, AI-powered intelligent recommendations and custom dashboards that display cost trends and forecasts.
- 24/7 billing support available.
-
Estimate your costs
- Understand how your costs can fluctuate based on location, workloads, and other variables with the pricing calculator.
- Get a custom quote by connecting with a sales representative.
-
Work with a trusted partner
- Find a partner to help find the best solution.
-
Start using Google Cloud
- Try it free or go to the console.
-
Continue browsing
- See all products available.
Source: "Is it better to use cloud computing or to buy m..." (from reddit, r/MLQuestions)
-
Amazon EC2 P3 Instances
- Costs around ~$0.42 per hour with really good hardware
- Google Colab is fine for learning ML and research, but not for training huge models
-
Buying a GPU
- Second-hand last-generation GPU can cost around $100-200
- GTX 1080ti can cost around $500 (though prices have jumped up a lot)
- Need to factor in cost of electricity and building a PC
-
AMD Radeon GPU
- Can use ROCm (AMD's equivalent of CUDA) for TensorFlow, but it has a tiny user base, so expect to spend more time hunting down obscure bugs; not recommended for now
Source: "Cheapest Cloud GPU service" (from reddit, r/learnmachinelearning)
-
AWS spot GPU instances
- Cost about 30% of list price and rarely get interrupted
- Can be used through SageMaker (managed spot training; a sketch follows below)
-
Paperspace
- Gives users a free notebook
-
Vast.ai
- Distributed cloud computing marketplace where individuals rent out their GPUs and set their own prices
- Far cheaper
-
Genesis cloud
- Has pretty competitive pricing
-
Google Cloud Platform (GCP) AI Platform
- Prediction serving costs around $17/month for roughly 16 node-hours
- The REST API also works like a charm
- Storage costs around $25/month for 1 TiB
-
GPU.land
- Tesla V100s at $0.99/hr
- Instances boot in 2 mins and can be pre-configured for Deep Learning
-
Google Colab
- Free, with sessions limited to 12 hours
-
DataCrunch.io
- Very cost-efficient options
-
TensorDock
- Cloud GPUs are among the most affordable on the market
- V100s from $0.57/hr
- Instances boot in as little as 45 seconds and can be pre-configured to run ML training workloads on Jupyter Notebook/Lab
-
Jarvislabs.ai
- Starts at $0.19/hr for an RTX 5000 and $0.99/hr for an A100
- Takes less than a few minutes to get started
💭 Looking into
What are the advantages and disadvantages of using NVIDIA DGX SuperPOD for training large language models?
💭 Looking into
What are the most cost-effective methods for training large language models?
💭 Looking into
What are the available instance types and their associated costs?
💭 Looking into
What are the size and speed limitations of Google Colab?
💭 Looking into
What are the recommended components for building a gaming PC for NLP and training language models?
Source: "NVIDIA Brings Large Language AI Models to Enter..." (from web, nvidianews.nvidia.com)
-
NVIDIA DGX SuperPOD
- Provides a production-ready, enterprise-grade solution to simplify the development and deployment of large language models.
- Enables enterprises to overcome the challenges of training sophisticated natural language processing models.
- Automates the complexity of LLM training with data processing libraries that ingest, curate, organize and clean data.
- Uses advanced technologies for data, tensor and pipeline parallelization, enabling the training of large language models to be distributed efficiently across thousands of GPUs.
- Inference on models such as Megatron 530B can run on two NVIDIA DGX systems, shortening processing time from over a minute on a CPU server to half a second and making it possible to deploy LLMs for real-time applications.
-
NVIDIA Triton Inference Server
- Enables LLM inference workloads to scale across multiple GPUs and nodes with real-time performance.
- Included in the NVIDIA AI Enterprise software suite, which is optimized, certified and supported by NVIDIA.
- Available from the NVIDIA NGC catalog, a hub for GPU-optimized AI software that includes frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository; a minimal client sketch follows below.
-
NVIDIA LaunchPad
- Enterprises can experience developing and deploying large language models at no charge in curated labs.
-
NVIDIA NeMo Megatron and Megatron 530B
- Enables enterprises to build their own domain-specific chatbots, personal assistants and other AI applications that understand language with unprecedented levels of subtlety and nuance.
- Megatron 530B is the world’s largest customizable language model.
- Organizations can apply to join the early access program for the NVIDIA NeMo Megatron accelerated framework for training large language models.
Source: "Recommendations: External GPU for NLP code?" (from reddit, r/LanguageTechnology)
-
Renting time on Amazon EC2 or Google Cloud Platform (GCP)
- Might be cheaper and easier to set up
- Flexibility of cloud instances is great
- Can learn important cloud skills
- Price to rent mid-level/SOTA hardware will probably stay steady
-
Building a PC with a decently chunky NVIDIA GPU
- Could be more cost-effective than buying an eGPU box
- Prioritize a GPU with a large amount of RAM over pure compute power
- For best system stability, upgrade to Ubuntu 19.04 or 19.10
-
Using Google Colab
- Might be too limiting for some use cases
- Can run BERT base uncased with input length 256 and batch size 16 (see the sketch below)
-
Pre-emptible TPU on GCP
- Costs about $1.50/hour
- Can use the $300 credit every new user gets (roughly 200 hours at that rate)
Source: "[2104.04473] Efficient Large-Scale Language Mod..." (from web, arxiv.org)
-
Tensor parallelism
- Allows for training of models with trillions of parameters
- Can scale to thousands of GPUs
- Can achieve a per-GPU throughput of 52% of theoretical peak; a toy sketch of the column-wise weight split follows below
-
Pipeline parallelism
- Can improve throughput by more than 10%, with a memory footprint comparable to existing approaches
- Can be composed with tensor and data parallelism to scale to thousands of GPUs and models with trillions of parameters
-
Data parallelism
- Can be composed with tensor and pipeline parallelism to scale to thousands of GPUs and models with trillions of parameters
Source: "[P] Exafunction NLP – Run models like GPT-J and..." (from reddit, r/MachineLearning)
-
Exafunction NLP
- 6x cheaper than OpenAI per token
- HTTPS API that can be tried out for free
- Supports custom and fine-tuned models
- 100-1000x cheaper than Hugging Face for smaller models like BERT
-
AWS GPU instances
- Expensive
- Needed for low latency responses
💭 Looking into
What is the best affordable GPU for NLP and training large language models?