A key discipline in public cloud computing is cost optimization: the process of managing and minimizing the expenses associated with running compute workloads on cloud services. One important cost-saving option that has emerged is the spot instance. So, what are spot instances, what use cases can they be applied to, and what are the trade-offs when using them to run your workloads in the cloud?
What are spot instances?
Spot instances are a compute offering from hyperscaler providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure that allows organizations to bid on spare compute capacity. They provide a flexible, dynamic, and economical option for various compute provisioning scenarios.
Compute charges are based on spot pricing: unlike on-demand instances, which are billed at a fixed hourly price, the price fluctuates with the supply of and demand for a cloud provider’s current spare compute capacity. Organizations bid in a highest-bidder auction, naming the hourly rate they are willing to pay for a specific instance type and size, and if their bid exceeds the current spot price, their instances are provisioned. If the spot price later exceeds the bid price, the instance owner receives a short-notice interruption or eviction notification (the notice period varies by cloud provider but is usually 30 seconds or 2 minutes).
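The bid rule described above can be illustrated with a minimal sketch. The instance type name, the bid, and the price history are hypothetical values chosen for illustration, and treating a bid equal to the spot price as sufficient is an assumption; real providers apply their own rules.

```python
from dataclasses import dataclass


@dataclass
class SpotRequest:
    instance_type: str   # hypothetical instance type name
    max_price: float     # highest hourly rate the organization will pay (USD)


def evaluate(request: SpotRequest, spot_price: float) -> str:
    """Return 'running' while the bid covers the spot price, else 'interrupted'.

    Assumption: a bid equal to the spot price is treated as sufficient.
    """
    return "running" if request.max_price >= spot_price else "interrupted"


# Hypothetical hourly spot prices over five consecutive periods.
price_history = [0.031, 0.034, 0.052, 0.047, 0.033]
request = SpotRequest(instance_type="example.large", max_price=0.045)

states = [evaluate(request, price) for price in price_history]
for price, state in zip(price_history, states):
    print(f"spot price ${price:.3f}/h -> {state}")
```

In this illustration the instance is interrupted in the two periods where the spot price rises above the bid, and capacity becomes obtainable again once the price falls back.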
Cost savings offered by spot instances can be significant, often allowing organizations to access compute resources at a fraction of on-demand costs (savings are typically 70-90%). This makes spot instances an attractive option for a wide range of workloads. By taking advantage of excess capacity within a cloud provider’s infrastructure, organizations can achieve substantial cost savings without compromising the performance or scalability of their applications.
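To make the scale of the saving concrete, here is a small worked calculation. It reads the 70-90% figure as a discount against the on-demand rate; the on-demand rate, fleet size, and hours per month are hypothetical numbers, not provider prices.

```python
def monthly_cost(hourly_rate: float, hours: float, instances: int) -> float:
    """Total cost of running a fleet for the given number of hours."""
    return hourly_rate * hours * instances


def spot_rate(on_demand_rate: float, discount: float) -> float:
    """Hourly spot rate implied by a fractional discount (0.70 means 70% off)."""
    return on_demand_rate * (1 - discount)


ON_DEMAND_RATE = 0.10   # hypothetical USD per instance-hour
HOURS_PER_MONTH = 730   # approximate hours in a month
INSTANCES = 20          # hypothetical fleet size

on_demand = monthly_cost(ON_DEMAND_RATE, HOURS_PER_MONTH, INSTANCES)
print(f"on-demand: ${on_demand:,.2f} per month")

for discount in (0.70, 0.90):
    spot = monthly_cost(spot_rate(ON_DEMAND_RATE, discount), HOURS_PER_MONTH, INSTANCES)
    print(f"{discount:.0%} discount: ${spot:,.2f} per month, saving ${on_demand - spot:,.2f}")
```

The calculation ignores the cost of rerunning interrupted work, which would reduce the effective saving for workloads that restart often.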
When should you use spot instances?
Spot instances are suitable for many scenarios, but not all. Suitable workloads range from development environments to large-scale data processing tasks. This section explores the use cases for spot instances, highlighting the benefits and considerations for several types of application and when to avoid them.
Big data and analytics
The fault-tolerant nature of big data systems makes spot instances an excellent choice for big data processing and analytics workloads. Tasks such as data extraction, transformation, and analysis can be performed at a significantly lower cost by using available spot capacity. With spot instances, organizations can process large data sets efficiently and gain valuable insights without unduly impacting the OpEx budget.
Containerized workloads
Containers are often stateless and fault-tolerant, making them a great fit to run at scale on spot-instance worker nodes in a container-based or Kubernetes cluster.
Continuous integration and continuous deployment (CI/CD)
CI/CD pipelines require scalable and cost-effective compute resources. Spot instances can be integrated into CI/CD workflows, allowing resources to be dynamically provisioned during build, test, and deployment stages.
Fault-tolerant applications
Spot instances can be used to build fault-tolerant systems that can withstand interruptions or instance terminations. By using spot instances in multiple availability zones, organizations can distribute their workload across multiple instances, reducing the risk of service disruption. If an instance is terminated because of a price spike or capacity shortage, the workload can shift to other instances, maintaining system availability without incurring additional costs. To support fault tolerance, applications need to be loosely coupled, using message queues and events. Application Programming Interface (API) calls also need to be idempotent. However, if an application is stateful and not fault-tolerant, the decommissioning of a spot instance could cause disruption, so each application should be evaluated individually for spot instance suitability.
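The idempotency requirement can be illustrated with a minimal sketch of a queue consumer. It assumes each message carries a unique `id` and that a message may be redelivered when the instance processing it is interrupted; the class name, message fields, and in-memory stores are hypothetical stand-ins for a real queue and a durable database.

```python
from collections import deque


class IdempotentWorker:
    """Applies each message at most once, even if the queue redelivers it."""

    def __init__(self) -> None:
        self.processed_ids = set()   # stand-in for a durable store of handled message ids
        self.totals = {}             # stand-in for application state

    def handle(self, message: dict) -> bool:
        """Apply the message; return False if it was already handled."""
        if message["id"] in self.processed_ids:
            return False  # duplicate delivery: safe to acknowledge and skip
        account = message["account"]
        self.totals[account] = self.totals.get(account, 0) + message["amount"]
        self.processed_ids.add(message["id"])
        return True


# Hypothetical queue contents; "m1" is delivered twice, as could happen
# when a spot instance is reclaimed before it acknowledges the message.
queue = deque([
    {"id": "m1", "account": "a", "amount": 10},
    {"id": "m2", "account": "a", "amount": 5},
    {"id": "m1", "account": "a", "amount": 10},
])

worker = IdempotentWorker()
results = [worker.handle(message) for message in queue]
print(results, worker.totals)
```

Because the duplicate is detected by its id, replaying messages after an interruption does not double-count the work, which is what lets a replacement instance pick up the queue safely.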
High-performance computing (HPC)
Spot instances can be used effectively for HPC applications. Scientific simulations, computational fluid dynamics, and genetic modelling are examples of compute-intensive workloads that benefit from loose coupling and parallelization. By leveraging spot instances with Parallel Cluster technology for vCPU selection, organizations can perform complex calculations at reduced cost, accelerating innovation while optimizing cost and performance.
Machine learning (ML) and artificial intelligence (AI) training
Spot instances offer an attractive option for training ML models and running AI workloads. Training complex models often requires significant computational resources, which can be expensive using on-demand instances. Spot instances provide a cost-effective alternative, and by optimizing their spot instance strategy, organizations can achieve substantial savings while leveraging the power of ML and AI technologies.
Test and development environments
Spot instances provide an ideal environment for testing and development purposes. Development and test teams can provision instances at a fraction of the cost, and the transient nature of spot instances is advantageous for short-lived projects or environments. This allows organizations to optimize their resource allocation and reduce the time required for development cycles.
However, spot instances are not recommended for test environments where performance testing for UAT is required, because a spot instance that can be interrupted does not reflect an always-available production instance or environment.
Web applications and batch processing
Web applications with fluctuating demand, such as e-commerce platforms or media streaming services, can leverage spot instances to scale their infrastructure cost-effectively. During periods of high traffic, additional spot instances can be provisioned to manage the increased workload, ensuring optimal performance. Similarly, batch processing tasks like video transcoding, image rendering, or large-scale data transformations can be processed efficiently using spot instances, minimizing the time and costs involved.
What are the limitations of spot instances?
It is important to stress that spot instances come with caveats. If demand increases and spare compute capacity decreases, the spot price may well exceed the organization’s current bid price. This results in interruptions, with instances terminated after a short notification window.
This means that spot instances are not suitable for mission-critical or time-sensitive workloads that require uninterrupted availability. To mitigate the risk of interruptions, organizations can employ strategies like diversification, where workloads are distributed across multiple spot instance sizes and families. This helps ensure that even if one spot instance is terminated, the overall workload remains unaffected. Additionally, organizations can use tools and automation to monitor spot instance prices and manage bids effectively.
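The effect of diversification can be sketched with a small calculation. The pool names below are hypothetical instance families and sizes, and the model assumes a single pool is reclaimed in full while the others are untouched, which is a simplification of real capacity behaviour.

```python
from __future__ import annotations


def spread(total_workers: int, pools: list[str]) -> dict[str, int]:
    """Distribute workers as evenly as possible across capacity pools."""
    base, extra = divmod(total_workers, len(pools))
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def surviving(allocation: dict[str, int], reclaimed_pool: str) -> int:
    """Workers still running after one pool is reclaimed in full."""
    return sum(count for pool, count in allocation.items() if pool != reclaimed_pool)


# Hypothetical pools: combinations of instance family and size.
pools = ["family-a.large", "family-a.xlarge", "family-b.large", "family-c.large"]

diversified = spread(10, pools)
single_pool = spread(10, ["family-a.large"])

print(diversified)
print("diversified survivors:", surviving(diversified, "family-a.large"))
print("single-pool survivors:", surviving(single_pool, "family-a.large"))
```

With the workload spread over four pools, losing one pool removes only that pool's share of the workers, whereas a single-pool fleet loses everything at once.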
It is important for organizations to fully understand the nature and characteristics of their workloads and the disadvantages of spot instances. This will enable key decision-makers to make informed choices about the suitability of spot instances for their specific use cases. These disadvantages include:
- Unpredictable Availability – this can disrupt critical operations and affect service continuity, making spot instances less suitable for applications that require consistent availability.
- Potential for Interruptions – organizations relying on spot instances should have strategies in place to manage interruptions, such as using checkpointing mechanisms or implementing fault-tolerant architectures.
- Resource Price Volatility – organizations using spot instances need to closely monitor pricing trends and have mechanisms in place to manage their costs effectively, such as implementing budget thresholds.
- Limited Applicability – some applications require continuous availability and consistent performance, making on-demand instances or reserved instances a more viable option. Workloads with short execution times or those that can manage interruptions and restarts gracefully may benefit more from spot instances.
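The checkpointing mechanism mentioned in the list above can be sketched as follows. This is a minimal illustration, not a provider-specific implementation: the job is a simple counter, the interruption is simulated with a hypothetical `interrupt_at` parameter, and the checkpoint is a local JSON file, where a real workload would write to durable storage outside the instance.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, step: int) -> None:
    """Write progress to a temporary file, then swap it in atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, path)


def run_job(total_steps: int, checkpoint_path: str, interrupt_at=None) -> int:
    """Resume from the last checkpoint and return the last completed step."""
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step  # simulated spot interruption
        step += 1  # stand-in for one unit of real work
        save_checkpoint(checkpoint_path, step)
    return step


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    first = run_job(10, path, interrupt_at=4)   # instance reclaimed part-way through
    second = run_job(10, path)                  # replacement instance resumes

print(first, second)
```

The second run starts from the saved step rather than from zero, so an interruption costs only the work done since the last checkpoint.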
What you need to remember before getting started
Cost optimization has become imperative for organizations as they transform their business to meet modern digital demands, and spot instances offer an attractive solution for achieving significant savings.
While cost savings are a major benefit, it’s important to note that spot instances come with trade-offs. Since spot prices can fluctuate, instances can be interrupted or terminated by the cloud provider if the spot price rises above the spot instance bid. Although spot prices have become less volatile with new models that focus on capacity and demand rather than consumers outbidding each other, instances can still be terminated by the cloud provider at short notice. It is therefore crucial that workloads running on spot instances are fault-tolerant and able to manage interruptions gracefully.
This means not every workload can take advantage of spot instances when it is migrated to the cloud, but, as the use cases in this article show, spot instances are a viable choice for many applications.
By strategically leveraging spot instances alongside other compute options, organizations can achieve a balance between cost-efficiency and reliability in their cloud infrastructure, unlocking new possibilities for innovation and growth.
Infosys supports and collaborates with its customers today, using a wealth of experience across AWS, Azure and GCP. With over 110,000 certified cloud experts and advanced alliance partner accreditations, we can collaborate with customers to consult and advise on how best to implement spot instances as part of an overall cloud strategy.