As clients' use of cloud storage grows exponentially, adopting a disciplined approach to managing and optimizing these resources is critical.
For example, some of our clients maintain over 15 petabytes (PB) of data that grows at a rate of 0.5 terabytes (TB) per day. Using AWS's EU West rate of $0.023 per gigabyte (GB) per month, 15 PB of storage costs approximately $345,000 per month, or $4.14 million annually. The daily 0.5 TB increase accumulates to 182.5 TB annually, adding roughly $4,197 in monthly costs, or $50,364 per year.
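These figures follow directly from the per-gigabyte rate. A minimal sanity check, assuming S3 Standard at $0.023 per GB-month and decimal units (1 PB = 1,000,000 GB; 1 TB = 1,000 GB):

```python
# Reproduce the cost figures above from the $0.023/GB-month rate.
RATE_PER_GB_MONTH = 0.023

def monthly_cost_usd(gb: float) -> float:
    """Monthly S3 Standard cost in USD for a given volume in GB."""
    return gb * RATE_PER_GB_MONTH

base_gb = 15 * 1_000_000                      # 15 PB in GB
print(f"15 PB: ${monthly_cost_usd(base_gb):,.0f}/month, "
      f"${monthly_cost_usd(base_gb) * 12 / 1e6:.2f}M/year")

growth_gb = 0.5 * 1_000 * 365                 # 0.5 TB/day for one year
print(f"+182.5 TB/year: about ${monthly_cost_usd(growth_gb):,.0f}/month extra")
```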
Even though AWS storage charges are competitive, without proactive management the sheer volume of data will eventually drive material cost increases. This tendency can be mitigated through best practices such as compression, storage tiering, and deduplication.
According to industry projections, the amount of digital data created over the next five years is expected to quadruple, driven largely by the rapid growth of AI systems. This exponential increase underscores the importance of continuously optimizing data storage and implementing cost-saving measures. At that rate of growth, the initial 15 PB would become 60 PB in five years, potentially costing over $16.5 million annually for storage. Key best practices include:
- Tiering data across storage classes
- Automating transitions between classes
- Deleting outdated/unnecessary data
- Right-sizing storage resources
- Enabling compression
- Deduplication with additional software
Applying these best practices yields continuous cost reductions and increased productivity. Above all, they establish the discipline needed to manage data strategically, at scale, over the long term.
Leverage Storage Tiering
AWS offers various storage classes optimized for different access patterns. Selecting the appropriate tier can significantly reduce costs.
- Use S3 Standard for frequently accessed data. This offers low latency and high throughput.
- Use S3 Standard-IA for infrequently accessed data. This has a lower per-GB cost than Standard but charges a retrieval fee. Savings range from 30-60% compared to Standard.
- Use S3 Glacier for rarely accessed data with retrieval time flexibility. Costs up to 90% less than Standard-IA.
- Use S3 Glacier Deep Archive for archival data accessed once per year or less. The lowest cost option.
- Implement S3 Lifecycle Rules to automatically transition objects between tiers based on age or last access date. This automates cost optimization.
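As a concrete illustration, a lifecycle configuration like the following tiers objects down by age and eventually expires them. The bucket prefix (`logs/`) and the day thresholds are illustrative assumptions; the JSON can be applied with the AWS CLI (`aws s3api put-bucket-lifecycle-configuration`) or boto3's `put_bucket_lifecycle_configuration`:

```python
import json

# Sketch of an S3 Lifecycle configuration: Standard -> Standard-IA at 30
# days, -> Glacier at 90 days, deleted after one year. Prefix is illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "tier-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
print(json.dumps(lifecycle, indent=2))
```

Note that S3 enforces a minimum of 30 days in Standard before a transition to Standard-IA.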
Actively Manage Data Lifecycles
Actively delete or archive data that is no longer frequently used. This limits cost accumulation from unused data.
- Delete outdated, transient or duplicate data.
- Prefer pre-signed URLs for sharing objects rather than creating duplicate copies.
- Archive data with annual or infrequent access to Glacier for significant savings.
- Review regulation requirements and delete or archive backups accordingly.
- Consider third-party archive tools that use automation to enforce rules consistently.
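The selection step behind such rules can be sketched as a simple age filter. The helper below is a hypothetical illustration over in-memory sample data; in practice the object listing would come from boto3's `list_objects_v2` paginator:

```python
from datetime import datetime, timedelta, timezone

def stale_keys(objects, max_age_days, now=None):
    """Return keys of objects not modified within `max_age_days`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [o["Key"] for o in objects if o["LastModified"] < cutoff]

sample = [
    {"Key": "tmp/old-export.csv",
     "LastModified": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"Key": "reports/latest.csv",
     "LastModified": datetime.now(timezone.utc)},
]
print(stale_keys(sample, max_age_days=180))  # -> ['tmp/old-export.csv']
```

Stale keys would then be deleted, or archived to Glacier, per your retention policy.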
Compress Data Whenever Possible
Compressing data with techniques like GZIP or LZ4 before writing to S3 can drastically reduce storage volume.
- If storage savings outweigh the computation expense, compress new data on upload.
- Turn on compression for Amazon Aurora and other database services.
- For analytics data, use compressed formats like Parquet; for photos, use JPEG.
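For repetitive text data such as logs or CSV exports, the savings can be dramatic. A quick self-contained illustration using Python's standard `gzip` module (the sample payload is synthetic):

```python
import gzip

# Compress a repetitive log-style payload and compare sizes, as you might
# before uploading it to S3.
raw = (b"timestamp,level,message\n" +
       b"2024-01-01T00:00:00Z,INFO,health check ok\n" * 1000)
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```

Real-world ratios vary with the data; already-compressed formats (JPEG, Parquet with Snappy) gain little from a second pass.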
Optimize Database Storage
- Use Amazon Aurora for relational data. It separates compute and storage for better scalability.
- For document and key-value data, use Amazon DynamoDB. It has lower overhead than relational databases.
- Partition database tables and indexes to optimize performance and lower costs.
- Configure data types tightly based on usage to avoid over-provisioning storage.
- Archive or delete aged records and data no longer frequently used.
Right-Size Storage Resources
- Resize EBS volumes to align with actual usage.
- Choose smaller EC2 instance types with attached EBS when possible.
- Scale RDS DB instance storage size and type to match needs.
- Delete unused EBS snapshots.
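The snapshot-cleanup decision can be expressed as a small filter. This is a hypothetical sketch over sample dicts; real inventories would come from boto3's `describe_snapshots` and `describe_volumes`:

```python
from datetime import datetime, timezone

def deletable_snapshots(snapshots, live_volume_ids, min_age_days, now=None):
    """Flag snapshots whose source volume is gone and that exceed a
    retention window."""
    now = now or datetime.now(timezone.utc)
    out = []
    for s in snapshots:
        age_days = (now - s["StartTime"]).days
        if s["VolumeId"] not in live_volume_ids and age_days >= min_age_days:
            out.append(s["SnapshotId"])
    return out

snapshots = [
    {"SnapshotId": "snap-orphan", "VolumeId": "vol-gone",
     "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-live", "VolumeId": "vol-in-use",
     "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]
print(deletable_snapshots(snapshots, live_volume_ids={"vol-in-use"},
                          min_age_days=90))
```

Always review flagged snapshots against backup and compliance policies before deleting.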
Review Data Migration Options
- Evaluate the end-to-end data transfer flow (for existing or newly created data)
- Ensure data is located close to where it is consumed.
- Evaluate Snow Family products for importing new data from on-premises
- Assess how much data must be accessed from outside AWS, since egress traffic incurs costs.
Automate with Serverless Tools
- Use Lambda functions to run storage optimization tasks on schedules or data events.
- Build workflows around AWS Step Functions to coordinate cross-service tasks like Lifecycle transitions.
- Integrate Lambda/Step Functions with EventBridge for event-driven automation.
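A minimal sketch of such a scheduled Lambda handler is shown below. The event shape, prefix list, and defaults are assumptions for illustration; in production the handler would call S3 via boto3 and be triggered by an EventBridge schedule rule:

```python
# Hypothetical scheduled cleanup handler. The actual sweep (listing objects
# and deleting/transitioning old ones via boto3) is left as a placeholder.
DEFAULT_PREFIXES = ["tmp/", "staging/"]

def handler(event, context=None):
    prefixes = event.get("prefixes", DEFAULT_PREFIXES)
    max_age = int(event.get("max_age_days", 30))
    # Placeholder for the real work: for each prefix, list objects and
    # delete or transition those older than max_age days.
    return {"swept_prefixes": prefixes, "max_age_days": max_age}

print(handler({"prefixes": ["tmp/"], "max_age_days": 7}))
```

Keeping the handler's decision logic separate from the AWS calls also makes it easy to unit-test.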
Track Costs and Usage
Analytics tools provide visibility into usage and spending trends.
- Use AWS Cost Explorer to view storage usage patterns and growth.
- Leverage Cost & Usage Reports for detailed analysis in third-party tools.
- Review Storage Lens metrics like daily storage changes and activity levels.
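A simple growth check over monthly storage totals, such as those exported from Cost Explorer or Storage Lens, might look like this (the GB figures are made up for illustration):

```python
def month_over_month(series):
    """Return (month, delta_gb) pairs for consecutive months in `series`."""
    months = sorted(series)
    return [(cur, series[cur] - series[prev])
            for prev, cur in zip(months, months[1:])]

usage_gb = {"2024-01": 15_000_000, "2024-02": 15_015_000, "2024-03": 15_031_000}
for month, delta in month_over_month(usage_gb):
    print(f"{month}: +{delta:,} GB")
```

An accelerating delta is an early signal that lifecycle rules or deletion policies need tightening.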
Top 10 Storage Optimization Takeaways
- Use storage tiering to match access patterns and costs.
- Use retention, archiving, and deletion policies to actively manage data lifecycles.
- Compress data to save on storage space.
- Choose managed database services based on your data patterns.
- Right-size storage resources to match requirements.
- Use serverless tools to automate manual optimizations.
- Analyze cost and usage data to identify savings opportunities.
- Optimize EC2 instance storage with appropriately sized EBS volumes and pruned snapshots.
- Move cold data to rarely accessed tiers.
- Build a culture of efficiency around storage practices.
Implementing these guidelines at the start of your cloud journey will lay the foundations for managing storage economically at scale. The recommendations provided can create a playbook for teams to optimize storage costs systematically.
Ultimately, since storage can account for a large percentage of overall annual recurring cloud costs, it is also crucial to encrypt data as needed (at rest and/or in transit, server-side or client-side, as your requirements dictate).
Ross has over 25 years of enterprise IT experience. He started with roles in engineering and infrastructure, then moved into strategic cloud consultancy on Azure and AWS. More recently, Ross has focused on complex cloud transformations and alliances partner networks.
Dr Hassan Shuman is a multi-certified cloud architect, AI specialist and trusted advisor in the industry. With 22 years in IT consultancy, he helps clients transform how their businesses operate with an open, extensible data and AI platform that runs on any cloud. Hassan is also a regular speaker at events including AWS summits and IDC conferences on topics including cloud migrations, serverless technologies, AWS machine learning and data science.
Valentina has over 10 years of experience in IT consultancy, project management, and digital transformation. Prior to this, she worked in Mergers & Acquisitions. Passionate about new technologies and a certified AWS cloud architect, she specializes in cybersecurity while helping clients optimize and secure their cloud infrastructures.