Every development team fantasizes about their app hitting it big. Few actually plan for it, so they're caught flat-footed when rapid scaling hits their cloud bills.
Public cloud GPUs are expensive, which makes them best suited to occasional bursts of usage. Once you rely on cloud GPUs regularly, however, your cloud bill will make you regret your product's popularity.
For a customer of ours with a popular photo-editing app, usage jumped by 40% during Covid. Because the app processes AI-heavy workloads, cloud GPU usage spiked to an extraordinary level…and stayed there. It was an absolute dream from an adoption perspective but an absolute nightmare for the budget. Once millions of users turned into millions of dollars owed, the company reached out to Deft.
What public cloud can’t provide: high-performance computing resources at stable prices
The app was downloaded 35 million times in its first year. Every developer and investor wants their product to be that popular, but they don't always know what to do when data needs exceed their wildest dreams. A mega-popular app processing millions of photos with AI can quickly overwhelm its infrastructure, not to mention its bank account. While the public cloud was perfect for a small-scale app finding its fanbase, it eventually cost them a fortune.
Despite paying hundreds of thousands of dollars a month, the team faced lag and deployment issues. They had complex needs that required real, human support, something they couldn't get from the major public cloud providers.
All the money in the world couldn’t get Google or AWS to offer them personalized support. And the app developers were, in fact, paying all the money in the world.
Ultimately, they were driven off the public cloud by the one thing that wakes up every CFO in a cold sweat: a surprise cloud bill. Without warning, the team received an invoice for three times the expected amount. The culprit was a single line item that read "data transfer." Relying on cloud GPUs had put them in a position where growth wasn't necessarily a good thing.
When stabilizing cloud costs is more critical than growth
When the app developers approached Deft, they had a pretty good handle on what they needed:
- a data center to support an always-on, steady-state application environment;
- an object storage platform without data egress fees;
- secure, performant connectivity to public cloud services for burst capabilities; and
- a vendor they trusted to answer the phone and provide expert support.
Any managed IT services provider should be able to provide the first two. On the last point, Deft operates in a class of its own. So many of our clients are long-term partners because we pair them with a team of experts who stick with them. When they pick up the phone, they know someone will answer, and they know it will be someone who knows their business.
Saving millions by moving GPU-heavy workloads off the cloud
Over many pandemic-distanced visits and one trip to Deft’s headquarters in Chicago, we were able to sketch out the rough contours of what the app needed to support its new user base:
- Close to 500 managed dedicated servers with the ability to self-provision more
- 10 petabytes of object storage on a new cluster in Ashburn, Virginia
- 25 Gigabit Ethernet
- A completely new spine-and-leaf network configuration that scales from 300 to 30,000 servers without outages
- Managed network connectivity into GCP and AWS
The new setup saved them millions of dollars. Millions!
Demonstrating our customer-first ethos through 500 hand-assembled servers
Once the contract was signed, we got started on the custom-managed IaaS solution. Usually, we would buy 500 pre-configured servers, ship them to our Ashburn data center, then fly our team out to work on the build. The 2020 of it all made that more or less impossible.
Instead, supply chain issues and travel restrictions forced our team to order individual components, including drives and RAM, assemble all 500 servers in our Elk Grove Village data center, and ship the completed servers to Ashburn. There, all of our local employees (techies and non-techies alike) rallied to handle the installation.
Using the public cloud to test new features with little investment
It made perfect financial sense for the app company to move its production environment on-prem. But just because the public cloud didn't work well for the bulk of its business didn't mean the cloud was out of the picture. As part of our project, we gave them direct connectivity to GCP and AWS. Now, the company can tap the public cloud to test new features without racking up surprise bills. (Today, they're testing algorithms to catch objectionable content, automatic image compression, and various AI/ML applications.) Anything that works will ultimately move to a long-term spot on Deft-hosted servers.
Knowing when to use public cloud vs. on-prem GPUs
Supporting more than 150 million users a month, this client has needs that are an order of magnitude bigger than those of most other companies. Still, the fundamental problem they faced can happen to anyone.
Variable computing needs are generally cheaper to run in a public cloud, while stable, steady-state workloads are generally cheaper on-prem. That's as true for a startup as it is for a Fortune 500 company.
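To make that trade-off concrete, here is a minimal break-even sketch. Every price in it is a hypothetical placeholder, not a real cloud or Deft rate, and the function names are illustrative. It assumes cloud GPUs are billed only for the hours you use, while a dedicated server costs a flat monthly fee whether it is busy or idle.

```python
# Hypothetical break-even comparison: pay-per-hour cloud GPU vs. flat-rate dedicated GPU server.
# All prices below are illustrative assumptions, not quotes from any provider.

HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)


def monthly_cloud_cost(utilization: float, hourly_rate: float) -> float:
    """Cloud cost scales with the fraction of the month the GPU is actually in use."""
    return utilization * HOURS_PER_MONTH * hourly_rate


def monthly_dedicated_cost(flat_fee: float) -> float:
    """A dedicated server costs the same whether it is busy or idle."""
    return flat_fee


def break_even_utilization(hourly_rate: float, flat_fee: float) -> float:
    """Utilization above which the flat-rate dedicated server becomes cheaper."""
    return flat_fee / (HOURS_PER_MONTH * hourly_rate)


if __name__ == "__main__":
    cloud_rate = 3.00       # hypothetical $/GPU-hour in the public cloud
    dedicated_fee = 1100.0  # hypothetical $/month for a dedicated GPU server

    threshold = break_even_utilization(cloud_rate, dedicated_fee)
    print(f"Break-even utilization: {threshold:.0%}")

    for utilization in (0.10, 0.25, 0.75, 1.00):
        cloud = monthly_cloud_cost(utilization, cloud_rate)
        dedicated = monthly_dedicated_cost(dedicated_fee)
        cheaper = "cloud" if cloud < dedicated else "dedicated"
        print(f"{utilization:>4.0%} busy: cloud ${cloud:,.0f} vs dedicated ${dedicated:,.0f} -> {cheaper}")
```

A bursty workload that keeps GPUs busy only a fraction of the month stays below the threshold and favors the cloud. An always-on workload like this client's sits near 100% and lands firmly on the dedicated side. The sketch deliberately leaves out data-transfer fees, which only push the cloud side of the comparison higher.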
This client isn’t the first company we’ve moved out of the public cloud, and they won’t be the last. If you’re seeing massive cloud bills and aren’t quite sure you’re getting what you’re paying for, let’s talk.