Bonus Myth of Kubernetes Resource Optimization: Overprovisioning

In this blog series we’ve examined Five Myths of Kubernetes Resource Optimization. But one final, bonus myth remains unaddressed:

I’ve done everything I can. The overprovisioned resources that lead to underutilized CPU and memory in my workloads are just the cost of doing business.

Unfortunately, many companies running cloud environments have come to think of overprovisioned resources as a cost of doing business, as inevitable as rent and taxes. This acquiescence to cloud waste has become pervasive, affecting even the most sophisticated IT teams. In fact, respondents in a recent survey indicated a 39 percent year-over-year increase in the amount of cloud spend over budget, and a third of companies say they will exceed their cloud budget by up to 40 percent.

In this final installment of the series, we’ll demonstrate that things don’t have to be this way, and we’ll propose a solution.

Resource and Cost Optimization at the Infrastructure Level

In this blog series we’ve examined five different options that can help remediate cost overruns in the cloud, along with the benefits and limitations of each:

Observability and Monitoring
Cluster Autoscaling
Instance Rightsizing
Manual Application Tuning
Spark Dynamic Allocation

The fundamental gap is that none of these options addresses the significant underutilization inherent within Kubernetes applications. Instead, all of them are infrastructure-level optimizations.

Optimization at the infrastructure level saves money at the hardware layer and ensures the best financial return on an infrastructure investment. But only about 60 percent of the waste in a cloud environment exists at the infrastructure level.

Figure 1: Only about 60 percent of the optimization tasks done for resources and cost exist at the infrastructure level.

What remains untouched with all these options are overprovisioned resources at the application/platform level where workloads run. This resource waste typically comprises around 40 percent of the optimization potential in a cloud environment.

This application/platform-level overprovisioning is not your fault, nor is it the fault of your developers. It stems from an underlying issue in application resource provisioning, particularly with data workloads.

As we’ve seen in this series, developers must request a certain allocation level of memory and CPU for their applications, and they typically request resources to accommodate peak usage; otherwise their applications get killed.

However, most applications run at peak provisioning levels for only a small fraction of time, which results in most applications having overprovisioned resources.
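To make the gap between requested and used resources concrete, here is a minimal sketch in Python. The usage samples, the 8 GiB request, and the `summarize_usage` helper are all hypothetical values and names chosen for illustration; they are not measurements from any real workload or part of any product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UsageReport:
    requested: float        # amount the developer requested (here: GiB of memory)
    peak: float             # highest observed usage
    mean: float             # average observed usage
    idle_fraction: float    # share of the request that sat idle on average
    time_near_peak: float   # fraction of samples at or above 90% of the peak


def summarize_usage(requested: float, samples: Sequence[float],
                    near_peak_ratio: float = 0.9) -> UsageReport:
    """Compare a fixed resource request against observed usage samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(samples)
    mean = sum(samples) / len(samples)
    near = sum(1 for s in samples if s >= near_peak_ratio * peak)
    return UsageReport(
        requested=requested,
        peak=peak,
        mean=mean,
        idle_fraction=1.0 - mean / requested,
        time_near_peak=near / len(samples),
    )


# Hypothetical memory samples (GiB) for a job that spikes once.
samples = [4.0, 4.0, 4.0, 4.0, 8.0, 4.0, 4.0, 4.0, 4.0, 4.0]
# The request is sized for the peak so the job is not killed.
report = summarize_usage(requested=8.0, samples=samples)

print(f"mean usage: {report.mean:.1f} of {report.requested:.1f} GiB requested")
print(f"idle on average: {report.idle_fraction:.0%}")
print(f"time near peak: {report.time_near_peak:.0%}")
```

In this made-up example the request matches the peak exactly, yet the job sits near that peak for only one sample in ten, so almost half of the requested memory is idle on average.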

The Challenge of Overprovisioning

On average, typical applications are overprovisioned by 30 to 50 percent, and sometimes more. The FinOps Foundation recently reported that reducing overprovisioning waste has become a top priority among cloud practitioners, while the Flexera State of the Cloud 2023 report found that cloud spend overran budget by an average of 18 percent in surveyed enterprises and that nearly a third of cloud spend goes to waste.

That’s a lot of underutilized resources contributing to wasted spend!

Pepperdata Capacity Optimizer: Real-Time, Automated Resource Optimization for YARN and Kubernetes Workloads

Before Pepperdata Capacity Optimizer is enabled, the scheduler sees all the nodes in a cluster as fully utilized. It cannot schedule pending pods without resorting to autoscaling to spin up additional nodes.

The moment Capacity Optimizer is enabled, it pays attention to actual hardware utilization of the running applications and signals to the scheduler where more node capacity is available. Pending pods can then be launched on existing nodes—nodes that previously appeared fully utilized to the scheduler—without any need for the autoscaler to add new nodes.

The scheduler can now schedule the pending pods based on actual utilization, thus improving utilization of existing nodes.
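The difference between the two views of a node can be sketched in a few lines of Python. This is an illustrative model only, not Pepperdata's implementation: the `Node` fields, the two `fits_by_*` functions, the 20 percent safety headroom, and the node figures are all assumptions made for the example. The one grounded fact is that the Kubernetes scheduler places pods based on resource requests rather than live usage.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_cpu: float    # cores on the node
    requested_cpu: float   # sum of CPU requests of pods already placed
    used_cpu: float        # cores actually being consumed right now


def fits_by_requests(node: Node, pod_request: float) -> bool:
    """Request-based view: only unrequested capacity counts as free."""
    return node.capacity_cpu - node.requested_cpu >= pod_request


def fits_by_usage(node: Node, pod_request: float, headroom: float = 0.2) -> bool:
    """Utilization-aware view: idle capacity counts as free, minus a
    safety margin (the 20% headroom is an assumed value)."""
    usable = node.capacity_cpu * (1.0 - headroom)
    return usable - node.used_cpu >= pod_request


# Hypothetical cluster: both nodes look (nearly) full by requests.
nodes = [
    Node("node-a", capacity_cpu=16, requested_cpu=16, used_cpu=6),
    Node("node-b", capacity_cpu=16, requested_cpu=15, used_cpu=12),
]
pending_pod_cpu = 4.0

by_requests = [n.name for n in nodes if fits_by_requests(n, pending_pod_cpu)]
by_usage = [n.name for n in nodes if fits_by_usage(n, pending_pod_cpu)]

print("fits by requests:", by_requests)
print("fits by usage:", by_usage)
```

With these numbers, no node accepts the pending pod under the request-based view, while the utilization-aware view finds room on `node-a`, which is fully requested but mostly idle.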

Capacity Optimizer also enhances the efficiency of your cloud autoscaler by ensuring new nodes are provisioned only when existing nodes are fully utilized. This optimizes resource scaling without altering the autoscaler's downscaling behavior. In practice, Pepperdata's customers have noted up to a 71% decrease in the use of autoscaling once Capacity Optimizer was enabled.

If there are any pods in the pending state, the autoscaler (e.g., Karpenter) adds new nodes only when all existing nodes are fully utilized. Otherwise, there is no need for the autoscaler to add new nodes, because the existing nodes still have capacity for the pending pods.
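The scale-up rule described above can be approximated with a simple first-fit model. The sketch below is a deliberate simplification and does not reproduce the actual logic of Karpenter or the Kubernetes Cluster Autoscaler; the `nodes_to_add` function, the CPU-only accounting, and the sample figures are hypothetical.

```python
from typing import Sequence


def nodes_to_add(pending_pod_cpus: Sequence[float],
                 free_cpu_per_node: Sequence[float],
                 new_node_cpu: float) -> int:
    """Place pending pods first-fit onto free capacity and count how many
    new nodes are needed for the pods that do not fit anywhere."""
    free = list(free_cpu_per_node)
    added = 0
    # Place the largest pods first so small pods can fill the gaps.
    for cpu in sorted(pending_pod_cpus, reverse=True):
        if cpu > new_node_cpu:
            raise ValueError("pod does not fit even on an empty node")
        for i, available in enumerate(free):
            if available >= cpu:
                free[i] = available - cpu
                break
        else:
            # No existing or previously added node has room: add one.
            free.append(new_node_cpu - cpu)
            added += 1
    return added


pending = [4.0, 2.0]

# Existing nodes still have usable capacity: no scale-up is needed.
print(nodes_to_add(pending, free_cpu_per_node=[6.8, 0.8], new_node_cpu=16.0))

# Existing nodes are fully utilized: the autoscaler has to add a node.
print(nodes_to_add(pending, free_cpu_per_node=[0.0, 0.0], new_node_cpu=16.0))
```

The same pending pods trigger a scale-up in one case and not in the other; the only thing that changes is how much free capacity the existing nodes are reported to have.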

Pepperdata Capacity Optimizer Gives Time Back to Your Developers

As with any solution, there are some things that Capacity Optimizer does not do (and in this case, what it doesn’t do may be very helpful!):

Capacity Optimizer does not change application code or configurations, nor does it require developers to do so. Developers are often sensitive about other people or third-party services touching their applications, especially without their knowledge. Capacity Optimizer never modifies applications or configuration parameters. It also never requires developers to tweak their applications. Instead, Capacity Optimizer provides the system scheduler with real-time visibility into actual CPU and memory usage levels—enabling the scheduler to pack pods to nodes with existing capacity.
Capacity Optimizer does not require any upfront data modeling. Unlike other solutions, Capacity Optimizer does not analyze a set of applications over a period of time and then tune them the next time they run. That type of effort is usually futile, because as we have seen in this blog series, modern data environments are highly dynamic; whatever happened last week may be completely different from whatever is happening this week. Instead, Capacity Optimizer is a real-time solution that empowers the native scheduler and Cluster Autoscaler with improved, point-of-action information that remediates waste as it occurs.
Capacity Optimizer does NOT rely on tuning recommendations; it’s real-time, automated resource optimization. As we saw in Myth #4, manual application tuning in response to automated recommendations can be both onerous and of limited value. Capacity Optimizer automates tuning, and does this at scale so your developers can spend their time on other critical projects instead of application tuning. However, if a customer really wants a list of tuning recommendations, detailed metrics and recommendations for individual applications are always available in the Pepperdata dashboard.

The net result of all these benefits: developers are freed from the tedium of tuning applications and configurations so they can focus on higher-value projects that grow your business.

The Proof is in the ROI

As with all solutions, the real metric is: are customers happy with their results? The answer is a resounding YES.

Capacity Optimizer has been deployed by some of the largest and most demanding enterprises in the world, including members of the Fortune 5, security-conscious global banks, and other top-tier companies that have come to trust and rely on Pepperdata. Hardened and battle-tested in those environments for the last decade, Capacity Optimizer also provides deep application-level observability as it optimizes cloud clusters continuously, autonomously, and in real time.

The examples below—one benchmark and four anonymized customers’ results—demonstrate incredibly compelling and ongoing successes achieved with Capacity Optimizer. Remember that these are customers who already did all of the engineering necessary to optimize their systems and still achieved incredible savings once they implemented Capacity Optimizer.

Pepperdata Customers Enjoy Significant Daily and Yearly Cost Savings

Figure 2: A Pepperdata Capacity Optimizer benchmark and results from customers running their data workloads on Amazon EKS show significant daily and annual cost savings.

For even more detail on how our customers have saved, please see our Case Studies with the results of Pepperdata savings for Autodesk, Extole, and many others.

Are You Ready to Eliminate Underutilized Resources and Overspending?

To summarize the entire series of myths, we reviewed the common misconceptions around five key optimization strategies at the infrastructure and application level: deploying observability and monitoring tools and solutions, implementing Cluster Autoscaling, rightsizing instances, tuning applications manually, and enabling Spark Dynamic Allocation. These optimization strategies improve price/performance in your cloud cluster generally, but do not address the resource overprovisioning waste inherent within the application itself.

Pepperdata Capacity Optimizer resource optimization for data workloads on Kubernetes increases GPU, CPU, and memory utilization by up to 80 percent to deliver an average cost savings of 30 percent, automatically, continuously, and in real time, with no application code changes, recommendations, or manual tuning.

You don’t need an engineering sprint or a quarter to plan for Pepperdata. In a simple 60-minute call, we’ll create a Pepperdata dashboard account with you. Pepperdata is installed via a simple bootstrap script into your Amazon EMR environment and via a Helm chart into Amazon EKS. It’s totally free to test in your environment, and the savings you gain during your Proof of Value are also free.