Cost-Effective Deployment Policies With Kyverno - Apptio
PSV Ravi Kumar · 2023-11-17 · via Apptio

The challenges

Apptio Kubernetes Platform (AKP) hosts many different types of workloads, both static and transient in nature. At Apptio, we allocate Dedicated Hosts for the steady-state workload and rely on Spot instances for the transient workloads. Dedicated Hosts help with CapEx and make our cloud compute spend more predictable for the year, assuming these Dedicated Hosts are fully used, which turned out to be non-trivial.

There were multiple reasons why our Dedicated Hosts ended up underutilized. We have a lot of smaller, more transient workloads that would sneak onto the large Dedicated Hosts. The largest static workloads required r5.8xlarge instances, but AWS did not support them on r5. That meant we had to fill the excess with r5.4xlarge. Initially these workloads were relatively few, and we deployed them to On-Demand hosts, even though there was room for them on the Dedicated Host fleet. Eventually, the largest workloads grew in number and the associated rise in On-Demand cost became hard to ignore. Finally, we lacked a robust policy engine to make the allocation of workloads more efficient.

Many nodes ended up being utilized at just 60% or 70%. Subsequently, the hosting density decreased even more due to planned and unplanned restarts of the workloads. Although we had good control over which instance types the Cluster Autoscaler would scale up, we did not have equivalent control over the scheduling of pods onto nodes that had already been provisioned. Following the restarts, some workloads that were supposed to run on the Dedicated Hosts ended up on On-Demand instances. In other words, we were underusing the services that we paid for, and we were wasting money on unplanned On-Demand spend.

Another downside was that the total number of nodes on the platform was unreasonably high, which had several drawbacks. Management overhead was one. Another was third-party vendor costs: many vendors, Datadog for example, charge per node. We have about 3k production nodes instrumented with the Datadog agent, and about 75% of them also have Application Performance Monitoring (APM). The associated cost was not insignificant. Increasing the instance size of some node groups in our clusters would mean fewer nodes, a potentially better hosting density, and a lower Datadog bill.
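The link between instance size, utilization, and node count can be illustrated with rough arithmetic. The sketch below is not Apptio tooling: it assumes memory-bound packing, ignores the memory reserved for the operating system and kubelet, and uses the published memory sizes of the r5 family. The total request and the per-node price in the usage section are made-up figures.

```python
import math

# Published memory per instance type in GiB (r5 family).
R5_MEMORY_GIB = {"r5.4xlarge": 128, "r5.8xlarge": 256, "r5.12xlarge": 384}


def nodes_needed(total_request_gib: float, instance_type: str, utilization: float) -> int:
    """Nodes required to host a total memory request at a given average utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    usable_gib = R5_MEMORY_GIB[instance_type] * utilization
    return math.ceil(total_request_gib / usable_gib)


def monthly_per_node_cost(nodes: int, price_per_node: float) -> float:
    """Vendor charge that scales with node count; price_per_node is hypothetical."""
    return nodes * price_per_node


if __name__ == "__main__":
    total_gib = 100_000  # hypothetical sum of pod memory requests
    for instance_type, utilization in [("r5.4xlarge", 0.65), ("r5.12xlarge", 0.95)]:
        count = nodes_needed(total_gib, instance_type, utilization)
        print(instance_type, utilization, count, monthly_per_node_cost(count, 20.0))
```

Fewer, larger, better-filled nodes reduce both the node count and any charge that is billed per node.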

The actual challenge was to implement policies by which certain workloads, based on their resource constraints and purpose, would reliably go to specialized node groups with different underlying instance sizes.

Historically, we have been using Gatekeeper to manage policies. Along with several benefits, Gatekeeper has limitations, e.g., the steep learning curve of the Rego language and limited support for mutations.

To address these shortcomings, we turned our attention to Kyverno, which would give us extra control to modify pod node selectors when the pods launch.
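A Kyverno mutation that sets a pod's node selector at admission time might look roughly like the following. This is a minimal sketch, not the policy described in this post: it builds a ClusterPolicy manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The policy name, the `app-tier` match label, and the `node-group` selector key are hypothetical.

```python
import json


def node_selector_policy(name: str, match_labels: dict, node_selector: dict) -> dict:
    """Build a Kyverno ClusterPolicy that merges node_selector into matching pods."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": name},
        "spec": {
            "rules": [
                {
                    "name": f"{name}-set-node-selector",
                    # Apply only to pods that carry the given labels.
                    "match": {
                        "any": [
                            {
                                "resources": {
                                    "kinds": ["Pod"],
                                    "selector": {"matchLabels": match_labels},
                                }
                            }
                        ]
                    },
                    # Strategic merge patch: adds or overrides spec.nodeSelector keys.
                    "mutate": {
                        "patchStrategicMerge": {"spec": {"nodeSelector": node_selector}}
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    policy = node_selector_policy(
        "steady-state-large-nodes",         # hypothetical policy name
        {"app-tier": "steady-state"},       # hypothetical pod label
        {"node-group": "large"},            # hypothetical node label
    )
    print(json.dumps(policy, indent=2))
```

Because the patch is applied when the pod is admitted, it also takes effect when a workload restarts, which is the case the Cluster Autoscaler settings alone did not cover.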

Kyverno

The initial rollout of Kyverno coincided with the rollout of larger EC2 instances to accommodate more resource-demanding workloads. It also focused on improving the hosting density of other workloads, simply because they would fit better into the larger instances. New Kyverno policies would apply to specially designated services of our flagship application and ensure their placement on the larger instances. For this purpose, we chose r5.12xlarge instead of the traditionally used r5.4xlarge and r5.8xlarge.

To introduce r5.12xlarge instances, we had to be mindful of several constraints.

AWS imposes a limit of 25 EBS volumes on Nitro instance types, which include R5. If we allowed pods below a certain size to run on the r5.12xlarge instances, we would risk hitting the EBS volume limit before filling the instances, leaving capacity wasted. Smaller pods therefore needed to be handled differently.
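A rough calculation shows why small pods are a problem on a large instance. The sketch assumes each pod mounts exactly one EBS volume (the text does not state the per-pod volume count), ignores the node's own volumes, and uses the published 384 GiB of memory on an r5.12xlarge.

```python
R5_12XLARGE_MEMORY_GIB = 384  # published memory size of r5.12xlarge
EBS_VOLUME_LIMIT = 25         # per-node limit quoted in the text


def max_fill_fraction(pod_request_gib: float, volumes_per_pod: int = 1) -> float:
    """Upper bound on the fraction of node memory that identical pods can claim."""
    if volumes_per_pod < 1:
        raise ValueError("volumes_per_pod must be at least 1")
    max_pods_by_volumes = EBS_VOLUME_LIMIT // volumes_per_pod
    max_pods_by_memory = int(R5_12XLARGE_MEMORY_GIB // pod_request_gib)
    pods = min(max_pods_by_volumes, max_pods_by_memory)
    return pods * pod_request_gib / R5_12XLARGE_MEMORY_GIB


def min_average_request_gib(volumes_per_pod: int = 1) -> float:
    """Smallest average request that can fill the node before the volume limit hits."""
    if volumes_per_pod < 1:
        raise ValueError("volumes_per_pod must be at least 1")
    return R5_12XLARGE_MEMORY_GIB / (EBS_VOLUME_LIMIT // volumes_per_pod)


if __name__ == "__main__":
    for request_gib in (8, 16, 40, 48):
        print(request_gib, round(max_fill_fraction(request_gib), 3))
    print(min_average_request_gib())
```

Under these assumptions, pods that request well under 384 / 25 GiB each run out of volume attachments long before the node's memory is used.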

The flagship application also has several more ephemeral services used for data processing. They are not static by nature and are regularly being created and destroyed. If we used such a large instance type to host these workloads, we would be running the risk of significant fragmentation and wastage. This means we needed a different approach for the data processing pods, too.

Solution

We ended up with the following policy setup:

All steady-state services of the flagship application, which included web applications and web services deployments, are scheduled to the r5.12xlarge or r5.4xlarge instance types, depending on the size of their memory requests.

Pods requesting less than 40 GB are scheduled onto r5.4xlarge instances only. These instances are not placed on Dedicated Hosts; they are On-Demand only. Keeping small pods on these instances ensures that the limit of 25 EBS volumes is not exceeded on the r5.12xlarge instances. The percentage of pods this small is relatively low, which makes this approach a good fit.

Pods requesting 40 GB or more are placed onto the r5.12xlarge instances only. These instances can be on Dedicated Hosts or On-Demand, depending on the available Dedicated Host capacity. Soft affinity rules are in place to make these pods prefer Dedicated Host instances over On-Demand instances at scheduling time.

All ephemeral data processing pods are scheduled to r5.4xlarge or r5.8xlarge instance types that are either On-Demand or Spot. We explicitly want to avoid hosting these workloads on the Dedicated Hosts to reduce the fragmentation of the Dedicated Host capacity. The r5.8xlarge is only scaled up if a pod requires the larger size; otherwise, r5.4xlarge is used. This reduces fragmentation and allows instances to be churned more frequently. Separate instance groups are provisioned for this purpose, with a designated label and an associated taint to prevent the other, more static workloads from running on them.
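The setup above can be summarized as a small decision function plus a soft affinity stanza. This is a sketch of the rules as described, not the Kyverno policies themselves. The `Placement` type, the workload kind strings, the capacity names, and the reading of "40 GB" as 40 GiB are assumptions made for illustration, and the label key passed to `soft_affinity` is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

STEADY_STATE = "steady-state"
EPHEMERAL = "ephemeral"
LARGE_POD_THRESHOLD_GIB = 40  # "40 GB" in the text, treated as GiB here (assumption)


@dataclass(frozen=True)
class Placement:
    instance_types: Tuple[str, ...]   # allowed instance types
    capacity_types: Tuple[str, ...]   # allowed capacity, preferred option first where stated
    tainted_node_group: bool          # True if the node group is labeled and tainted


def place(kind: str, memory_request_gib: float) -> Placement:
    """Map a workload class and memory request to the node groups it may use."""
    if kind == EPHEMERAL:
        # Never on Dedicated Hosts; the autoscaler picks r5.8xlarge only when needed.
        return Placement(("r5.4xlarge", "r5.8xlarge"), ("on-demand", "spot"), True)
    if kind == STEADY_STATE:
        if memory_request_gib < LARGE_POD_THRESHOLD_GIB:
            return Placement(("r5.4xlarge",), ("on-demand",), False)
        return Placement(("r5.12xlarge",), ("dedicated-host", "on-demand"), False)
    raise ValueError(f"unknown workload kind: {kind}")


def soft_affinity(label_key: str, value: str, weight: int = 100) -> dict:
    """Pod affinity stanza that prefers, but does not require, matching nodes."""
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "preference": {
                        "matchExpressions": [
                            {"key": label_key, "operator": "In", "values": [value]}
                        ]
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    print(place(STEADY_STATE, 64))
    print(place(STEADY_STATE, 16))
    print(place(EPHEMERAL, 100))
    # Hypothetical node label marking Dedicated Host capacity.
    print(soft_affinity("example.com/capacity-type", "dedicated-host"))
```

A preferred (soft) affinity lets large pods fall back to On-Demand r5.12xlarge nodes when Dedicated Host capacity is exhausted, whereas a required rule would leave them unschedulable.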

The results

After these Kyverno policies were introduced, the hosting density and the utilization of Dedicated Hosts increased to 100%.

Kubernetes node reduction after implementing Kyverno policies

As shown above, the total number of nodes for just the flagship application went down from 1500 to 1000, with the pleasant side effect of our monthly Datadog bill decreasing by 14%.

Compute cost chart from Cloudability

Most importantly, our unplanned On-Demand spend went down by $3k/day, i.e., 10% in savings. As part of our broader observability practice, we track our cloud hosting spend using our own tool, Cloudability. The decrease in On-Demand spending is shown on this Cloudability chart.
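The reported figures can be put in perspective with simple arithmetic. The inputs below come from the text; the percentage, the annualized amount, and the implied baseline are derived values. The baseline in particular assumes that the 10% is measured against the same daily spend that dropped by $3k, which the text does not state explicitly.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from before to after, in percent."""
    return (before - after) / before * 100


# Figures reported in the text.
nodes_before, nodes_after = 1500, 1000
daily_savings_usd = 3_000
savings_fraction = 0.10

# Derived values.
node_reduction_pct = percent_reduction(nodes_before, nodes_after)
annual_savings_usd = daily_savings_usd * 365
implied_daily_baseline_usd = daily_savings_usd / savings_fraction  # assumption, see above

if __name__ == "__main__":
    print(f"node reduction: {node_reduction_pct:.1f}%")
    print(f"annualized savings: ${annual_savings_usd:,}")
    print(f"implied daily baseline: ${implied_daily_baseline_usd:,.0f}")
```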