Homegrown FinOps Tools: How AI “Build” Beat “Buy” for Us in <1 Year
Hac Phan · 2026-05-11 · via Amplitude

In April 2025, Amplitude officially started its FinOps org. I joined as the first and only FinOps Engineer, and my first big task was to pick our FinOps tool.

Traditionally, this is a no-brainer: for a company of Amplitude’s size and scale, you buy your FinOps platform.

However, April 2025 is also when Amplitude began a strategic pivot into becoming AI-first. That mandate showed up everywhere: in how we build and ship code, how we improve features through the sense/decide/act loop, and even in how we handle day-to-day work questions.

I wanted to rethink how an AI-native company would approach FinOps. Using the FinOps Foundation's FinOps Framework as a guiding principle, I built my own tools and solved problems the AI way. After only a year, AI has already had a significant impact on efficiency.

Here are some of my experiences and what I built during a year of AI FinOps at Amplitude.

How AI changes the FinOps “build vs. buy” question

In FinOps, you need the right tools to ingest billing data from many sources, normalize it, and expose reports and dashboards on top. That’s how you’re able to justify the business value of your company’s tech stack and find ways to optimize it.

A traditional FinOps playbook might encourage us to scale FinOps by procuring a foundational FinOps tool. But with the advent of AI coding assistants, I decided to see whether I could economically build our own.

In April 2025, Amplitude hosted an AI Hackathon Week where we learned how to use the new AI coding assistants and saw what we could accomplish in a week. I was blown away by what they were capable of. The tab completions in GitHub Copilot had felt magical, but this was on another level.

Just as important as the coding assistants are the internal agents I built. Together, they changed FinOps from manually answering every question to a system that encodes and reuses knowledge. By my estimate, they freed up 50% of my time, allowing me to focus on cost optimization rather than answering questions or researching issues.

First steps: Data foundations and 3 AI Agents, v1

(Note: Agent names are borrowed from my favorite K-pop group. I’ll let you figure out which one.)

Redshift for data infrastructure

The first decision was where to store and normalize our billing data. I needed something that could query AWS Cost and Usage Reports directly in S3 (without building and maintaining a full ETL pipeline) and also let me define a single normalization layer that stays fresh without manual rebuilds.

Redshift checked both boxes: External Schemas (via Redshift Spectrum) let me query CUR data in place in S3 through the AWS Glue Data Catalog, and Materialized Views gave me a single, easy-to-maintain table where all normalization logic lives as a single source of truth. When we needed non-AWS vendor data, I added lightweight Lambda functions to fetch it and insert it into Redshift, with no new pipeline architecture required. We considered other options, such as Snowflake, RDS, and BigQuery, but ultimately Redshift was the cheapest option that met both requirements.
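To make the "one normalization layer" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Redshift: SQLite has no materialized views or Spectrum external schemas, so a plain view plays the role of mv_normalized_costs. The raw table, its CUR-style column names, the service mapping, and the `service` tag column are illustrative assumptions, not Amplitude's actual schema.

```python
import sqlite3

# Hypothetical raw CUR-style rows; real CUR exports have far more columns.
RAW_ROWS = [
    ("AmazonRDS", "2025-04-01", 120.0, "Checkout-API"),
    ("AmazonRDS", "2025-04-02", 80.0, "checkout-api"),
    ("AmazonS3", "2025-04-01", 15.5, None),
]

def build_normalized_view(conn: sqlite3.Connection) -> None:
    """Create a raw table plus one view that holds all normalization logic."""
    conn.execute(
        """CREATE TABLE raw_cur (
               line_item_product_code TEXT,
               line_item_usage_start_date TEXT,
               line_item_unblended_cost REAL,
               resource_tags_user_service TEXT
           )"""
    )
    conn.executemany("INSERT INTO raw_cur VALUES (?, ?, ?, ?)", RAW_ROWS)
    # Every consumer reads this view, so mapping rules live in exactly one place.
    conn.execute(
        """CREATE VIEW normalized_costs AS
           SELECT
               CASE line_item_product_code
                   WHEN 'AmazonRDS' THEN 'rds'
                   WHEN 'AmazonS3' THEN 's3'
                   ELSE lower(line_item_product_code)
               END AS service,
               substr(line_item_usage_start_date, 1, 7) AS usage_month,
               COALESCE(lower(resource_tags_user_service), 'untagged') AS team_service,
               line_item_unblended_cost AS cost
           FROM raw_cur"""
    )

conn = sqlite3.connect(":memory:")
build_normalized_view(conn)
# Tag casing differences collapse into one group because the view lowercases them.
rows = conn.execute(
    "SELECT service, team_service, SUM(cost) FROM normalized_costs "
    "GROUP BY service, team_service ORDER BY service"
).fetchall()
print(rows)
```

The design point is that tag cleanup, service naming, and date bucketing are written once in the view, so every report and agent sees the same answers.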

To set this up, I worked with the AI coding assistants to write all the infrastructure-as-code required to maintain the Redshift cluster, the Materialized Views, and the refresh schedule.

Agent YA for Slack-based answers

During our AI Hackathon Week, I decided to build our first “AI Slack Bot” to help answer AWS cost-related questions. I modeled the first iteration of YA on how I’d answer a question by hand: it took the user’s question, generated a SQL query, ran it, and returned the results.
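That v1 flow fits in a few lines. In this hypothetical sketch, `fake_generate_sql` stands in for the LLM call (a canned query so the example runs offline), queries run against SQLite rather than Redshift, and the `costs` table is illustrative.

```python
import sqlite3
from typing import Callable

def answer_question(
    question: str,
    generate_sql: Callable[[str], str],
    conn: sqlite3.Connection,
) -> list[tuple]:
    """v1 pipeline: question -> SQL -> execute -> raw rows."""
    sql = generate_sql(question)          # an LLM call in the real bot
    return conn.execute(sql).fetchall()   # rows would be posted back to Slack

def fake_generate_sql(question: str) -> str:
    # Hypothetical stand-in for the model; ignores the question and returns one query.
    return (
        "SELECT service, SUM(cost) AS total FROM costs "
        "GROUP BY service ORDER BY total DESC"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE costs (service TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO costs VALUES (?, ?)",
    [("rds", 300.0), ("s3", 50.0), ("rds", 100.0)],
)
rows = answer_question("What are my top cost drivers?", fake_generate_sql, conn)
```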

YA was designed with:

  • The entire schema of mv_normalized_costs (about 40 columns) in its system prompt
  • Conversation “memory”: previous questions and answers were recorded and injected into the prompt
  • An explainer agent that would interpret the SQL results and attempt to answer the question
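The schema-in-prompt and memory design points boil down to prompt assembly. A minimal sketch, assuming a plain-text prompt and a bounded history of recent turns; the column descriptions, the `max_turns` cap, and the prompt wording are illustrative, not YA's actual prompt.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    """Keeps recent question/answer pairs to re-inject as "memory"."""
    max_turns: int = 5
    turns: list[tuple[str, str]] = field(default_factory=list)

    def record(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Drop the oldest turns so the prompt does not grow without bound.
        self.turns = self.turns[-self.max_turns:]

def build_prompt(columns: dict[str, str], convo: Conversation, question: str) -> str:
    """Hypothetical prompt layout: schema, then history, then the new question."""
    schema = "\n".join(f"- {name}: {desc}" for name, desc in columns.items())
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in convo.turns)
    return (
        "You write Redshift SQL against mv_normalized_costs.\n"
        f"Columns:\n{schema}\n\n"
        f"Previous conversation:\n{history or '(none)'}\n\n"
        f"Question: {question}"
    )

columns = {"service": "normalized AWS service name", "cost": "unblended cost in USD"}
convo = Conversation(max_turns=2)
convo.record("Top services last month?", "rds, s3")
convo.record("And the month before?", "rds, ec2")
convo.record("Why did rds grow?", "more storage")
prompt = build_prompt(columns, convo, "Break rds down by team")
```

With roughly 40 columns embedded in every prompt, this approach grows expensive and brittle, which is the scaling problem the v2 refactor later addresses.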

Ask a question in Slack, get an instant breakdown of your top AWS RDS cost drivers.

Initially, YA did some things very well:

  • It generated fairly complex SQL queries, such as month-over-month changes
  • It generated SQL much faster than I could write it

But it also had some drawbacks:

  • If users weren’t familiar with the schema, it usually failed to generate the correct query, especially for non-standard tables.
  • It required users to know exactly which values to ask for. For example, if the service was tagged foo_bar but the user asked for foobar, YA would simply return zero rows.

Later, v1.1 added a “Clarification Step” that clarifies the user’s question before the SQL Generator retrieves the data.
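Here is one way to implement that kind of clarification step, sketched under assumptions since the post doesn't say how YA v1.1 does it. Before generating SQL, look up the distinct values that actually exist in a column and map the user's term onto them. The sketch strips separators first, then falls back to fuzzy matching with Python's `difflib.get_close_matches`. The 0.8 cutoff and the sample values are illustrative.

```python
import difflib
import re

def _squash(value: str) -> str:
    # Lowercase and drop separators so "foo_bar", "Foo-Bar", and "foobar" compare equal.
    return re.sub(r"[^a-z0-9]", "", value.lower())

def resolve_value(term: str, known_values: list[str], cutoff: float = 0.8) -> list[str]:
    """Return candidate column values for a user's term; empty means ask the user."""
    if term in known_values:
        return [term]
    squashed = [v for v in known_values if _squash(v) == _squash(term)]
    if squashed:
        return squashed
    # Typos: fall back to similarity matching against the real values.
    return difflib.get_close_matches(term, known_values, n=3, cutoff=cutoff)

# Hypothetical distinct values pulled from the service tag column.
known = ["foo_bar", "checkout_api", "search_indexer"]
```

Returning several candidates, or none, gives the agent something concrete to confirm with the user instead of silently returning zero rows.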

Agent TY for cost anomalies

Cost anomalies were one of the most time-consuming parts of my day. I’d manually check dashboards and try to eyeball what looked off. If something looked fishy, I would have to spend hours digging through multiple data sources to figure out what changed and why. Agent TY was created to automate that entire loop.

TY was designed with:

  • A vw_cost_anomalies view that sits on top of mv_normalized_costs and identifies anomalous changes in spending
  • Multiple views similar to the dashboards I would use to research the issue, such as vw_top_usage_spend_by_service or vw_top_resource_spend_by_service
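The post doesn't say how vw_cost_anomalies decides what counts as anomalous. One common approach, shown here as an assumption rather than the view's actual logic, compares each day's spend for a service with its trailing mean and standard deviation and flags large z-scores. The 7-day window, the threshold of 3, and the sample spend are illustrative.

```python
import statistics

def find_anomalies(
    daily_spend: dict[str, list[float]],
    window: int = 7,
    threshold: float = 3.0,
) -> list[tuple[str, int, float]]:
    """Flag (service, day_index, z_score) where spend deviates from its trailing window."""
    anomalies = []
    for service, series in daily_spend.items():
        for day in range(window, len(series)):
            history = series[day - window:day]
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if stdev == 0:
                # Flat history makes the z-score undefined; a real detector
                # would need a separate rule for this case.
                continue
            z = (series[day] - mean) / stdev
            if abs(z) >= threshold:
                anomalies.append((service, day, round(z, 2)))
    return anomalies

# Hypothetical daily spend in USD; only DynamoDB spikes on the last day.
spend = {
    "dynamodb": [100, 102, 98, 101, 99, 100, 101, 180],
    "s3": [50, 51, 49, 50, 52, 48, 50, 51],
}
anomalies = find_anomalies(spend)
```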

The agent would identify the anomalous service and try to determine which usage, resource, or other factor caused the anomaly. It would then post a report in Slack, structured around five sections: Who, What, Where, When, and How.
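Once a service is flagged, the question of which usage, resource, or other factor caused the change can be approximated by ranking per-dimension deltas between a baseline and the anomalous day. This is a minimal sketch, assuming spend has already been grouped by one dimension (here, usage type). The report keys mirror the five sections described above, but their contents, the owner value, and the usage-type names are hypothetical.

```python
def top_contributors(
    baseline: dict[str, float],
    current: dict[str, float],
    limit: int = 3,
) -> list[tuple[str, float]]:
    """Rank keys (usage types, resources, ...) by how much their spend increased."""
    keys = baseline.keys() | current.keys()
    deltas = {k: current.get(k, 0.0) - baseline.get(k, 0.0) for k in keys}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(k, round(d, 2)) for k, d in ranked[:limit] if d > 0]

def build_report(
    service: str, day: str, owner: str,
    baseline: dict[str, float], current: dict[str, float],
) -> dict[str, str]:
    """Assemble a Who/What/Where/When/How summary for Slack (hypothetical layout)."""
    contributors = top_contributors(baseline, current)
    total = sum(current.values()) - sum(baseline.values())
    return {
        "Who": owner,
        "What": f"{service} spend changed by ${total:,.2f}",
        "Where": contributors[0][0] if contributors else "no single driver found",
        "When": day,
        "How": ", ".join(f"{k} +${d:,.2f}" for k, d in contributors) or "n/a",
    }

# Hypothetical DynamoDB spend by usage type: baseline day vs. anomalous day.
baseline = {"ReadRequestUnits": 40.0, "WriteRequestUnits": 30.0, "TimedStorage-ByteHrs": 30.0}
current = {"ReadRequestUnits": 42.0, "WriteRequestUnits": 105.0, "TimedStorage-ByteHrs": 31.0}
report = build_report("dynamodb", "2025-06-03", "team-payments", baseline, current)
```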

AI catches a DynamoDB cost spike, explains what changed, and recommends what to look at next.

Agent YR for reservations analysis

Reservations and Savings Plans are one of the biggest levers for cost optimization, but tracking utilization, coverage, and expirations across multiple AWS services is tedious spreadsheet work. Agent YR was built to automate that analysis and surface actionable recommendations weekly.

Using the AWS SDK, YR pulls utilization and coverage data for Compute Savings Plans, ElastiCache, and RDS, alongside the existing reservation inventory. It then normalizes all instance data to a common unit (xlarge) so we can compare coverage across instance sizes within the same family. Without that normalization, a mix of large, 2xlarge, and 4xlarge instances makes an apples-to-apples comparison impossible.
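The xlarge normalization can borrow AWS's published size normalization factors: large = 4, xlarge = 8, and Nxlarge = 8 × N. Dividing by the xlarge factor then expresses every instance in xlarge units. This is a minimal sketch, assuming instance types arrive as strings like `db.r6g.2xlarge` or `cache.r6g.large`. The parsing rules and the sample fleet are illustrative, and metal sizes are deliberately left unsupported.

```python
from collections import defaultdict

# AWS normalization factors for sizes below xlarge; Nxlarge sizes are 8 * N.
SMALL_SIZE_FACTORS = {"nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4}
XLARGE_FACTOR = 8

def normalization_factor(size: str) -> float:
    if size in SMALL_SIZE_FACTORS:
        return SMALL_SIZE_FACTORS[size]
    if size.endswith("xlarge"):
        multiplier = size[: -len("xlarge")]
        return XLARGE_FACTOR * (int(multiplier) if multiplier else 1)
    raise ValueError(f"unsupported size: {size}")  # e.g. metal sizes

def to_xlarge_units(instance_type: str, count: int = 1) -> tuple[str, float]:
    """Split 'db.r6g.2xlarge' into its family ('db.r6g') and its size in xlarge units."""
    family, size = instance_type.rsplit(".", 1)
    return family, count * normalization_factor(size) / XLARGE_FACTOR

def coverage_by_family(running: dict[str, int], reserved: dict[str, int]) -> dict[str, float]:
    """Fraction of running xlarge units per family that reservations cover."""
    run_units: dict[str, float] = defaultdict(float)
    res_units: dict[str, float] = defaultdict(float)
    for itype, n in running.items():
        family, units = to_xlarge_units(itype, n)
        run_units[family] += units
    for itype, n in reserved.items():
        family, units = to_xlarge_units(itype, n)
        res_units[family] += units
    # Cap at 100%: reserving more than you run lowers utilization, not coverage.
    return {f: min(res_units[f] / u, 1.0) for f, u in run_units.items() if u > 0}
```

For example, four `cache.r6g.large` nodes plus one `cache.r6g.2xlarge` add up to 4.0 xlarge units, so three `cache.r6g.xlarge` reservations cover 75% of that family.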

Each week, YR sends a Slack report with current coverage and utilization numbers, flags upcoming expirations, and recommends net new reservations.
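Flagging upcoming expirations amounts to a date-window filter over the reservation inventory. This is a minimal sketch, assuming each reservation record carries an ID, an instance type, and an end date. The 30-day window, the record shape, and the sample inventory are illustrative, not YR's actual configuration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Reservation:
    reservation_id: str   # hypothetical identifiers
    instance_type: str
    end_date: date

def expiring_soon(
    reservations: list[Reservation],
    today: date,
    window_days: int = 30,
) -> list[Reservation]:
    """Reservations ending within the next `window_days` days, soonest first."""
    cutoff = today + timedelta(days=window_days)
    upcoming = [r for r in reservations if today <= r.end_date <= cutoff]
    return sorted(upcoming, key=lambda r: r.end_date)

inventory = [
    Reservation("ri-a", "db.r6g.xlarge", date(2025, 6, 20)),
    Reservation("ri-b", "cache.r6g.large", date(2025, 6, 5)),
    Reservation("ri-c", "db.r6g.2xlarge", date(2025, 12, 1)),
]
soon = expiring_soon(inventory, today=date(2025, 6, 1))
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the weekly report reproducible and easy to test.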

AWS reservation health includes wasted spend, expiring reservations, and savings opportunities.

Iterating and improving

Agents v2

The v1 agents worked, but they didn’t scale. Each agent had its own bloated system prompt with baked-in schema definitions, and none could share data or tools. If I added a new table, I had to update every agent individually.

For v2, I refactored around a single idea: turn data access into shared tools rather than embedded knowledge.

  • I converted YA’s SQL Generator for mv_normalized_costs into a tool that all agents can use.
  • YA became a single unified agent with a much smaller system prompt and many registered tools. The unified agent decides which tool to use and when.
  • Other tables were each wrapped in their own tool, with a definition of the table’s schema and how to interpret the data.
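The v2 pattern, in which shared tools carry their own schema descriptions, can be sketched as a small registry. This illustration is model-agnostic: the tool names, descriptions, and text format are hypothetical. In practice, the descriptions would go to whatever tool-calling API the agent framework provides, and the model would choose which tool to call.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Tool:
    name: str
    description: str          # schema and interpretation notes live here, not in agent prompts
    run: Callable[..., Any]

class ToolRegistry:
    """One registry shared by every agent (hypothetical design)."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self._tools[tool.name] = tool

    def describe_tools(self) -> str:
        """Text an agent's prompt (or tool-calling API) would receive."""
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, **kwargs: Any) -> Any:
        """Dispatch a tool call chosen by the model."""
        return self._tools[name].run(**kwargs)

registry = ToolRegistry()
registry.register(Tool(
    name="query_normalized_costs",
    description="Run SQL against mv_normalized_costs (columns: service, usage_month, cost).",
    run=lambda sql: f"would run: {sql}",   # placeholder for a real Redshift query
))
registry.register(Tool(
    name="reservation_inventory",
    description="List active reservations with instance type and end date.",
    run=lambda: [],                         # placeholder for an AWS SDK call
))
```

Because every agent reads from the same registry, adding a table means registering one tool instead of editing each agent's prompt.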

This had several advantages:

  • Adding new tables and tools became much simpler. All I had to do was add a new tool with a description of the table’s schema.
  • It also let TY research anomalies directly against mv_normalized_costs instead of relying on static views. TY calls the SQL Generator tool as needed to triangulate the cause of an anomaly.
  • YR could now analyze current usage and identify the specific resources that caused a drop in utilization or coverage. Later, with the addition of the Datadog MCP, YR could even recommend migrating certain resources from one instance type to another.

In this example, Agent YA reminded me why I had a task to migrate our ElastiCache cluster off r6g, and it gave better recommendations for what I should do over the following months.

AI-powered ElastiCache analysis explains reservation utilization, renewal risks, and the reasons for creating migration tasks.

Data foundations v2

At this point, it was becoming increasingly difficult to maintain all the separate Lambda functions that pulled data from various vendors, so I consolidated them into a single service called data-orchestrator:

  • It handles all the complex logic for pulling data from multiple sources and recreating any necessary views.
  • AWS Step Functions orchestrates the data-orchestrator workflow. This allows for parallel steps (e.g., fanning out to collect data from multiple sources) as well as dependencies (e.g., creating a view only after all data has been collected).
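This fan-out-then-dependency shape maps naturally onto a Step Functions `Parallel` state followed by a `Task` state. The sketch below builds an Amazon States Language definition in Python. The source names, state names, and Lambda ARNs are placeholders, not data-orchestrator's real configuration.

```python
import json

def build_definition(sources: list[str], arn_prefix: str) -> dict:
    """ASL definition: collect from every source in parallel, then refresh views."""
    branches = [
        {
            "StartAt": f"Fetch_{src}",
            "States": {
                f"Fetch_{src}": {
                    "Type": "Task",
                    "Resource": f"{arn_prefix}:fetch-{src}",  # hypothetical Lambda name
                    "End": True,
                }
            },
        }
        for src in sources
    ]
    return {
        "Comment": "Hypothetical data-orchestrator flow",
        "StartAt": "CollectSources",
        "States": {
            "CollectSources": {
                "Type": "Parallel",
                "Branches": branches,
                "Next": "RefreshViews",  # reached only after every branch succeeds
            },
            "RefreshViews": {
                "Type": "Task",
                "Resource": f"{arn_prefix}:refresh-views",
                "End": True,
            },
        },
    }

definition = build_definition(
    ["aws_cur", "vendor_a", "vendor_b"],  # hypothetical sources
    "arn:aws:lambda:us-east-1:123456789012:function",
)
print(json.dumps(definition, indent=2))
```

A JSON string like this is the kind of value passed as the `definition` argument of Step Functions' CreateStateMachine API. Adding a vendor then means adding one branch rather than wiring up another standalone Lambda schedule.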

What we’re building next for FinOps

My goal was to democratize data access for everyone, at any time. With these agents, I am no longer the bottleneck for analysis or insights. This has freed up at least 50% of my time, allowing me to focus on high-leverage cost optimization.

Learnings

AI coding assistants let me build internal tools at a pace that wouldn’t have been possible two years ago. And because everything is in-house, iteration is fast; what used to take days now takes less than an hour. During the first couple of months, I was pushing out code changes daily.

The future

Right now, our agents can sense issues and make recommendations, but they still require manual changes from engineers. The next step is to move them toward action, enabling agents to pinpoint required changes, submit them as pull requests, and eventually detect and resolve their own errors.