Going for gold: The building blocks of effective open-source software
Elad Rosenheim · 2021-07-02 · via Mastercard Dynamic Yield

This post is part one of a two-part series brought to you by DY Labs, an initiative by members of our Product and R&D departments who have a passion for experimentation and building developer resources for the greater digital marketing and engineering communities. This series focuses on the cohort’s latest project, Funnel Rocket, which is a serverless query engine optimized for complex user-centric queries at scale. In part one, Elad Rosenheim, our VP of Technology, walks through his journey in the world of product and engineering and how his experiences in the field have pushed him and others to innovate and tackle challenges facing developers across industries in order to develop open-source solutions.

An earlier version of this post was published by Elad on Medium.

Over the last decade, we’ve experienced the rise and rise of open-source software for big data and big scale. That was indeed a big leap forward, yet I’ve become disillusioned with much of the surrounding hype. Rants aside, here’s where I think the underlying issues are, and most importantly – what we can do about it. It’s my personal take, so let’s start with my own personal story.

From the dot-com bubble to the open-source era 

I can trace my first real job back to around 1998. During that period, real servers were still made by companies like Sun, running Solaris on expensive proprietary metal. I recall catching a glimpse of a purchase order, from around 2000, for a few Sun Ultra workstations. One line item in particular caught my eye, as I could directly compare it to my PC at home: a cool $150 for a CD drive, worth about $230 today. It’s very probable that this drive came in Sun’s iconic purple, thus exuding its commanding authority over the pedestrian CD drive in your home, but still, it didn’t feel right. Savvy young developers started pushing for Linux, and people rushed to join the first wave of the dot-com bubble.

Fast forward to 2013, when I joined a tiny startup – Dynamic Yield – and started working on “web-scale” problems in earnest. Up to that point, scale (to me) usually meant tuning a beastly Java application server doing maybe 100 requests per second at most, each request typically hitting the relational database a dozen times within a transaction. Every single request used to mean something important that someone cared about. Now, data has become a clickstream from across the web: hardly transaction-worthy, but coming in at thousands of requests per second, and growing fast.

The founding team already had their first database (a single MySQL instance) brought to its knees on that traffic and moved on to Apache HBase for anything non-administrative. Before a year went by, I introduced Redis for fast access to user data and Elasticsearch for ad-hoc analytics to our growing spiderweb of an architecture. We were deep in NoSQL territory now. For batch jobs, custom Java processes and clunky Hadoop MapReduce jobs started making way for the new shiny tech of the day: Apache Spark. Kafka and stream processing frameworks would come next.

Much of that new wealth of tools for scale was directly inspired by Google’s whitepapers. These marked a big shift in how scale and resilience were tackled or at least thought about.

First, it was about GFS (the Google File System) and MapReduce, then BigTable. It pushed the wider tech community to think in terms of commodity hardware and inexpensive hard disks rather than purple hardware and RAID, relying on multiple replicas for high availability and distributing work. You realized hardware will fail and keep failing, and what you needed were tools that will happily skip over such road bumps with minimal fuss and grow in capacity as fast as you can throw new machines at ’em.

Not having to pay a hefty per-server license fee was a big plus. The concept of GFS inspired Apache HDFS, Google MapReduce’s principles were recast as Apache Hadoop M/R, and BigTable was remade as Apache HBase. Later on, Google’s Dremel (which you probably know today in its SaaS incarnation — BigQuery) inspired the Parquet file format and a new generation of distributed query engines.

To me, these were exciting times – being able to scale so much with open source, provisioning new servers in hours or minutes! The promises of elasticity, cost efficiency, and high availability were, to a large extent, realized — especially if you had been accustomed to waiting months for servers, for IT priests to install a pricey NAS, or for the procurement of more commercial middleware licenses.

I guess this goes against the grain of nostalgia permeating so much of the discourse. Usually, it sounds something like, “We did so much in the old days with so little memory and no-nonsense, hand-polished code! Oh, these kids and their multi-megabyte JS bundles will never know…” I like to compare that to a tip I’ve learned from parenting: There is always someone who has achieved so much more with far less and is simply eager to tell you all about it.

The burden of complexity

It seems like you only get a look at the real, gnarly underbelly of a cluster at the worst times: during peak traffic periods, at nights, or on holidays.

There is a complexity to a cluster’s state management that just does not go away, and instead, is waiting to explode at some point. Here’s how it usually starts off, at least in my experience:

  1. Whether you’re managing a random-access database cluster, a batch processing cluster, or a stream processing cluster, you get it to work initially and, for a while, it seems to work!
  2. You don’t really need to know how it elects a master, how it allocates and deallocates shards, how precious metadata is kept in sync, or what ZooKeeper actually does in there.

At some point, you hit some unexpected threshold that’s really hard to predict because it’s related to your usage patterns and your specific load spikes, and things go awry. Sometimes the system gets completely stuck; sometimes it limps along, crying out loud in the log files. Often, it’s not a question of data size or volume as you might expect, but a more obscure limit being crossed. Your Elasticsearch cluster might be fine with 2,382 indices, but one day, you get to 2,407 and nodes start breaking, pulling the rest of them down with them to misery lane. I just made up these numbers, and your metrics and thresholds are likely different, but you get the picture.

In the best-case scenario, you solve the issue at hand that day, but often, the same issue will repeat. This is where you need to step back and give it time. Sometimes, it takes weeks to stabilize things or even months to hunt down a recurring issue. And over time, this becomes a time suck. Sometimes, multiple and seemingly unrelated incidents happen to all blow up over a short period of time, and the team experiences fatigue from fighting multiple fires simultaneously. Eventually, you’ve put enough work hours and thrown enough extra resources at the problem (i.e. supplying a few more servers, giving it more headroom) and it goes away, but deep down, you know it will return. As a head of R&D, I know this can be extremely draining, because whether the fire is at team A, B, or C — it’s ultimately your fire as well, every single time. 

Sometimes, people will respond to your troubles with how great and reliable their tooling is or was at company X: “We’re using this awesome database Y and have never had any significant issue!” or “You’re over-complicating it, just use Z like us and your problems will go away.” However, this can often be over-simplifying the core issue at hand, skirting around the real complexity and nuances of your end goal. And over time, I’ve found solace in focusing on tending to my roots, because choosing to neglect them is only sustainable for so long.

Let’s talk about elasticity

Operational complexity is an enduring burden, but I should also mention limited elasticity and its friend, high cost. While tools do get better in that respect, what may have been considered “elastic” in 2012 is simply not what I’d call elastic today.

Let’s take Apache Spark for example. There’s still a steady stream of people going into Big Data who think it will solve all their problems, but if you’ve spent significant time with it, you know that in order for your jobs to work well, you need to carefully adjust the amount of RAM vs. CPU used, you need to dive into config settings, and perhaps even tinker with Garbage Collection a bit. You need to analyze the “naive” DAG it generates for your code to find choke points, then modify the code so it’s less about the cleanest functional abstraction and more about actual efficiency. In our case, we also needed to override some classes. And that’s how this song goes: care-free big data processing by high-level abstractions rarely lives up to the hype.
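To make the RAM vs. CPU tuning a little more concrete, here is a minimal sketch of the settings you typically end up adjusting per executor. The property names are standard Spark configuration keys and `-XX:+UseG1GC` is a standard JVM flag, but the helper `executor_settings`, the 10% overhead fraction, the 384 MiB floor, and the sample sizes are illustrative assumptions, not values taken from our jobs.

```python
def executor_settings(cores, heap_gb, overhead_fraction=0.10, use_g1gc=True):
    """Build a dict of Spark executor settings (hypothetical helper).

    cores and heap_gb are the two knobs traded off against each other;
    overhead_fraction is an assumed share of the heap reserved for
    off-heap memory, with an assumed floor of 384 MiB.
    """
    overhead_mb = max(384, int(heap_gb * 1024 * overhead_fraction))
    settings = {
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{heap_gb}g",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }
    if use_g1gc:
        # Switching the garbage collector is one example of "tinkering with GC".
        settings["spark.executor.extraJavaOptions"] = "-XX:+UseG1GC"
    return settings


settings = executor_settings(cores=4, heap_gb=16)
heap_per_core_gb = 16 / 4  # the RAM-to-CPU ratio each task effectively gets

for key, value in settings.items():
    print(f"--conf {key}={value}")
print(f"heap per core: {heap_per_core_gb} GB")
```

Each pair can be passed to `spark-submit` as `--conf key=value`. The point is less the specific values than the fact that they must be chosen per job, by measurement.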

The challenge goes further, however.

One idea that Apache Hadoop pushed for since its inception was the ability to build a nice big cluster of commodity servers in order to throw several jobs from different teams at different times at its resource manager (nowadays known as YARN), letting it figure out which resources each job needs and how to fit all these jobs nicely. The concept was to think about the capacity of the cluster as an aggregate of its resources: this much CPU, storage, memory. You would scale that out as necessary, while R&D teams kept churning out new jobs to submit. That idea pretty much made its way into Spark as well.

The problem is, one single cluster really doesn’t like juggling multiple jobs with very different needs in terms of computing, memory, and storage. What you get instead is a constant battle for resources, delays, and occasional failures. Your cluster will typically be in one of two modes:

  1. Over-provisioned to have room for extra work (read: idle resources that you pay for), or
  2. Under-provisioned, leaving you wishing it was easy or quick to scale depending on how well jobs progress right at this moment (I’m not saying it’s not possible, but it’s definitely not easy, quick or out-of-the-box)
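The two modes can be illustrated with a toy simulation. This is a rough sketch under simplifying assumptions: demand and capacity are both measured in core-hours per hour, unfinished work simply carries over to the next hour, and the demand series is made up.

```python
def simulate(capacity, demand_per_hour):
    """Run a fixed-size cluster against hourly demand.

    Returns the core-hours paid for but left idle, and the work
    still waiting in the backlog when the period ends.
    """
    backlog = 0.0
    idle = 0.0
    for demand in demand_per_hour:
        work = backlog + demand          # carried-over work plus new work
        done = min(work, capacity)       # the cluster cannot exceed its size
        idle += capacity - done          # unused capacity is still billed
        backlog = work - done            # the rest waits for the next hour
    return {"idle_core_hours": idle, "backlog_core_hours": backlog}


demand = [20, 20, 100, 100, 20, 20]      # made-up bursty workload

over = simulate(capacity=100, demand_per_hour=demand)
under = simulate(capacity=40, demand_per_hour=demand)

print("over-provisioned: ", over)
print("under-provisioned:", under)
```

Sized for the peak, the cluster never falls behind but idles for 320 of the 600 core-hours it bills. Sized at 40, it wastes little but ends the period with 80 core-hours of work still queued.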

Web servers usually “have one job” and give you the ability to measure their maximum capacity as X requests per second. They rely on databases and backend servers, but they don’t need to know about each other, let alone pass huge chunks of data between them over the network or thrash the disks. Data processing clusters do all these different things concurrently, though never at exactly the same moment, which may cause issues. If you run into such problems, you’ll probably start managing multiple clusters, each configured to its needs and with its own headroom. However, provisioning these multiple clusters takes precious time, and operating them is still a hassle. Even a single job in isolation has multiple stages which stress different resource types (CPU, memory, disk, network), which makes optimal resource planning a challenge, even with multiple dedicated clusters.

You could get some of that work off your shoulders if you go for a managed service, but the “management tax” can easily cost outlandish amounts of money as you scale. If you’re already paying, say, $1 million a year for self-managed resources on the cloud, you have to ask yourself: Are we willing to pay 1.5x-2x that price tag to receive a managed-to-an-extent service?
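Spelled out as arithmetic, using only the figures from the paragraph above ($1 million a year, a 1.5x to 2x multiplier); the function name is a hypothetical one chosen for this sketch:

```python
def managed_cost_range(self_managed_annual, low_multiplier=1.5, high_multiplier=2.0):
    """Return (low, high) managed-service cost and the premium over self-managing."""
    low = self_managed_annual * low_multiplier
    high = self_managed_annual * high_multiplier
    return low, high, low - self_managed_annual, high - self_managed_annual


low, high, extra_low, extra_high = managed_cost_range(1_000_000)
print(f"managed service: ${low:,.0f} to ${high:,.0f} per year")
print(f"management tax:  ${extra_low:,.0f} to ${extra_high:,.0f} per year")
```

In other words, the “management tax” alone lands somewhere between $500,000 and $1 million a year at that scale.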

New technologies (e.g. k8s operators) do provide us with faster provisioning and better resource utilization. There is one thing they cannot solve for, however: precious engineering time spent to thoroughly profile and tune misbehaving components, which in turn makes it very tempting to throw ever more compute resources at the problem. Over time, these inefficiencies accumulate, and as the organization grows, you’ll rack up a pretty significant bill to manage the entire system without an efficient solution. 

Gradual improvements over time

Once your Spark cluster is properly set up and running well, it can output huge amounts of data quickly when it reaches the result-writing stage. If the cluster is running multiple jobs, these writes come in periodic bursts.

Now, assume you want these results written into external systems, e.g. SQL databases, Redis, Elasticsearch, Cassandra. It’s all too easy for Spark to overwhelm or significantly impair any database with these big writes – I’ve even seen it break a cluster’s internal replication, which is a nightmare scenario.

You can’t really expect Elasticsearch to grow from, say, 200 cores to 1,000 for exactly the duration where you need to index things in bulk, then shrink back to 200 immediately afterwards. Instead, there are various things you can do:

  • Aggressively throttle down the output from Spark — i.e. spend money being near-idle on the Spark side
  • Write and manage a different component to read Spark’s output and perform the indexing (meaning dev time and operations)
  • Over-provision Elasticsearch (= more money)

In other words, not really the elasticity I was hoping for.
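The first option, throttling the writer, can be sketched with a token bucket. This is a generic rate limiter and not a Spark or Elasticsearch API: the names `Throttle` and `write_throttled`, the rate, and the batch sizes are all assumptions for illustration, and `sink` stands in for whatever bulk-indexing call you actually use.

```python
import time


class Throttle:
    """Token bucket: allows at most `rate` documents per second on average."""

    def __init__(self, rate, burst, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate          # tokens (documents) refilled per second
        self.burst = burst        # maximum tokens the bucket can hold
        self.tokens = burst       # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, n):
        """Block until n tokens are available, then spend them."""
        if n > self.burst:
            raise ValueError("batch is larger than the burst size")
        while True:
            now = self.clock()
            refill = (now - self.last) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Wait roughly as long as the missing tokens take to refill.
            self.sleep((n - self.tokens) / self.rate)


def write_throttled(batches, sink, throttle):
    """Send each batch to `sink`, never exceeding the throttle's rate."""
    for batch in batches:
        throttle.acquire(len(batch))
        sink(batch)


# Demo: three batches of 100 documents at 1,000 documents per second.
written = []
write_throttled(
    batches=[list(range(100))] * 3,
    sink=written.append,
    throttle=Throttle(rate=1000, burst=100),
)
print(f"wrote {len(written)} batches")
```

The database is protected, but the cost shows up elsewhere: while `acquire` sleeps, the writing side sits near-idle on resources you are still paying for.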

Over time, patterns have evolved to alleviate some of this pain: Apache Kafka is frequently used as the “great decoupler”, allowing multiple receiving ends to consume data at their own pace, go down for maintenance, etc. Kafka, though, is another expensive piece of the puzzle that is definitely not as easy to scale or as resilient as the initial hype had me believing.
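The decoupling idea itself is simple enough to show in a toy, in-memory form. To be clear, this is not Kafka’s API and it ignores partitions, persistence, and replication entirely; the `Log` class and the consumer names are hypothetical. It only illustrates that each consumer tracks its own position in a shared append-only log.

```python
class Log:
    """Toy append-only log with an independent read offset per consumer."""

    def __init__(self):
        self.records = []
        self.offsets = {}   # consumer name -> index of the next record to read

    def append(self, record):
        self.records.append(record)

    def poll(self, consumer, max_records):
        """Return up to max_records unread records and advance the offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

    def lag(self, consumer):
        """How many records this consumer has not read yet."""
        return len(self.records) - self.offsets.get(consumer, 0)


log = Log()
for i in range(10):          # the producer writes at full speed
    log.append(i)

fast = log.poll("redis-writer", max_records=10)   # keeps up with the producer
slow = log.poll("es-indexer", max_records=3)      # reads at its own pace

print("redis-writer lag:", log.lag("redis-writer"))
print("es-indexer lag:  ", log.lag("es-indexer"))
```

The slow consumer falls behind without slowing the producer or the fast consumer, and it can resume from its own offset after downtime.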

On the storage front, there have been improvements as well: instead of using HDFS, we switched to S3 for all batch outputs so we don’t need to worry about HDD size or filesystem metadata processes. That means giving up on the once-touted idea of “data locality” which, in hindsight, was a big mismatch. Storage size tends to go only one way: up. At the same time, you want the compute to be as elastic as possible, utilizing as much or as little as you need right now. Marrying both was always quite clumsy, but fortunately, AWS got its act together over time, improved intra-datacenter network performance considerably, and finally sorted out strong consistency in S3 (oh, the pain, the workarounds!). That brought it in line with Google Cloud on these points, making the storage-compute decoupling viable on AWS as well.

That last note is important: the building blocks offered by a cloud provider may encourage a good architecture (or push you away from one).

Don’t forget to create a dev wishlist

This isn’t an article in the style of “Stop using X, just use Y.” I believe you can create a wildly successful product (or fail spectacularly) with any popular technology. The question I tend to focus on instead is: “What constructs do we need to make systems easier, faster, and cheaper?” Here’s my partial wishlist:

  • I want to launch jobs with better isolation from each other – each getting the resources it needs to run unhindered, rather than needing to fight over resources from a very limited pool, avoiding any possible domino effect from a single job derailing.
  • I want the needed resources to be (nearly) instantly available
  • I only want to pay for the actual work done, from the millisecond my actual code started running to the millisecond it ended.
  • I want to push the complexity of orchestrating hardware and software to battle-tested components whose focus is exactly that rather than any applicative logic. Using these lower-level constructs, I could build higher-level orchestration that is way more straightforward. Simple (rather than easy) is robust.
  • I want jobs to run in multiple modes: it could be “serverless” for fast, easy scaling of bursty short-lived work and interactive user requests at the expense of a higher price per compute unit/second. Or, it could be utilizing spot / preemptible instances, which are somewhat slower to launch and scale but very cheap for scheduled bulk workloads.
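The trade-off between the two modes in the last item can be sketched as a break-even calculation. Every price below is hypothetical and chosen only to make the arithmetic readable; real prices differ by provider and change over time.

```python
def serverless_cost(workers, duration_ms, price_per_worker_ms):
    """Pay only for the milliseconds the code actually ran."""
    return workers * duration_ms * price_per_worker_ms


def fleet_cost(workers, hours, price_per_worker_hour):
    """Pay for every provisioned hour, whether busy or idle."""
    return workers * hours * price_per_worker_hour


def break_even_queries_per_hour(workers, duration_ms,
                                price_per_worker_ms, price_per_worker_hour):
    """Queries per hour above which an always-on fleet becomes cheaper."""
    per_query = serverless_cost(workers, duration_ms, price_per_worker_ms)
    per_hour = fleet_cost(workers, 1, price_per_worker_hour)
    return per_hour / per_query


# Hypothetical prices: serverless costs more per unit of compute.
SERVERLESS_PER_WORKER_MS = 0.0000001   # equals $0.36 per worker-hour
SPOT_PER_WORKER_HOUR = 0.10

# One interactive query fans out to 50 workers for 2 seconds.
per_query = serverless_cost(50, 2000, SERVERLESS_PER_WORKER_MS)
per_hour = fleet_cost(50, 1, SPOT_PER_WORKER_HOUR)
threshold = break_even_queries_per_hour(
    50, 2000, SERVERLESS_PER_WORKER_MS, SPOT_PER_WORKER_HOUR)

print(f"serverless, per query: ${per_query:.4f}")
print(f"spot fleet, per hour:  ${per_hour:.2f}")
print(f"break-even: about {threshold:.0f} queries per hour")
```

With these made-up numbers, bursty interactive traffic below roughly 500 queries an hour is cheaper on serverless despite the higher unit price, while steady, scheduled bulk work favors the cheaper instances.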

I’m not inventing any new concepts here. The building blocks are essentially readily available these days. The open-source software to take advantage of all these to the max, however, is not so available. To demonstrate what I mean, let’s tackle a real challenge guided by these principles. In part two of this series, I will dive into Funnel Rocket, an open-source query engine that is my attempt at building a solution. The goal was to build something to solve a very specific pain point we needed to address here, at Dynamic Yield, but over time, as we’ve spent more time working on it, I’ve come to realize how it can also become a testbed for so much more. So now, let’s move on to part two.
