A Domain-Driven Notification Microservice — Patterns From Production

Notifications start small. "Send the user an email when their order ships." A function. A library. Done.

A year later, you have email, SMS, push, in-app, Slack, and Microsoft Teams. You have user preferences per channel. You have quiet hours, batching, throttling, and a "do not disturb" mode. You have unsubscribe links and bounce handling. You have analytics on open rates and template-level metrics. You have multi-language templates and timezone-aware scheduling.

What started as a function is now a system. If you keep it as a sprawling collection of sendEmail and sendSlack helpers across your codebase, that system will eat your engineering team alive.

This is the shape of the notification microservice I built (and have rebuilt twice). The pattern isn't novel — it's domain-driven design applied to notifications — but the specifics matter.

The core insight

A notification has three distinct concerns:

What happened in the business — "order placed," "user mentioned," "invoice overdue." This is a domain event.
What kind of message to send — "transactional email," "high-urgency push," "Slack mention." This is a delivery policy.
How to actually send it — "render this template, then call the email provider's API." This is a channel adapter.

Most codebases collapse all three into one function:

async function sendOrderShippedEmail(orderId: string, userId: string) {
  const user = await getUser(userId)
  const order = await getOrder(orderId)
  const html = renderTemplate('order-shipped', { user, order })
  await sendgrid.send({ to: user.email, subject: 'Your order shipped', html })
}

This function knows about the domain event, the message type, and the channel. Three concerns. One function. Each one will change for different reasons, and changes will ripple across all the call sites.

The DDD-shaped version separates them.

Domain events

The business code emits an event. It doesn't know or care how the notification gets delivered.

// In your order service
await eventBus.publish({
  type: 'order.shipped',
  occurredAt: new Date().toISOString(),
  data: {
    orderId: order.id,
    userId: order.userId,
    trackingNumber: order.trackingNumber,
  },
})

This is a fire-and-record operation. The order service is done. It moves on. The event lands in your event bus (BullMQ, Kafka, NATS, whatever).

The notification service consumes events. Its job is to translate "order.shipped" into "a transactional email to the user with this template."
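The dispatcher later in this post takes a DomainEvent, which is never spelled out. Here is a minimal sketch of what that envelope could look like, assuming the same three fields as the published event above. The OrderShippedEvent alias and the extractUserId helper are hypothetical names for illustration, not part of any library.

```typescript
// Hypothetical envelope shared by every event on the bus.
type DomainEvent<T extends string = string, D = Record<string, unknown>> = {
  type: T
  occurredAt: string // ISO 8601 timestamp
  data: D
}

// A typed variant for the order.shipped event published above.
type OrderShippedEvent = DomainEvent<'order.shipped', {
  orderId: string
  userId: string
  trackingNumber: string
}>

// Pull the recipient out of an arbitrary event without casting to `any`.
// Returns undefined when the event is not addressed to a user.
function extractUserId(event: DomainEvent): string | undefined {
  const candidate = event.data['userId']
  return typeof candidate === 'string' && candidate.length > 0 ? candidate : undefined
}

const exampleShipped: OrderShippedEvent = {
  type: 'order.shipped',
  occurredAt: new Date().toISOString(),
  data: { orderId: 'ord_1', userId: 'usr_1', trackingNumber: 'TRACK123' },
}

console.log(extractUserId(exampleShipped))
```

Typing the envelope once means the notification service can reject malformed events at the edge instead of discovering a missing userId deep inside a worker.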

Notification preferences

The user's preferences live in a separate domain. They might be stored in the notification service's database or in a profile service — doesn't matter, as long as they're queryable:

type UserNotificationPreferences = {
  userId: string
  channels: {
    email: { enabled: boolean; address: string }
    sms: { enabled: boolean; number?: string }
    push: { enabled: boolean; deviceTokens: string[] }
    slack: { enabled: boolean; userId?: string }
  }
  perEventType: Record<string, {
    channels: ('email' | 'sms' | 'push' | 'slack')[]
    enabled: boolean
  }>
  quietHours: { start: string; end: string; tz: string } | null
}

When the notification service receives order.shipped, it looks up the user's preferences. The user has email enabled, SMS enabled, push enabled — but for this event type (order.shipped), they've only chosen email. So the service sends one email.

This decoupling is crucial. The business code emits one event. The notification service decides what to do with it based on user preferences. The user can change their preferences without anyone touching the business code.
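The order.shipped example above can be sketched as a small pure function. This assumes a trimmed-down version of the preferences type, and it assumes that a channel must be both chosen for the event type and enabled globally. The resolveChannels name is hypothetical.

```typescript
type Channel = 'email' | 'sms' | 'push' | 'slack'

// Trimmed-down preferences: only the fields the decision needs.
type ChannelPrefs = {
  channels: Record<Channel, { enabled: boolean }>
  perEventType: Record<string, { channels: Channel[]; enabled: boolean }>
}

// A channel is used only if the user picked it for this event type
// AND has not disabled that channel globally.
function resolveChannels(prefs: ChannelPrefs, eventType: string): Channel[] {
  const eventPrefs = prefs.perEventType[eventType]
  if (!eventPrefs || !eventPrefs.enabled) return []
  return eventPrefs.channels.filter(channel => prefs.channels[channel].enabled)
}

const examplePrefs: ChannelPrefs = {
  channels: {
    email: { enabled: true },
    sms: { enabled: true },
    push: { enabled: true },
    slack: { enabled: false },
  },
  perEventType: {
    'order.shipped': { channels: ['email'], enabled: true },
  },
}

console.log(resolveChannels(examplePrefs, 'order.shipped'))
```

Event types the user has never configured resolve to no channels here. Whether unknown event types should default to "send" or "don't send" is a product decision this sketch does not settle.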

The dispatcher

The middle layer is a dispatcher that:

Consumes an event.
Looks up the user's preferences.
Decides which channels to deliver on.
For each channel, builds a delivery job and queues it.

type Channel = 'email' | 'sms' | 'push' | 'slack'

class NotificationDispatcher {
  async handle(event: DomainEvent) {
    const userId = (event.data as any).userId
    if (!userId) return

    const prefs = await this.prefsRepo.findByUserId(userId)
    if (!prefs) return

    const eventPrefs = prefs.perEventType[event.type]
    if (!eventPrefs?.enabled) return

    const now = new Date()
    const channels = this.filterByQuietHours(eventPrefs.channels, prefs, now)

    for (const channel of channels) {
      await this.deliveryQueue.add('deliver', {
        eventType: event.type,
        userId,
        channel,
        eventData: event.data,
        scheduledAt: now.toISOString(),
      })
    }
  }

  private filterByQuietHours(
    requested: Channel[],
    prefs: UserNotificationPreferences,
    now: Date
  ): Channel[] {
    if (!prefs.quietHours) return requested
    const isQuiet = isWithinQuietHours(now, prefs.quietHours)
    if (!isQuiet) return requested
    // During quiet hours, only allow non-disruptive channels. Only email
    // qualifies today; add 'in-app' here once it joins the Channel union.
    return requested.filter(c => c === 'email')
  }
}

The dispatcher is pure routing logic. It doesn't render templates. It doesn't call any provider. It just figures out which channels to deliver on and queues delivery jobs.
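The dispatcher calls isWithinQuietHours without showing it. Below is one way it might be written, assuming start and end are "HH:mm" strings and tz is an IANA zone name such as "America/New_York". The tricky part is a window that wraps past midnight (22:00 to 07:00). The localMinutes and minutesOfDay helpers are hypothetical names.

```typescript
type QuietHours = { start: string; end: string; tz: string }

// "22:30" -> 1350 minutes since local midnight.
function minutesOfDay(hhmm: string): number {
  const [h = 0, m = 0] = hhmm.split(':').map(Number)
  return h * 60 + m
}

// Wall-clock minutes since midnight in the given time zone.
function localMinutes(now: Date, tz: string): number {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: tz,
    hour: '2-digit',
    minute: '2-digit',
    hour12: false,
  }).formatToParts(now)
  const get = (type: string) => Number(parts.find(p => p.type === type)?.value ?? 0)
  // Some engines report midnight as hour 24 with hour12: false.
  return (get('hour') % 24) * 60 + get('minute')
}

function isWithinQuietHours(now: Date, quiet: QuietHours): boolean {
  const current = localMinutes(now, quiet.tz)
  const start = minutesOfDay(quiet.start)
  const end = minutesOfDay(quiet.end)
  if (start === end) return false // treat a zero-length window as "never quiet"
  return start < end
    ? current >= start && current < end // same-day window, e.g. 13:00 to 14:00
    : current >= start || current < end // window wraps past midnight
}
```

The end of the window is exclusive in this sketch, so 07:00 sharp counts as awake. Evaluating in the user's zone rather than the server's is the whole point of storing tz next to the hours.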

Channel adapters

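Each channel is a separate worker that consumes delivery jobs and dispatches to its specific channel. Give each channel its own queue (or route by job name): in most job queues, a worker that picks up another channel's job and returns early marks that job as done, and it never gets delivered. The channel check at the top of the worker is a guard, not the routing mechanism.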

class EmailDeliveryWorker {
  async handle(job: DeliveryJob) {
    if (job.channel !== 'email') return

    const user = await this.userRepo.findById(job.userId)
    const prefs = await this.prefsRepo.findByUserId(job.userId)
    if (!user || !prefs?.channels.email.enabled) return

    const template = await this.templateRepo.find(job.eventType, 'email')
    const rendered = await this.renderer.render(template, {
      user,
      ...job.eventData,
    })

    await this.emailProvider.send({
      to: prefs.channels.email.address,
      from: this.fromAddress, // assumed worker config; the Renderer returns only body, subject, and format
      subject: rendered.subject,
      html: rendered.body, // assumes template.format is 'html'
      headers: {
        'X-Event-Type': job.eventType,
        'X-User-Id': job.userId,
        'List-Unsubscribe': `<${this.unsubscribeUrl(job.userId, job.eventType)}>`,
      },
    })

    await this.deliveryLog.record({
      userId: job.userId,
      channel: 'email',
      eventType: job.eventType,
      sentAt: new Date().toISOString(),
      provider: this.emailProvider.name,
    })
  }
}

The email worker knows about:

The user (to get their address).
The template store (to find the right template for this event type and channel).
The renderer (to fill in the template).
The email provider (to actually send).

It doesn't know about Slack, SMS, push, or any business logic. If I want to add a new channel, I add a new worker. The dispatcher already knows how to route to it (the channel is just a string in the job).
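"Add a new worker" is easier when every worker satisfies the same small contract. A minimal sketch, assuming a ChannelAdapter interface and an AdapterRegistry (both hypothetical names, as is the in-memory SMS adapter used for illustration):

```typescript
type Channel = 'email' | 'sms' | 'push' | 'slack'

type DeliveryJob = {
  eventType: string
  userId: string
  channel: Channel
  eventData: Record<string, unknown>
}

// The contract every channel worker satisfies.
interface ChannelAdapter {
  readonly channel: Channel
  deliver(job: DeliveryJob): Promise<void>
}

class AdapterRegistry {
  private readonly adapters = new Map<Channel, ChannelAdapter>()

  register(adapter: ChannelAdapter): this {
    this.adapters.set(adapter.channel, adapter)
    return this
  }

  // Fail loudly when a job names a channel nobody handles,
  // rather than dropping the job silently.
  resolve(channel: Channel): ChannelAdapter {
    const adapter = this.adapters.get(channel)
    if (!adapter) throw new Error(`No adapter registered for channel "${channel}"`)
    return adapter
  }
}

// Stand-in adapter that records jobs instead of calling an SMS provider.
class InMemorySmsAdapter implements ChannelAdapter {
  readonly channel: Channel = 'sms'
  readonly sent: DeliveryJob[] = []

  async deliver(job: DeliveryJob): Promise<void> {
    this.sent.push(job)
  }
}
```

With this shape, adding Microsoft Teams means extending the Channel union and registering one more adapter. Nothing in the dispatcher changes.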

Templates as first-class entities

A template store is its own small domain. Each template has:

An event type it's for (order.shipped).
A channel it's for (email, sms).
A language (en, de, fr).
A version (so changes are auditable).
The actual template content (HTML, plain text, Markdown).

type Template = {
  id: string
  eventType: string
  channel: Channel
  language: string
  version: number
  subjectTemplate?: string       // for email
  bodyTemplate: string
  format: 'html' | 'markdown' | 'plain'
  createdAt: string
  active: boolean
}

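The renderer takes a template and a context object and produces a rendered message. Use a logic-less templating library such as Handlebars, where templates can only read values and call helpers you register; you do not want template authors writing arbitrary JS that gets executed during rendering. Handlebars's strict option is not a sandbox, but it is still worth enabling: it makes a missing field throw instead of silently rendering as an empty string.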

class Renderer {
  async render(template: Template, context: Record<string, any>): Promise<Rendered> {
    const compiled = Handlebars.compile(template.bodyTemplate, { strict: true })
    const body = compiled(context)
    const subject = template.subjectTemplate
      ? Handlebars.compile(template.subjectTemplate, { strict: true })(context)
      : undefined
    return { body, subject, format: template.format }
  }
}
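The worker's templateRepo.find call hides a selection rule. Here is one plausible version, assuming the rule is "highest active version in the user's language, falling back to a default language". The selectTemplate function and the 'en' fallback are assumptions for illustration.

```typescript
type Channel = 'email' | 'sms' | 'push' | 'slack'

type Template = {
  id: string
  eventType: string
  channel: Channel
  language: string
  version: number
  subjectTemplate?: string
  bodyTemplate: string
  format: 'html' | 'markdown' | 'plain'
  createdAt: string
  active: boolean
}

// Pick the newest active template for the requested language,
// falling back to `fallbackLanguage` when no translation exists.
function selectTemplate(
  templates: Template[],
  eventType: string,
  channel: Channel,
  language: string,
  fallbackLanguage = 'en'
): Template | undefined {
  const candidates = templates.filter(
    t => t.active && t.eventType === eventType && t.channel === channel
  )
  for (const lang of [language, fallbackLanguage]) {
    const newest = candidates
      .filter(t => t.language === lang)
      .sort((a, b) => b.version - a.version)[0]
    if (newest) return newest
  }
  return undefined
}
```

Returning undefined rather than throwing leaves the decision to the worker: a missing template is usually a permanent failure that belongs in the dead-letter queue, not something to retry.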

Templates being stored in a database (not in code) means non-engineers can edit them. We had marketing folks editing email copy via an admin panel without ever touching a deploy.

The unsubscribe surface

Every notification should be unsubscribable. The dispatcher checks enabled flags before queuing, but the user needs a way to flip those flags. Two patterns:

A preferences page in your app. Standard. Each event type has a checkbox per channel.
A one-click unsubscribe link in every notification. Required by law in many jurisdictions for email marketing, and good UX everywhere.

The unsubscribe link carries a signed token that encodes the user ID, the event type, and the action. Clicking it flips the preference:

function unsubscribeUrl(userId: string, eventType: string): string {
  const token = sign({ userId, eventType, action: 'unsubscribe' })
  return `https://app.example.com/api/notify/unsubscribe?token=${token}`
}

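The endpoint verifies the token, updates the preference, and shows a "you're unsubscribed" page. The same endpoint can power the List-Unsubscribe header for email clients that support one-click unsubscribe, provided it also accepts a POST and the message carries a List-Unsubscribe-Post: List-Unsubscribe=One-Click header (RFC 8058).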
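The sign call above is left undefined. A minimal sketch of a matching sign/verify pair using an HMAC from Node's built-in crypto module follows. The token layout (base64url payload, a dot, base64url MAC), the verify function, and the hard-coded secret are all assumptions for illustration; a real deployment would load the secret from configuration and might prefer an established token format.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

type UnsubscribeClaims = { userId: string; eventType: string; action: 'unsubscribe' }

// Assumption: placeholder secret. Load from config or a secret store in practice.
const UNSUBSCRIBE_SECRET = 'replace-with-a-real-secret'

function sign(claims: UnsubscribeClaims, secret: string = UNSUBSCRIBE_SECRET): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url')
  const mac = createHmac('sha256', secret).update(payload).digest('base64url')
  return `${payload}.${mac}` // both parts are URL-safe
}

// Returns the claims when the signature checks out, otherwise null.
function verify(token: string, secret: string = UNSUBSCRIBE_SECRET): UnsubscribeClaims | null {
  const [payload, mac] = token.split('.')
  if (!payload || !mac) return null

  const expected = createHmac('sha256', secret).update(payload).digest()
  const given = Buffer.from(mac, 'base64url')
  // timingSafeEqual throws on length mismatch, so check length first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null

  try {
    const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'))
    const valid =
      claims !== null &&
      typeof claims === 'object' &&
      claims.action === 'unsubscribe' &&
      typeof claims.userId === 'string' &&
      typeof claims.eventType === 'string'
    return valid ? (claims as UnsubscribeClaims) : null
  } catch {
    return null
  }
}
```

The token has no expiry in this sketch, on the assumption that an unsubscribe link in a two-year-old email should still work. Because the action is baked into the signed claims, the token cannot be replayed against some other endpoint that happens to share the secret.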

Observability

Each delivery generates two records:

Delivery log. "We attempted to send X to user Y on channel Z at time T using provider P."
Provider callback. "The provider says message M was delivered (or bounced, or opened, or clicked)."

Both feed into the same table, keyed by a message ID. The observability story collapses into "show me everything that happened for user Y this week," which is what support teams ask for.
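The "everything that happened for user Y this week" query can be sketched against an in-memory array standing in for that table. The DeliveryRecord shape, the kind values, and timelineForUser are hypothetical; the point is that worker attempts and provider callbacks share one row type and one message ID.

```typescript
type DeliveryEventKind = 'attempted' | 'delivered' | 'bounced' | 'opened' | 'clicked'

type DeliveryRecord = {
  messageId: string
  userId: string
  channel: string
  kind: DeliveryEventKind
  at: string // ISO 8601 timestamp
  source: 'worker' | 'provider-callback'
}

// Group one user's records by message, each group in chronological order.
function timelineForUser(
  records: DeliveryRecord[],
  userId: string,
  since: Date
): Map<string, DeliveryRecord[]> {
  const relevant = records
    .filter(r => r.userId === userId && Date.parse(r.at) >= since.getTime())
    .sort((a, b) => Date.parse(a.at) - Date.parse(b.at))

  const byMessage = new Map<string, DeliveryRecord[]>()
  for (const record of relevant) {
    const existing = byMessage.get(record.messageId) ?? []
    existing.push(record)
    byMessage.set(record.messageId, existing)
  }
  return byMessage
}
```

In a real database this is a single indexed query on (userId, at). What matters is that support does not have to join a send log against a separate webhook log to answer the question.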

Throttling and batching

Two problems show up at scale:

A user gets 50 notifications in 10 minutes because something noisy happened. You need to batch them into a digest.
A celebrity user's actions trigger 10,000 notifications to followers in a burst. You need to throttle.

The dispatcher is where this logic lives. Before queuing a delivery, check a fixed-window counter per recipient, backed by Redis (a fan-out burst additionally needs a rate limit on the delivery queue as a whole):

async handle(event: DomainEvent) {
  // ...existing routing logic...

  for (const channel of channels) {
    const window = `notify:${userId}:${channel}:${event.type}`
    const count = await this.redis.incr(window)
    // The first hit starts a 1-minute fixed window. INCR and EXPIRE are separate
    // calls, so a crash between them leaves a key with no TTL.
    if (count === 1) await this.redis.expire(window, 60)

    if (count > MAX_PER_MINUTE) {
      // Throttled. Queue for batching instead.
      await this.batchQueue.add('batch', { userId, channel, eventType: event.type, eventData: event.data })
    } else {
      await this.deliveryQueue.add('deliver', { /* ... */ })
    }
  }
}

The batch worker runs periodically (every 5 minutes or whatever the digest schedule is), collects everything in the batch queue for a user, renders a "digest" template, and sends one combined notification.
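The "collects everything in the batch queue for a user" step might look like the sketch below. It assumes batched items have already been drained from the queue into an array and that a digest is scoped to one user on one channel. BatchedItem, Digest, and buildDigests are hypothetical names.

```typescript
type Channel = 'email' | 'sms' | 'push' | 'slack'

type BatchedItem = {
  userId: string
  channel: Channel
  eventType: string
  eventData: Record<string, unknown>
}

type Digest = {
  userId: string
  channel: Channel
  items: BatchedItem[]
}

// Collapse a drained batch into one digest per (user, channel) pair,
// preserving arrival order inside each digest.
function buildDigests(items: BatchedItem[]): Digest[] {
  const groups = new Map<string, Digest>()
  for (const item of items) {
    const key = `${item.userId}:${item.channel}`
    let digest = groups.get(key)
    if (!digest) {
      digest = { userId: item.userId, channel: item.channel, items: [] }
      groups.set(key, digest)
    }
    digest.items.push(item)
  }
  return Array.from(groups.values())
}
```

Each Digest then goes through the normal delivery path with a digest template, so quiet hours, unsubscribe headers, and delivery logging apply to digests the same way they apply to single notifications.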

What I'd warn the next team about

Don't put rendering in the dispatcher. Keep the dispatcher routing-only. The dispatcher decides which channels to use; the workers decide how to render and send. Mixing them couples your routing logic to your template engine in ways that hurt later.

Use job retries with caution. If the email provider returns a 503, retry. If it returns a 400 with "invalid recipient," do not retry — that's a permanent failure. Different error codes have different retry semantics; encode this in the worker.
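One way to encode that rule is a small classifier the worker consults before rethrowing. This is a sketch under assumptions: the provider reports failures as HTTP status codes; 408, 429, and 5xx are treated as transient; and backoff is exponential with a cap. The classifyFailure name and the specific limits are made up for illustration.

```typescript
type RetryDecision =
  | { retry: true; delayMs: number }
  | { retry: false; reason: string }

// attempt is 1-based: the attempt that just failed.
function classifyFailure(status: number, attempt: number, maxAttempts = 5): RetryDecision {
  const transient = status === 408 || status === 429 || status >= 500
  if (!transient) {
    // e.g. 400 "invalid recipient": retrying cannot succeed.
    return { retry: false, reason: `permanent failure (HTTP ${status})` }
  }
  if (attempt >= maxAttempts) {
    return { retry: false, reason: 'retries exhausted' }
  }
  // Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
  const delayMs = Math.min(1000 * 2 ** (attempt - 1), 60_000)
  return { retry: true, delayMs }
}

console.log(classifyFailure(503, 1))
console.log(classifyFailure(400, 1))
```

Both "do not retry" outcomes should end up in the dead-letter queue with their reason attached, so a permanent failure is distinguishable from a provider that was down for an hour.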

Track template performance. Open rate per template per language is a real signal. Templates that score below 10% open rate are usually broken (bad subject line, bad timing, irrelevant content). Surface this in your admin UI.
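The check behind that admin view can be very small. A sketch, assuming per-template, per-language counts are already aggregated from the delivery log: the 10% threshold comes from the paragraph above, while the minimum-volume cutoff and the TemplateStats and flagLowPerformers names are assumptions.

```typescript
type TemplateStats = {
  templateId: string
  language: string
  delivered: number
  opened: number
}

function openRate(stats: TemplateStats): number {
  return stats.delivered === 0 ? 0 : stats.opened / stats.delivered
}

// Flag templates below the threshold, ignoring ones with too little
// volume for the rate to mean anything.
function flagLowPerformers(
  stats: TemplateStats[],
  threshold = 0.1,
  minDelivered = 100
): TemplateStats[] {
  return stats.filter(s => s.delivered >= minDelivered && openRate(s) < threshold)
}
```

The volume cutoff matters: a template sent ten times with zero opens is noise, not a signal. Open tracking is also only meaningful for channels that report opens, so SMS templates would need a different metric.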

Make the dead-letter queue visible. Failed deliveries should go somewhere humans can see them. We had three months of bounces piling up before anyone noticed; turned out a customer had typo'd their email and was getting nothing. A weekly dashboard of "delivery failures by user" caught it.

The takeaway

A notification microservice is one of the highest-leverage extractions you can do. The business code becomes simple (emit events). User preferences become a first-class concept. Templates become editable without deploys. New channels become new workers, not new branches in shared functions.

The pattern is more code than sendgrid.send(...). It's also the difference between "we have a notification feature" and "we have a notification platform." If you're shipping more than two notification types and you're tired of touching the same five functions every time product adds a new channel, this extraction pays for itself within a quarter.
