Building EDIFlow - Infrastructure Layer: Parsers, Repositories & Data Packages (Part 4)
hello-ediflo · 2026-05-07 · via DEV Community

Series: Building EDIFlow - A Clean Architecture Journey in TypeScript (Part 4/6)
Reading Time: ~12 minutes


Recap — Where We Left Off

In Part 3, we built the Application Layer — Use Cases, Output Ports (interfaces), DTOs, and the UseCaseFactory. Everything depends on abstractions, nothing on implementations.

Now it's time for the Infrastructure Layer, where theory meets reality. This is where IMessageParser becomes EdifactMessageParser, where IMessageStructureRepository becomes FileBasedMessageStructureRepository, and where hundreds of JSON message definitions (up to 319 in a single data package) get loaded at runtime.

┌───────────────────────────────────────────┐
│  🔥 INFRASTRUCTURE LAYER                  │  ← You are here
│  Parsers · Builders · Repositories        │
│                                           │
│  ┌─────────────────────────────────────┐  │
│  │      Application (Use Cases, Ports) │  │
│  │  ┌───────────────────────────────┐  │  │
│  │  │      Domain (Entities)        │  │  │
│  │  └───────────────────────────────┘  │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘



Infrastructure in a Multi-Standard World — Why Three Packages?

In Clean Architecture, the Infrastructure Layer implements the interfaces defined by Domain and Application. But EDIFlow supports four standards (EDIFACT, X12, HIPAA, EANCOM) — so where does the infrastructure code live?

The answer: it's split across three infrastructure packages, each with a clear responsibility:

@ediflow/edifact              → EDIFACT-specific: parser, builder, validator, tokenizer
@ediflow/x12                  → X12-specific: parser, builder, delimiter detection
@ediflow/infrastructure-shared → Standard-agnostic: file loading, repositories, caching


Why not one big infrastructure package?

Because EDIFACT parsing and X12 parsing share zero implementation code. The delimiters are different (+:.' vs *~>), the envelope structure is different (UNB/UNZ vs ISA/GS/ST), the escape rules are different. Putting them together would create a God-package with no cohesion.
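
For a concrete sense of how far apart the two standards are: X12 has no UNA-style service string at all. Its delimiters sit at fixed positions in the 106-character ISA header. A minimal sketch, assuming a well-formed ISA at the start of the input; `detectX12Delimiters` and `X12Delimiters` are hypothetical names, not EDIFlow's API, and ISA11 is ignored because it only became a repetition separator in later X12 versions:

```typescript
// Hypothetical shape; EDIFlow's real Delimiters type may differ.
interface X12Delimiters {
  element: string;   // the character right after "ISA"
  component: string; // ISA16, the 105th character
  segment: string;   // the character that terminates the ISA segment
}

// ISA is fixed-width: 3-char tag + 16 fixed-length elements (86 chars)
// + 16 element separators + 1 segment terminator = 106 characters.
const ISA_LENGTH = 106;

function detectX12Delimiters(interchange: string): X12Delimiters {
  // Ignore leading whitespace only; a trailing newline might be the terminator.
  const isa = interchange.replace(/^\s+/, '').slice(0, ISA_LENGTH);
  if (!isa.startsWith('ISA') || isa.length < ISA_LENGTH) {
    throw new Error('Input does not start with a complete 106-character ISA segment');
  }
  return {
    element: isa.charAt(3),
    component: isa.charAt(104),
    segment: isa.charAt(105),
  };
}
```

Because the positions are fixed, the same three reads work whether a partner sends `*`, `>` and `~` or a completely different set of characters.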

Why infrastructure-shared?

This package was primarily created for the CLI tool. The CLI needs to load message definitions for ALL standards — EDIFACT, X12, HIPAA, EANCOM — from a single entry point. The FileBasedMessageStructureRepository doesn't care whether the JSON describes an EDIFACT ORDERS or an X12 850. It can't live in @ediflow/edifact (X12 would depend on it) or @ediflow/x12 (vice versa). So it lives in a shared infrastructure package — mainly consumed by the CLI, but available to anyone who needs file-based data loading regardless of standard.

The dependency graph:

@ediflow/core  ←──  @ediflow/edifact
       ↑                    
       ├─────  @ediflow/x12
       ↑
       └─────  @ediflow/infrastructure-shared  ←──  @ediflow/cli


Every infrastructure package depends on core (for interfaces), never on each other. The CLI depends on all of them to wire everything together.
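
A rule like "depends on core, never on each other" tends to erode quietly in a monorepo. One way to keep it honest is a small check over each package's declared dependencies, for example in CI. This is a hypothetical sketch: the dependency map is hand-written rather than read from the real package.json files, and `findLayeringViolations` is an illustrative name:

```typescript
// package name -> names of the packages it depends on
type DependencyMap = Record<string, string[]>;

const CORE = '@ediflow/core';
const INFRASTRUCTURE = ['@ediflow/edifact', '@ediflow/x12', '@ediflow/infrastructure-shared'];

// Returns one "from -> to" entry per dependency that breaks the layering rule.
function findLayeringViolations(deps: DependencyMap): string[] {
  const violations: string[] = [];

  // Core is the innermost package: it must not depend on any other @ediflow package.
  for (const dep of deps[CORE] ?? []) {
    if (dep.startsWith('@ediflow/')) {
      violations.push(`${CORE} -> ${dep}`);
    }
  }

  // Infrastructure packages may depend on core, but never on one another.
  for (const pkg of INFRASTRUCTURE) {
    for (const dep of deps[pkg] ?? []) {
      if (INFRASTRUCTURE.includes(dep)) {
        violations.push(`${pkg} -> ${dep}`);
      }
    }
  }

  return violations;
}
```

The CLI is deliberately not checked: as the composition root, it is the one package allowed to depend on everything.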

Now let's see what happens inside each package — starting with the parsing pipeline.


The Parsing Pipeline — Three Steps, Three Classes

Parsing an EDIFACT message isn't one operation — it's a pipeline:

Raw EDI String → Delimiter Detection → Tokenization → Segment Parsing → EDIMessage


Each step is a separate class implementing a separate interface. Here's why, and here's the real code.

Step 1: Delimiter Detection

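EDIFACT messages can declare custom delimiters in an optional UNA service string. Its 9 characters ("UNA" followed by six service characters) tell you which characters are used for component separation, element separation, decimal notation, escaping (the release character), and segment termination: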

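export class EdifactDelimiterDetector implements IDelimiterDetector {
  private static readonly UNA_PREFIX = 'UNA';
  private static readonly UNA_LENGTH = 9;

  // EDIFACT defaults, used when there is no UNA: + : . ? '
  private static readonly DEFAULT_DELIMITERS = Delimiters.custom({
    component: ':',
    element:   '+',
    decimal:   '.',
    escape:    '?',
    segment:   "'",
  });

  detect(message: string): Delimiters {
    if (this.hasUNA(message)) {
      return this.extractFromUNA(message);
    }
    return EdifactDelimiterDetector.DEFAULT_DELIMITERS;
  }

  // A UNA is only usable if all 9 characters are present
  private hasUNA(message: string): boolean {
    return message.length >= EdifactDelimiterDetector.UNA_LENGTH
      && message.startsWith(EdifactDelimiterDetector.UNA_PREFIX);
  }

  private extractFromUNA(message: string): Delimiters {
    return Delimiters.custom({
      component: message.charAt(3),  // Usually ':'
      element:   message.charAt(4),  // Usually '+'
      decimal:   message.charAt(5),  // Usually '.'
      escape:    message.charAt(6),  // Usually '?'
      // charAt(7) is reserved (usually a space) and not read
      segment:   message.charAt(8),  // Usually "'"
    });
  }
}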


This matters because real-world EDI partners sometimes use non-standard delimiters. Without this, your parser breaks on the first message from a partner who uses * instead of +.
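
Stripped of the class plumbing, the detection rules fit in one function, which makes them easy to unit-test in isolation. In this sketch the plain `EdifactDelimiters` object and `detectEdifactDelimiters` are hypothetical stand-ins for EDIFlow's `Delimiters` value object, not its actual API:

```typescript
interface EdifactDelimiters {
  component: string;
  element: string;
  decimal: string;
  escape: string;
  segment: string;
}

// Defaults that apply when no UNA service string is present.
const EDIFACT_DEFAULTS: EdifactDelimiters = {
  component: ':',
  element: '+',
  decimal: '.',
  escape: '?',
  segment: "'",
};

function detectEdifactDelimiters(message: string): EdifactDelimiters {
  // "UNA" + 6 service characters = 9 characters in total.
  if (message.length < 9 || !message.startsWith('UNA')) {
    return { ...EDIFACT_DEFAULTS }; // copy, so callers cannot mutate the defaults
  }
  return {
    component: message.charAt(3),
    element: message.charAt(4),
    decimal: message.charAt(5),
    escape: message.charAt(6),
    // charAt(7) is reserved (normally a space)
    segment: message.charAt(8),
  };
}
```

A partner who declares `UNA:*.? '` gets `*` as the element separator, and a message without a UNA falls back to the standard set.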

Step 2: Tokenization

The tokenizer splits the raw string into segment strings, respecting escape characters:

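export class EdifactTokenizer implements ITokenizer {
  tokenize(message: string, delimiters: Delimiters): string[] {
    const segments: string[] = [];
    let currentSegment = '';
    let position = 0;

    while (position < message.length) {
      const char = message[position];

      // Keep escape sequences intact (e.g., ?+ means literal +) so the
      // segment parser can still tell a literal '+' from a separator
      if (this.isEscapedCharacter(message, position, delimiters)) {
        currentSegment += this.consumeEscapedCharacter(message, position);
        position += 2;
        continue;
      }

      // Segment terminator found: flush current segment
      if (char === delimiters.segment) {
        this.flushSegment(currentSegment, segments);
        currentSegment = '';
        position++;
        continue;
      }

      currentSegment += char;
      position++;
    }

    // A final segment without a terminator must not be dropped
    this.flushSegment(currentSegment, segments);
    return segments;
  }

  private isEscapedCharacter(message: string, position: number, delimiters: Delimiters): boolean {
    return message[position] === delimiters.escape && position + 1 < message.length;
  }

  // Returns the release character and the character it protects, unchanged
  private consumeEscapedCharacter(message: string, position: number): string {
    return message.slice(position, position + 2);
  }

  private flushSegment(segment: string, segments: string[]): void {
    if (segment.trim().length > 0) {
      segments.push(segment);
    }
  }
}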


Why a separate class? Because X12 tokenization works differently: segments usually end with ~ (the actual terminator is declared in the ISA header), and there's no escape character. Same interface (ITokenizer), completely different implementation.
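
For experimenting outside the class hierarchy, here is the same tokenizer loop as a plain function. It is a hypothetical sketch rather than EDIFlow's export, and it makes two choices explicit: escape sequences are copied through unchanged (unescaping is left to the segment parser), and line breaks directly after a terminator are treated as layout, not data. If the input starts with a UNA header, that header comes out as a segment of its own, so a caller would skip it or slice off the first 9 characters beforehand:

```typescript
function tokenizeEdifact(message: string, segmentTerminator = "'", escape = '?'): string[] {
  const segments: string[] = [];
  let current = '';

  const flush = (): void => {
    // Drop CR/LF that follows the previous terminator (pretty-printed files).
    const segment = current.replace(/^[\r\n]+/, '');
    if (segment.trim().length > 0) {
      segments.push(segment);
    }
    current = '';
  };

  for (let i = 0; i < message.length; i++) {
    const char = message[i];
    if (char === escape && i + 1 < message.length) {
      // Keep "?'" as "?'" so the terminator inside data is not treated as an end.
      current += char + message[i + 1];
      i++;
      continue;
    }
    if (char === segmentTerminator) {
      flush();
      continue;
    }
    current += char;
  }
  flush(); // keep a final segment that has no terminator
  return segments;
}
```

An escaped apostrophe such as `It?'s` stays inside its segment instead of ending it early.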

Step 3: The Message Parser — Orchestrating the Pipeline

The EdifactMessageParser ties everything together:

export class EdifactMessageParser implements IMessageParser {
  constructor(
    private readonly delimiterDetector: IDelimiterDetector,
    private readonly tokenizer: ITokenizer,
    private readonly segmentParser: EdifactSegmentParser
  ) {}

  parse(ediString: string, config?: ParserConfig): EDIMessage {
    this.validateMessage(ediString);

    // Explicitly configured delimiters win over UNA detection
    const delimiters = config?.delimiters ?? this.delimiterDetector.detect(ediString);
    const segmentStrings = this.tokenizer.tokenize(ediString, delimiters);
    const segments = segmentStrings.map(s => this.segmentParser.parseSegment(s, delimiters));

    // UNH carries the message type and version; fail clearly if it is missing
    const unhSegment = segments.find(s => s.tag === 'UNH');
    if (!unhSegment) {
      throw new Error('Invalid EDIFACT message: no UNH segment found');
    }
    const { version, messageType } = this.extractMetadata(unhSegment, delimiters);

    const message = EDIMessageFactory.create({
      standard: Standard.EDIFACT,
      version,
      messageType
    });

    segments.forEach(segment => message.addSegment(segment));
    return message;
  }

  canParse(ediString: string): boolean {
    return ediString.includes('UNH');
  }

  // validateMessage() and extractMetadata() are omitted from this excerpt
}


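Notice: the parser doesn't know tokenization internals. It delegates to ITokenizer and IDelimiterDetector, so a different tokenizer can be swapped in with zero changes to the parser. (A truly streaming parser for huge messages would additionally need a tokenizer interface that yields segments incrementally instead of returning a complete string[].)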
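
The excerpt does not show `EdifactSegmentParser` or `extractMetadata`. Here is a hypothetical sketch of what that step has to do, assuming segments arrive with escape sequences still intact: split on unescaped element separators, then on unescaped component separators, and only then remove the release characters. The `ParsedSegment` shape and the `D96A`-style version string are assumptions for illustration:

```typescript
interface ParsedSegment {
  tag: string;
  elements: string[][]; // each element is a list of components
}

// Split on `separator`, ignoring separators preceded by the escape character.
// Escape sequences stay in place so later splits still see them.
function splitUnescaped(input: string, separator: string, escape: string): string[] {
  const parts: string[] = [];
  let current = '';
  for (let i = 0; i < input.length; i++) {
    const char = input[i];
    if (char === escape && i + 1 < input.length) {
      current += char + input[i + 1];
      i++;
    } else if (char === separator) {
      parts.push(current);
      current = '';
    } else {
      current += char;
    }
  }
  parts.push(current);
  return parts;
}

// Remove release characters: "?+" -> "+", "??" -> "?".
function unescapeValue(value: string, escape: string): string {
  let result = '';
  for (let i = 0; i < value.length; i++) {
    if (value[i] === escape && i + 1 < value.length) {
      i++; // skip the release character, keep what it protects
    }
    result += value[i];
  }
  return result;
}

function parseSegment(raw: string, element = '+', component = ':', escape = '?'): ParsedSegment {
  const [tag, ...rawElements] = splitUnescaped(raw, element, escape);
  return {
    tag,
    elements: rawElements.map((el) =>
      splitUnescaped(el, component, escape).map((c) => unescapeValue(c, escape)),
    ),
  };
}

// UNH+<reference>+<type>:<version>:<release>:<agency>, e.g. UNH+1+ORDERS:D:96A:UN
function extractUnhMetadata(unh: ParsedSegment): { messageType: string; version: string } {
  if (unh.tag !== 'UNH') {
    throw new Error(`Expected UNH, got ${unh.tag}`);
  }
  const [messageType = '', versionNumber = '', releaseNumber = ''] = unh.elements[1] ?? [];
  return { messageType, version: `${versionNumber}${releaseNumber}` };
}
```

Unescaping last is what keeps `Price?: 5?+2` as one value instead of three fragments.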


Building — The Reverse Pipeline

Building converts EDIMessage back to a raw string:

export class EdifactMessageBuilder implements IMessageBuilder {
  build(message: EDIMessage, options?: EdifactBuilderOptions): string {
    const delimiters = this.resolveDelimiters(options?.delimiters);
    // `??` rather than `||`, so a falsy enum value is not silently replaced
    const format = options?.format ?? OutputFormat.COMPACT;

    let result = '';
    if (options?.includeUNA) {
      result += delimiters.toUNA();  // "UNA:+.? '"
    }

    const segmentStrings = message.segments.map(seg =>
      this.serializeSegment(seg, delimiters)
    );

    // READABLE puts each segment on its own line; the last segment
    // still needs its own terminator
    const separator = format === OutputFormat.READABLE
      ? delimiters.segment + '\n'
      : delimiters.segment;
    return result + segmentStrings.join(separator) + delimiters.segment;
  }

  // resolveDelimiters() and serializeSegment() are omitted from this excerpt
}


Same interface IMessageBuilder — the X12 builder uses * and ~ instead.
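
`serializeSegment` is not shown above. Its essential job is to mirror the tokenizer: any delimiter or release character inside a data value needs a `?` in front of it, or the receiving side will split the value apart. A hypothetical sketch, using a plain array-of-arrays element model rather than EDIFlow's actual segment classes:

```typescript
interface SegmentData {
  tag: string;
  elements: string[][]; // each element is a list of components
}

interface BuilderDelimiters {
  component: string;
  element: string;
  escape: string;
  segment: string;
}

const DEFAULTS: BuilderDelimiters = { component: ':', element: '+', escape: '?', segment: "'" };

// Prefix every special character in a value with the release character.
function escapeValue(value: string, d: BuilderDelimiters): string {
  const special = [d.escape, d.component, d.element, d.segment];
  let out = '';
  for (const char of value) {
    out += special.includes(char) ? d.escape + char : char;
  }
  return out;
}

function serializeSegment(segment: SegmentData, d: BuilderDelimiters = DEFAULTS): string {
  const elements = segment.elements.map((components) =>
    components.map((c) => escapeValue(c, d)).join(d.component),
  );
  return [segment.tag, ...elements].join(d.element);
}

// Compact output: every segment, including the last, gets its terminator.
function buildMessage(segments: SegmentData[], d: BuilderDelimiters = DEFAULTS): string {
  return segments.map((s) => serializeSegment(s, d) + d.segment).join('');
}
```

This sketch does not drop trailing empty elements, which EDIFACT syntax rules generally expect a production builder to truncate.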


Validation — Builder Pattern for Format-Specific Rules

Validation is composable. The builder lets you pick which rules to apply:

export class EdifactValidationServiceBuilder {
  private readonly service = new ComposableValidationService<EDIMessage>();

  withBasicRules(): this {
    this.service.addRule(new MessageMustHaveSegmentsRule());
    this.service.addRule(new VersionStandardMustMatchRule());
    return this;
  }

  withEDIFACTRules(): this {
    this.service.addRule(new UNBMustBeFirstRule());
    this.service.addRule(new UNZMustBeLastRule());
    this.service.addRule(new EDIFACTSegmentTagFormatRule());
    return this;
  }

  withCustomRule(rule: IValidationRule<EDIMessage>): this {
    this.service.addRule(rule);
    return this;
  }

  // Returns the configured service; required by forEDIFACT() below
  build(): ComposableValidationService<EDIMessage> {
    return this.service;
  }

  // Factory shorthand
  static forEDIFACT(): ComposableValidationService<EDIMessage> {
    return new EdifactValidationServiceBuilder()
      .withBasicRules()
      .withEDIFACTRules()
      .build();
  }
}


Basic rules live in @ediflow/core (format-agnostic). EDIFACT rules live in @ediflow/edifact. X12 rules in @ediflow/x12. Each package only loads what it needs — tree-shaking friendly.
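
The rule interface itself is not shown here. A minimal, hypothetical version of the composable idea: each rule inspects the message and returns zero or more issues, and the service concatenates them. `Rule`, `Issue`, `MiniMessage`, `ComposableValidator` and the two rules below are illustrative stand-ins, not EDIFlow's actual types:

```typescript
interface Issue {
  rule: string;
  message: string;
}

interface MiniMessage {
  segments: { tag: string }[];
}

interface Rule {
  readonly name: string;
  check(message: MiniMessage): Issue[];
}

class ComposableValidator {
  private readonly rules: Rule[] = [];

  add(rule: Rule): this {
    this.rules.push(rule);
    return this;
  }

  // Runs every rule and collects all issues instead of stopping at the first.
  validate(message: MiniMessage): Issue[] {
    const issues: Issue[] = [];
    for (const rule of this.rules) {
      issues.push(...rule.check(message));
    }
    return issues;
  }
}

const messageMustHaveSegments: Rule = {
  name: 'MessageMustHaveSegments',
  check: (m) =>
    m.segments.length > 0 ? [] : [{ rule: 'MessageMustHaveSegments', message: 'Message has no segments' }],
};

const unbMustBeFirst: Rule = {
  name: 'UNBMustBeFirst',
  // An empty message is MessageMustHaveSegments' problem, so stay silent here.
  check: (m) =>
    (m.segments.length === 0 || m.segments[0].tag === 'UNB')
      ? []
      : [{ rule: 'UNBMustBeFirst', message: `First segment is ${m.segments[0].tag}, expected UNB` }],
};
```

Keeping each rule responsible for exactly one condition means composing rules never produces duplicate reports for the same defect.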


The Repository — Loading 126–319 Message Definitions at Runtime

This is where the data packages come in. Each package (@ediflow/edifact-d20b, @ediflow/x12-004010, ...) contains JSON files that define message structures:

packages/edifact-d20b/data/
  segments.json       # All segment definitions
  elements.json       # All element definitions
  composites.json     # Composite element definitions
  codes/              # Code list values
  messages/
    ORDERS.json       # ORDERS message structure
    INVOIC.json       # INVOIC structure
    DESADV.json       # ...195 message types total


The FileBasedMessageStructureRepository implements IMessageStructureRepository:

export class FileBasedMessageStructureRepository implements IMessageStructureRepository {
  // One loaded data package context per standard + version
  private readonly contextCache = new Map<string, DataPackageContext>();

  constructor(private readonly basePath: string) {}

  async getMessageStructure(
    standard: string,
    version: string,
    messageType: string
  ): Promise<MessageStructureDTO | null> {
    const messageFile = await this.loadMessageFile(standard, version, messageType);
    if (!messageFile) return null;

    // Loads the data package on first use, then serves it from contextCache
    const { builder, validator } = await this.getOrCreateContext(standard, version);

    // Validate data package integrity
    const issues = validator.validate(messageFile);
    if (issues.length > 0) {
      throw new DataPackageValidationError(standard, version, messageType, issues);
    }

    return builder.build(messageFile);
  }

  // loadMessageFile() and getOrCreateContext() are omitted from this excerpt
}


Key design decisions:

  1. Lazy loading — segments, elements, composites are loaded on first access per version, then cached
  2. Validation — every message file is validated against the data package (segment references, element references)
  3. Package aliases — HIPAA maps to hipaa-x12-005010, EANCOM maps to eancom-2002
  4. Qualifier fallback — HIPAA uses files like 837-Q1.json instead of 837.json
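
One subtlety with lazy loading in async code: if the cache only stores the finished context, two concurrent requests for the same version can both miss and both load the segment, element and composite files. Caching the in-flight promise avoids that. A hypothetical sketch (not EDIFlow's `getOrCreateContext`), with the loader injected so it can be tested without touching the file system:

```typescript
type Loader<T> = (standard: string, version: string) => Promise<T>;

class LazyContextCache<T> {
  private readonly cache = new Map<string, Promise<T>>();

  constructor(private readonly load: Loader<T>) {}

  get(standard: string, version: string): Promise<T> {
    const key = `${standard}:${version}`;
    const cached = this.cache.get(key);
    if (cached) {
      return cached; // concurrent callers share the same in-flight load
    }

    const pending = this.load(standard, version);
    this.cache.set(key, pending);

    // Evict a failed load so a later call can retry instead of
    // receiving the same rejection forever.
    pending.catch(() => {
      if (this.cache.get(key) === pending) {
        this.cache.delete(key);
      }
    });

    return pending;
  }
}
```

Two calls for `EDIFACT:D20B` issued back to back trigger a single load; a call for `X12:004010` triggers its own.
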

Monorepo Structure — Why 13 Packages?

packages/
  core/                   # Domain + Application (pure, no parsers)
  edifact/                # EDIFACT parser & builder
  x12/                    # X12 parser & validator
  infrastructure-shared/  # FileBasedRepository, loaders, caching
  cli/                    # CLI tool (4 commands)
  edifact-d96a/           # 126 EDIFACT D.96A message definitions
  edifact-d01b/           # EDIFACT D.01B definitions
  edifact-d12a/           # EDIFACT D.12A definitions
  edifact-d20b/           # 195 EDIFACT D.20B definitions
  eancom-2002/            # 50 GS1 retail messages
  x12-004010/             # 293 X12 transaction sets
  x12-006040/             # 319 X12 transaction sets
  hipaa-x12-005010/       # 14 HIPAA transaction sets


We already covered why the infrastructure is split into edifact, x12, and infrastructure-shared. The data packages follow the same principle: install only what you need. A user working with X12 004010 shouldn't have to download 195 EDIFACT D.20B definitions. Each data package is independent: small, focused, and installable on its own with npm install.


Lessons Learned

✅ Pipeline pattern for parsing — splitting into delimiter detection, tokenization, and segment parsing made each piece testable in isolation. When we added X12 support, we reused the pattern with different implementations.

✅ Data packages as separate npm packages — users install only what they need. Keeps bundle sizes small.

✅ Repository pattern with lazy loading — loading segments.json + elements.json + composites.json was expensive (~50ms). Caching per version eliminates this on subsequent calls.

✅ Builder pattern for validation — format-specific rules stay in format packages. Core remains agnostic. Adding HIPAA-specific rules? Create a new builder, compose with existing rules.

⚠️ Package aliases — HIPAA and EANCOM don't follow the {standard}-{version} naming convention. The alias map works but isn't elegant. Lesson: decide on naming conventions early.
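
For illustration, here is one way the alias map and the HIPAA qualifier fallback from the design-decision list could be expressed. The two alias entries come from this article; the function names, the lower-cased `{standard}-{version}` rule, the version format without a dot (`D20B`), and the lookup order (plain file first, then the qualified one) are assumptions:

```typescript
// Standards whose data package name breaks the {standard}-{version} rule.
const PACKAGE_ALIASES: Record<string, string> = {
  HIPAA: 'hipaa-x12-005010',
  EANCOM: 'eancom-2002',
};

function resolvePackageName(standard: string, version: string): string {
  const alias = PACKAGE_ALIASES[standard.toUpperCase()];
  return alias ?? `${standard}-${version}`.toLowerCase();
}

// Candidate file names in lookup order: 837.json first, then 837-Q1.json.
function candidateMessageFiles(messageType: string, qualifier?: string): string[] {
  const candidates = [`${messageType}.json`];
  if (qualifier) {
    candidates.push(`${messageType}-${qualifier}.json`);
  }
  return candidates;
}
```

Keeping every exception in a single table at least makes the inconsistency visible in one place, which is roughly the "works but isn't elegant" trade-off described above.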


What's Next — Part 5: Presentation Layer (CLI)

In Part 5, we'll see how the CLI ties everything together: the DI container that wires parsers, repositories, and use cases, and how the parse, validate, build, and export-schema commands work.

All of it is already built and running in production, and Part 5 walks through the real code.

Part 1: Why Clean Architecture?
Part 2: Domain Layer
Part 3: Application Layer
GitHub: @ediflow/core

⭐ If this series is useful — a star on GitHub helps others find it: github.com/ediflow-lib/core


How do you structure data packages in your monorepos? One giant package or many small ones? Drop a comment.