Open-source SDS tooling for Japanese MHLW compliance: the gap nobody filled
kent-tokyo · 2026-05-23 · via DEV Community

In March 2025, Japan's Ministry of Health, Labour and Welfare (MHLW) published a structured JSON schema for Safety Data Sheet data exchange. The schema covers roughly 200 deeply nested fields and is intended to standardize how SDS information moves between chemical management systems.

Most SDS tooling was not built for this.

What makes Japan's SDS requirements different

Japan's SDS requirements come from two laws: the Industrial Safety and Health Act (ISHA, 労働安全衛生法) and the Chemical Substances Control Law (化審法). Both mandate SDS for regulated chemicals, with format requirements governed by JIS Z 7253, Japan's implementation of the UN Globally Harmonized System (GHS).

JIS Z 7253 follows the standard 16-section GHS structure. In principle, any GHS-compliant SDS satisfies the content requirements. What makes Japanese compliance distinct is a digital layer: the MHLW schema specifies how SDS content should be structured as machine-readable data, with field-level granularity that PDF documents cannot capture.

How GHS looks different by country

GHS uses a "building block" approach — each country adopts the elements it chooses. The result is that the same GHS-aligned document varies by jurisdiction:

| Country/Region | Standard | GHS basis | Notable difference |
| --- | --- | --- | --- |
| Japan | JIS Z 7253:2019 | GHS Rev. 6 | MHLW digital schema; revised to GHS Rev. 9 in Dec 2025 |
| United States | OSHA HazCom 2012 | GHS Rev. 3 | Updated to GHS Rev. 7 in 2024 |
| European Union | CLP Regulation | GHS-aligned | Stricter on environmental hazards |
| China | GB 13690-2009 | GHS Rev. 4 equivalent | Moving to GB 30000.1-2024 (GHS Rev. 8), mandatory from August 2025 |
| Taiwan | CNS 15030 | GHS-aligned | |

Japan-specific regulatory fields

The MHLW schema includes fields with no equivalent in EU REACH or US OSHA HazCom formats. These are the main reason international SDS tooling does not cover the schema out of the box:

| Law | Example fields | What they capture |
| --- | --- | --- |
| Chemical Substances Control Law (化審法) | CaSCL.ClassificationStatus, CaSCL.RegistrationNumber | Regulatory classification and registration numbers under this law |
| Industrial Safety and Health Act (安衛法) | ISHAct.PublicationOfName, ISHAct.Notification | Name disclosure and notification obligations |
| Poisonous and Deleterious Substances Control Law | ControlledSubstancesAct.Applicability | Whether the substance is classified as poison, deleterious, or specific poison |
| PRTR Law | | Chemical release and transfer reporting obligations |

Section 15 (Regulatory Information) is the most complex section in the schema — it contains separate subsections for each of these laws, each with its own field structure.
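To make the per-law layout concrete, here is a minimal Python sketch that reads the dotted field names from the table above as paths into nested objects. That reading is an assumption on my part, as are the `SECTION_15` container, the `get_field` helper, and the placeholder values; the published schema is the authority on the real nesting.

```python
# Hypothetical Section 15 fragment: one sub-object per law.
# Field names come from the table above; the values are placeholders.
SECTION_15 = {
    "CaSCL": {
        "ClassificationStatus": "placeholder-status",
        "RegistrationNumber": "placeholder-number",
    },
    "ISHAct": {
        "PublicationOfName": True,
    },
}


def get_field(section: dict, dotted: str, default=None):
    """Follow a dotted path such as 'CaSCL.RegistrationNumber' through nested dicts."""
    node = section
    for part in dotted.split("."):
        # Stop as soon as the path leaves the structure.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


print(get_field(SECTION_15, "CaSCL.RegistrationNumber"))
print(get_field(SECTION_15, "ISHAct.Notification", default="(absent)"))
```

Returning a default rather than raising keeps one missing law subsection from aborting a batch run over many documents.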

Why this matters now: the 2022 law revision

The MHLW published the schema in 2025, but the driver was a 2022 amendment to the Industrial Safety and Health Act. The amendment shifted Japan's chemical substance regulation from a prescriptive model (government designates specific hazardous substances) to an autonomous management model (companies assess and manage risk themselves).

The practical impact:

| Enforcement date | Change |
| --- | --- |
| April 2023 | Shift to autonomous management model: all substances with confirmed GHS hazard classifications brought progressively into scope |
| April 2024 | SDS must now specify concentration ranges numerically (not just qualitatively) |
| April 2025 | Protective equipment mandatory for substances with skin/eye hazards |
| April 2027 | Risk assessment obligations expand to all regulated substances |

With risk assessment coverage expanding significantly, companies need to process SDS data faster and more accurately. Manual PDF entry does not scale. The JSON schema is the infrastructure layer for automating this.

Where existing tools stop

Commercial SDS platforms

The major SDS authoring platforms — Sphera, EcoOnline, Chemwatch, Verisk 3E — have broad international coverage. Japanese is typically a supported output language. What they do not provide, as far as I have found, is export to the MHLW JSON schema. They produce Word or PDF output in the correct section structure, which satisfies the document requirement but not the structured data exchange requirement.

Japanese-market products like SDS Meister and SmartSDS support MHLW JSON output, but their PDF-to-JSON conversion coverage is limited — they are primarily SDS authoring tools, not bulk conversion tools for incoming supplier documents.

Open-source options

| Tool | Language | MHLW JSON | PDF → JSON | Approach |
| --- | --- | --- | --- | --- |
| sds_parser | Python | No | Yes | Regex, per-manufacturer rules |
| tungsten | Python | No | Yes | Rule-based, English-only |
| sds-converter | Rust | Yes | Yes | LLM-based extraction |

sds_parser and tungsten solve a different problem: extracting SDS data in English, for specific known manufacturer formats. Neither targets the MHLW schema.

The format inconsistency problem

Even within JIS Z 7253-compliant documents, format varies by manufacturer:

| Source of variation | Example |
| --- | --- |
| Section heading labels | "2. 危険有害性の要約" (JIS Z 7253) vs "2. Hazard(s) identification" (OSHA HazCom) vs "第2部分 危险性概述" (GB/T 16483); all mean the same thing |
| Section order | The 16 sections can appear in any order the manufacturer chooses |
| Concentration notation | "≥95%", "1〜5%", "約100%", "企業秘密" (trade secret) all need different handling |
| Language mixing | Japanese SDS documents regularly contain English chemical names and CAS numbers |

A rule-based parser must enumerate every variant. In practice, manufacturer-specific headings add another layer of variation on top of the standard differences.
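To show what "enumerate every variant" means in practice, here is a minimal rule-based sketch in Python covering only the four concentration notations listed above. The `Concentration` type and `parse_concentration` function are hypothetical names. Two choices are assumptions rather than anything the schema prescribes: "≥95%" is given an upper bound of 100, and "約100%" (approximately) is collapsed to a single point value.

```python
import re
from typing import NamedTuple, Optional


class Concentration(NamedTuple):
    low: Optional[float]    # lower bound in percent, None if undisclosed
    high: Optional[float]   # upper bound in percent, None if undisclosed
    trade_secret: bool


_NUM = r"(\d+(?:\.\d+)?)"


def parse_concentration(text: str) -> Concentration:
    """Parse one concentration cell; raise ValueError on an unknown notation."""
    s = text.strip().replace("％", "%")  # fold full-width percent sign

    if s == "企業秘密":  # trade secret: no numeric range available
        return Concentration(None, None, True)

    m = re.fullmatch(rf"{_NUM}[〜～~-]{_NUM}%", s)  # "1〜5%"
    if m:
        return Concentration(float(m.group(1)), float(m.group(2)), False)

    m = re.fullmatch(rf"(?:≥|>=){_NUM}%", s)  # "≥95%"
    if m:
        # Assumption: an open-ended lower bound is capped at 100 percent.
        return Concentration(float(m.group(1)), 100.0, False)

    m = re.fullmatch(rf"約{_NUM}%", s)  # "約100%"
    if m:
        # Assumption: "approximately" is stored as a point value.
        value = float(m.group(1))
        return Concentration(value, value, False)

    raise ValueError(f"unrecognized concentration notation: {text!r}")


for cell in ["≥95%", "1〜5%", "約100%", "企業秘密"]:
    print(cell, parse_concentration(cell))
```

Each new notation a manufacturer introduces needs another branch here, which is the maintenance cost the paragraph above describes.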

The schema itself

Two properties of the MHLW schema are worth knowing before implementing against it.

Section 3 (composition) is the hardest part

Section 3 stores component information as a repeating array. Each component object has nested fields for chemical identity, concentration range, and hazard classification. The same data appears differently depending on whether the source document covers a pure substance, a mixture, or a trade secret formulation.

{
  "Composition": {
    "CompositionAndConcentration": [
      {
        "ChemicalIdentity": {
          "CASNumber": "64-17-5",
          "ISHActNotificationNumber": "2-396"
        },
        "ConcentrationRange": {
          "ConcentrationRangeFrom": 95.0,
          "ConcentrationRangeTo": 100.0,
          "ConcentrationRangeUnit": "%"
        },
        "TradeSecretFlag": false
      }
    ]
  }
}
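
As a reading aid, the following Python sketch walks the composition array from the example above and prints one line per component. The `summarize_components` helper is hypothetical, and the handling of trade-secret entries (skipping identity and range) is an assumption about how such components are populated.

```python
import json

# The composition example from above, embedded so the sketch is self-contained.
SAMPLE = """
{
  "Composition": {
    "CompositionAndConcentration": [
      {
        "ChemicalIdentity": {
          "CASNumber": "64-17-5",
          "ISHActNotificationNumber": "2-396"
        },
        "ConcentrationRange": {
          "ConcentrationRangeFrom": 95.0,
          "ConcentrationRangeTo": 100.0,
          "ConcentrationRangeUnit": "%"
        },
        "TradeSecretFlag": false
      }
    ]
  }
}
"""


def summarize_components(doc: dict) -> list[str]:
    """Return one human-readable line per entry in the composition array."""
    lines = []
    for comp in doc["Composition"]["CompositionAndConcentration"]:
        if comp.get("TradeSecretFlag"):
            # Assumption: trade-secret entries may omit identity and range.
            lines.append("(trade secret)")
            continue
        cas = comp.get("ChemicalIdentity", {}).get("CASNumber", "unknown CAS")
        rng = comp["ConcentrationRange"]
        lines.append(
            f"{cas}: {rng['ConcentrationRangeFrom']}-"
            f"{rng['ConcentrationRangeTo']}{rng['ConcentrationRangeUnit']}"
        )
    return lines


print(summarize_components(json.loads(SAMPLE)))
```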

Typos locked into v1.0

The schema contains field name errors that are now part of the specification:

HumanExposureAndEmergencyMeasuress  ← trailing double-s
TestGuidline                        ← missing 'e' (not Guideline)
Desclaimer                          ← 'e' in place of 'i' (not Disclaimer)
gazetteNo                           ← lowercase first character

Correcting these would break all existing implementations, so they cannot be fixed in v1.0. An implementation that normalizes these to standard English spellings will fail schema validation.
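
One way to guard against accidental normalization is a pre-validation check that flags "corrected" key names anywhere in a document. The sketch below is hypothetical: the `SCHEMA_SPELLING` table lists only the four names above, the `find_normalized_keys` helper is my own, and the `Items` container in the usage example is invented for illustration.

```python
# Maps the spelling a well-meaning developer might write to the spelling
# the v1.0 schema actually requires (the four names listed above).
SCHEMA_SPELLING = {
    "HumanExposureAndEmergencyMeasures": "HumanExposureAndEmergencyMeasuress",
    "TestGuideline": "TestGuidline",
    "Disclaimer": "Desclaimer",
    "GazetteNo": "gazetteNo",
}


def find_normalized_keys(node, path=""):
    """Yield (path, found_key, schema_key) for every key that was 'corrected'."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in SCHEMA_SPELLING:
                yield here, key, SCHEMA_SPELLING[key]
            yield from find_normalized_keys(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_normalized_keys(item, f"{path}[{index}]")


# "Items" is a made-up container used only to exercise the list branch.
doc = {"Desclaimer": "placeholder", "Items": [{"TestGuideline": "placeholder"}]}
for where, found, expected in find_normalized_keys(doc):
    print(f"{where}: found {found!r}, schema expects {expected!r}")
```

Running a check like this before schema validation turns an opaque validation failure into a message that names the offending key.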

sds-converter

I built sds-converter to address the MHLW schema gap. It handles both directions: PDF/DOCX/XLSX to MHLW JSON, and MHLW JSON to a JIS Z 7253-compliant Word document.

The core approach: rather than enumerating format variants with rules, the tool passes raw section text and the corresponding MHLW schema fields to an LLM and asks it to map values. The LLM handles heading label variation naturally. The output is validated against the schema before writing.
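
The shape of that loop can be sketched in Python as follows. This is not sds-converter's code (the tool is written in Rust): `build_prompt`, `extract_section`, the prompt wording, and the `llm` callable are all hypothetical, and the key check stands in for real validation against the MHLW schema.

```python
import json
from typing import Callable


def build_prompt(section_text: str, fields: list[str]) -> str:
    """Combine raw section text with the target field names into one prompt."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Map values from the SDS section text to the listed fields.\n"
        "Reply with a single JSON object; use null for fields not present.\n\n"
        f"Fields:\n{field_list}\n\nSection text:\n{section_text}"
    )


def extract_section(
    section_text: str, fields: list[str], llm: Callable[[str], str]
) -> dict:
    """Ask the model to map values, then check the reply before using it."""
    reply = llm(build_prompt(section_text, fields))
    data = json.loads(reply)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - set(fields)
    if unknown:
        # Simplified stand-in for schema validation: reject invented keys.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # Fill omitted fields with None so callers always see the same shape.
    return {name: data.get(name) for name in fields}


# Stub model for demonstration; a real backend would call an LLM API here.
def fake_llm(prompt: str) -> str:
    return '{"CASNumber": "64-17-5"}'


print(extract_section("エタノール CAS 64-17-5", ["CASNumber", "TradeSecretFlag"], fake_llm))
```

Passing the model in as a plain callable keeps the mapping logic independent of any one provider, and lets tests run against a stub.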

cargo install sds-converter

# PDF → MHLW JSON
sds-converter to-json --input input.pdf --output output.json

# MHLW JSON → JIS Z 7253 Word document
sds-converter to-docx --input output.json --output result.docx --lang ja

The LLM backend is pluggable — Claude, GPT, Gemini, Mistral, Groq, or local models via Ollama. A --quality flag adjusts cost versus accuracy for batch workloads.

Known limitations:

| Issue | Status |
| --- | --- |
| Scanned PDFs without a text layer | Not supported; requires upstream OCR |
| Section 3 tables with merged cells | Extraction sometimes fails on complex DOCX layouts |
| Precision fields mixed with "not measured" entries | Occasional type errors in Section 9 output |

These are open problems, not design decisions.

The open gap

The MHLW schema addresses a real need for anyone handling chemical compliance in Japan at volume. Commercial tools cover the authoring side. For bulk conversion of incoming supplier PDFs to structured data, sds-converter, which I developed, is the only open-source implementation targeting this schema that I am aware of.

The repository is open. Contributions on the extraction side — particularly Section 3 table handling — are welcome. If you work in cheminformatics or chemical compliance and have approached the MHLW compliance problem differently, I would be interested to hear it.