

fix(telegram): guard UTF-16 surrogate pairs in outbound chunkers (#93… · openclaw/openclaw@df87b40
Nas01010101 · 2026-06-17 · via Recent Commits to openclaw:main

File tree

  • packages/markdown-core/src

@@ -424,4 +424,47 @@ describe("markdownToTelegramHtml", () => {
   it("fails loudly when tag overhead leaves no room for text", () => {
     expect(() => splitTelegramHtmlChunks("<b><i><u>x</u></i></b>", 10)).toThrow(/tag overhead/i);
   });
+
+  it("does not split an astral char across the chunk boundary", () => {
+    // Emoji surrogate pair straddles index 10 (limit): high at 9, low at 10.
+    const input = `${"A".repeat(9)}😀${"B".repeat(20)}`;
+    const chunks = splitTelegramHtmlChunks(input, 10);
+    expect(chunks.length).toBeGreaterThan(1);
+    expect(chunks.join("")).toBe(input);
+    for (const chunk of chunks) {
+      expect(containsLoneSurrogate(chunk)).toBe(false);
+    }
+  });
+
+  it("keeps an astral char whole when a positive limit starts on its pair", () => {
+    expect(splitTelegramHtmlChunks("A😀B", 1)).toEqual(["A", "😀", "B"]);
+  });
+
+  it("keeps astral chars whole in rendered Markdown chunks", () => {
+    const chunks = markdownToTelegramChunks("A😀B", 1);
+
+    expect(chunks.map((chunk) => chunk.text)).toEqual(["A", "😀", "B"]);
+    for (const chunk of chunks) {
+      expect(containsLoneSurrogate(chunk.html)).toBe(false);
+      expect(containsLoneSurrogate(chunk.text)).toBe(false);
+    }
+  });
 });
+
+function containsLoneSurrogate(text: string): boolean {
+  for (let index = 0; index < text.length; index += 1) {
+    const code = text.charCodeAt(index);
+    const isHigh = code >= 0xd800 && code <= 0xdbff;
+    const isLow = code >= 0xdc00 && code <= 0xdfff;
+    if (isHigh) {
+      const next = text.charCodeAt(index + 1);
+      if (!(next >= 0xdc00 && next <= 0xdfff)) {
+        return true;
+      }
+      index += 1;
+    } else if (isLow) {
+      return true;
+    }
+  }
+  return false;
+}
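The tests above assert that no chunk carries a lone surrogate. As a rough illustration of why that matters, the sketch below cuts the same test input at index 10 with a plain `slice` and round-trips each half through UTF-8. The helper name `roundTripUtf8` is hypothetical and not part of the commit; `TextEncoder` and `TextDecoder` are the standard platform globals.

```typescript
// Illustrative only: shows what a cut between surrogate halves does to the text.
const straddlingInput = `${"A".repeat(9)}😀${"B".repeat(20)}`;

// "😀" (U+1F600) occupies two UTF-16 code units: 0xD83D at index 9, 0xDE00 at index 10.
const naiveHead = straddlingInput.slice(0, 10); // ends with a lone high surrogate
const naiveTail = straddlingInput.slice(10); // starts with a lone low surrogate

// Hypothetical helper: round-trips through UTF-8 to mimic what an encoder
// does with ill-formed UTF-16 (each lone surrogate becomes U+FFFD).
function roundTripUtf8(text: string): string {
  return new TextDecoder().decode(new TextEncoder().encode(text));
}

const headAfter = roundTripUtf8(naiveHead);
const tailAfter = roundTripUtf8(naiveTail);

console.log(headAfter.endsWith("\uFFFD")); // true
console.log(tailAfter.startsWith("\uFFFD")); // true
console.log(headAfter + tailAfter === straddlingInput); // false: the emoji is lost
```

Joining the two halves before encoding would still reproduce the input, which is why the tests check each chunk individually rather than only the joined result.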

@@ -1070,11 +1070,30 @@ function findTelegramHtmlEntityEnd(text: string, start: number): number {
   return text[index] === ";" ? index : -1;
 }

+// Never return a split index that lands between a UTF-16 surrogate pair, or
+// both chunks would carry a lone surrogate that re-encodes to U+FFFD. If the
+// pair starts the segment, keep it whole so chunking still advances.
+function clampToSurrogateBoundary(text: string, index: number): number {
+  const high = text.charCodeAt(index - 1);
+  const low = text.charCodeAt(index);
+  const splitsPair =
+    index > 0 && high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
+  if (!splitsPair) {
+    return index;
+  }
+  return index > 1 ? index - 1 : index + 1;
+}
+
 function findTelegramHtmlSafeSplitIndex(text: string, maxLength: number): number {
   if (text.length <= maxLength) {
     return text.length;
   }
   const normalizedMaxLength = Math.max(1, Math.floor(maxLength));
+  const splitIndex = findTelegramHtmlEntitySafeSplitIndex(text, normalizedMaxLength);
+  return clampToSurrogateBoundary(text, splitIndex);
+}
+
+function findTelegramHtmlEntitySafeSplitIndex(text: string, normalizedMaxLength: number): number {
   const lastAmpersand = text.lastIndexOf("&", normalizedMaxLength - 1);
   if (lastAmpersand === -1) {
     return normalizedMaxLength;
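To make the clamp's three outcomes concrete, here is a small trace. The function body is copied from the hunk above so the snippet runs on its own; the sample strings and the variable name `straddlingText` are illustrative choices, not taken from the commit.

```typescript
// Copied from the hunk above so the trace is self-contained.
function clampToSurrogateBoundary(text: string, index: number): number {
  const high = text.charCodeAt(index - 1);
  const low = text.charCodeAt(index);
  const splitsPair =
    index > 0 && high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
  if (!splitsPair) {
    return index;
  }
  return index > 1 ? index - 1 : index + 1;
}

// Nine "A"s, then an emoji whose high surrogate sits at 9 and low surrogate at 10.
const straddlingText = `${"A".repeat(9)}😀${"B".repeat(20)}`;

console.log(clampToSurrogateBoundary(straddlingText, 10)); // 9: pulled back to just before the pair
console.log(clampToSurrogateBoundary(straddlingText, 9)); // 9: already on a boundary, unchanged
console.log(clampToSurrogateBoundary("😀X", 1)); // 2: the pair starts the segment, so it is kept whole
console.log(clampToSurrogateBoundary("AB", 1)); // 1: no surrogates involved
```

The third case is the one that lets a chunk exceed the requested length by one code unit; the alternative would be returning 0 and never advancing.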

@@ -0,0 +1,57 @@
+// Telegram tests cover plain-text chunk-splitting behavior.
+import { describe, expect, it } from "vitest";
+import { splitTelegramPlainTextChunksForTests } from "./send.js";
+
+function containsLoneSurrogate(text: string): boolean {
+  for (let index = 0; index < text.length; index += 1) {
+    const code = text.charCodeAt(index);
+    const isHigh = code >= 0xd800 && code <= 0xdbff;
+    const isLow = code >= 0xdc00 && code <= 0xdfff;
+    if (isHigh) {
+      const next = text.charCodeAt(index + 1);
+      if (!(next >= 0xdc00 && next <= 0xdfff)) {
+        return true;
+      }
+      index += 1;
+    } else if (isLow) {
+      return true;
+    }
+  }
+  return false;
+}
+
+describe("splitTelegramPlainTextChunks", () => {
+  it("does not split an astral char across the chunk boundary", () => {
+    // Emoji surrogate pair straddles index 10 (limit): high at 9, low at 10.
+    const input = `${"A".repeat(9)}😀${"B".repeat(20)}`;
+    const chunks = splitTelegramPlainTextChunksForTests(input, 10);
+    expect(chunks.length).toBeGreaterThan(1);
+    expect(chunks.join("")).toBe(input);
+    for (const chunk of chunks) {
+      expect(containsLoneSurrogate(chunk)).toBe(false);
+    }
+  });
+
+  it("does not hang when limit=1 and text starts with an astral char", () => {
+    // Regression: with limit=1 the clamp would return start (no advance),
+    // causing the while-loop to spin forever. The surrogate pair must be
+    // emitted as a unit (2 code units) so the loop always advances.
+    const input = "😀X";
+    const chunks = splitTelegramPlainTextChunksForTests(input, 1);
+    expect(chunks.join("")).toBe(input);
+    for (const chunk of chunks) {
+      expect(containsLoneSurrogate(chunk)).toBe(false);
+    }
+  });
+
+  it("does not hang when limit=1 and an astral char appears mid-string at a chunk boundary", () => {
+    // 'A' + emoji: with limit=1, second iteration starts at index 1 (high
+    // surrogate) — same stall condition as above, now mid-string.
+    const input = "A😀B";
+    const chunks = splitTelegramPlainTextChunksForTests(input, 1);
+    expect(chunks.join("")).toBe(input);
+    for (const chunk of chunks) {
+      expect(containsLoneSurrogate(chunk)).toBe(false);
+    }
+  });
+});

@@ -179,14 +179,40 @@ function resolveTelegramMessageIdOrThrow(
   throw new Error(`Telegram ${context} returned no message_id`);
 }

+// Pull a chunk end back off a UTF-16 surrogate pair so neither chunk carries a
+// lone surrogate that re-encodes to U+FFFD. Mirrors the guard in
+// bot/native-quote.ts `truncateUtf16Safe`; shared by both plain-text splitters.
+//
+// `start` is the beginning of the current chunk — the return value is
+// guaranteed to be > start, so callers that loop on `start = end` always
+// advance. When clamping would land on `start` (i.e. the surrogate pair begins
+// exactly at `start`), we emit both surrogates together (end = start + 2)
+// rather than emitting a lone surrogate or stalling.
+function surrogateSafeChunkEnd(text: string, end: number, start: number): number {
+  const high = text.charCodeAt(end - 1);
+  const low = text.charCodeAt(end);
+  const splitsPair = end > 0 && high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
+  if (!splitsPair) {
+    return end;
+  }
+  const clamped = end - 1;
+  // Guard: never return an index that would stall the loop. If clamped equals
+  // start the surrogate pair's high unit is the very first char of this chunk;
+  // emit both surrogates together instead of splitting or stalling.
+  return clamped > start ? clamped : start + 2;
+}
+
 function splitTelegramPlainTextChunks(text: string, limit: number): string[] {
   if (!text) {
     return [];
   }
   const normalizedLimit = Math.max(1, Math.floor(limit));
   const chunks: string[] = [];
-  for (let start = 0; start < text.length; start += normalizedLimit) {
-    chunks.push(text.slice(start, start + normalizedLimit));
+  let start = 0;
+  while (start < text.length) {
+    const end = surrogateSafeChunkEnd(text, start + normalizedLimit, start);
+    chunks.push(text.slice(start, end));
+    start = end;
   }
   return chunks;
 }
@@ -209,12 +235,19 @@ function splitTelegramPlainTextFallback(text: string, chunkCount: number, limit:
       remainingChunks === 1
         ? remainingChars
         : Math.min(normalizedLimit, Math.ceil(remainingChars / remainingChunks));
-    chunks.push(text.slice(offset, offset + nextChunkLength));
-    offset += nextChunkLength;
+    const end = surrogateSafeChunkEnd(text, offset + nextChunkLength, offset);
+    chunks.push(text.slice(offset, end));
+    offset = end;
   }
   return chunks;
 }

+// Test-only handle: the plain-text splitter is internal, but its surrogate-safe
+// chunk boundary needs direct behavior coverage.
+export function splitTelegramPlainTextChunksForTests(text: string, limit: number): string[] {
+  return splitTelegramPlainTextChunks(text, limit);
+}
+
 function logTelegramOutboundSendOk(params: TelegramOutboundSuccessLogParams): void {
   const parts = [
     "telegram outbound send ok",
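The change to `splitTelegramPlainTextChunks` swaps a fixed-stride `for` loop for a `while` loop whose end index is adjusted per chunk. The sketch below puts the two loop shapes side by side. `surrogateSafeChunkEnd` is copied from the hunk above; the wrapper names `splitFixedWidth` and `splitSurrogateSafe` are hypothetical stand-ins for the removed and added loop bodies, with the empty-text check and limit normalization left out for brevity.

```typescript
// Copied from the hunk above.
function surrogateSafeChunkEnd(text: string, end: number, start: number): number {
  const high = text.charCodeAt(end - 1);
  const low = text.charCodeAt(end);
  const splitsPair = end > 0 && high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
  if (!splitsPair) {
    return end;
  }
  const clamped = end - 1;
  return clamped > start ? clamped : start + 2;
}

// Hypothetical stand-in for the removed loop: fixed-width slices.
function splitFixedWidth(text: string, limit: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += limit) {
    chunks.push(text.slice(start, start + limit));
  }
  return chunks;
}

// Hypothetical stand-in for the added loop: each end index is clamped first.
function splitSurrogateSafe(text: string, limit: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = surrogateSafeChunkEnd(text, start + limit, start);
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}

console.log(splitFixedWidth("A😀B", 1).length); // 4: the emoji is cut into two lone surrogates
console.log(splitSurrogateSafe("A😀B", 1)); // ["A", "😀", "B"]
console.log(splitSurrogateSafe("😀X", 1)); // ["😀", "X"]
```

Note that with a limit of 1 the emoji chunk is two code units long, so the limit is exceeded by one unit in that corner case; the code trades a strict bound for guaranteed progress and well-formed chunks.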

@@ -43,6 +43,17 @@ describe("telegramPlugin outbound", () => {
     expect(telegramOutbound.chunker?.(text, 4000)).toEqual([text]);
   });

+  it("keeps astral characters whole at positive configured chunk limits", () => {
+    clearTelegramRuntime();
+
+    expect(telegramOutbound.chunker?.("A😀B", 1)).toEqual(["A", "😀", "B"]);
+    expect(telegramOutbound.chunker?.("A😀B", 1, { formatting: { parseMode: "HTML" } })).toEqual([
+      "A",
+      "😀",
+      "B",
+    ]);
+  });
+
   it("preserves markdown tables for the configured delivery renderer", () => {
     clearTelegramRuntime();
     const text = ["| Name | Value |", "|------|-------|", "| A | 1 |"].join("\n");

@@ -42,6 +42,23 @@ function scanParenAwareBreakpoints(text: string): { lastNewline: number; lastWhi
   return { lastNewline, lastWhitespace };
 }

+/**
+ * Keeps UTF-16 chunk boundaries from separating a supplementary-plane character.
+ * A one-unit positive limit still needs to emit an entire surrogate pair.
+ */
+export function avoidTrailingHighSurrogateBreak(text: string, start: number, end: number): number {
+  if (
+    end >= text.length ||
+    text.charCodeAt(end - 1) < 0xd800 ||
+    text.charCodeAt(end - 1) > 0xdbff ||
+    text.charCodeAt(end) < 0xdc00 ||
+    text.charCodeAt(end) > 0xdfff
+  ) {
+    return end;
+  }
+  return end - 1 > start ? end - 1 : end + 1;
+}
+
 /**
  * Splits plain text into size-bounded chunks at readable boundaries.
  *
@@ -66,7 +83,11 @@ export function chunkText(text: string, limit: number): string[] {
     // Prefer block boundaries, then spaces, then a hard size cut when no
     // readable breakpoint exists inside this window.
     const breakOffset = lastNewline > 0 ? lastNewline : lastWhitespace;
-    const end = breakOffset > 0 ? cursor + breakOffset : windowEnd;
+    const end = avoidTrailingHighSurrogateBreak(
+      text,
+      cursor,
+      breakOffset > 0 ? cursor + breakOffset : windowEnd,
+    );
     chunks.push(text.slice(cursor, end));
     cursor = end;
     while (cursor < text.length && /\s/.test(text[cursor] ?? "")) {

@@ -85,6 +85,28 @@ describe("renderMarkdownIRChunksWithinLimit", () => {
     expect(chunks.every((chunk) => chunk.rendered.length <= 1)).toBe(true);
   });

+  it("keeps astral characters whole when a positive limit reaches their pair", () => {
+    const chunks = renderMarkdownIRChunksWithinLimit({
+      ir: markdownToIR("A😀B"),
+      limit: 1,
+      renderChunk: (chunk) => chunk.text,
+      measureRendered: (rendered) => rendered.length,
+    });
+
+    expect(chunks.map((chunk) => chunk.source.text)).toEqual(["A", "😀", "B"]);
+  });
+
+  it("keeps astral characters whole when rendered size requires a retry split", () => {
+    const chunks = renderMarkdownIRChunksWithinLimit({
+      ir: markdownToIR("A😀"),
+      limit: 3,
+      renderChunk: (chunk) => (chunk.text === "A😀" ? "too long" : chunk.text),
+      measureRendered: (rendered) => rendered.length,
+    });
+
+    expect(chunks.map((chunk) => chunk.source.text)).toEqual(["A", "😀"]);
+  });
+
   it("treats Infinity as no size cap and returns a single chunk", () => {
     const text = "one two three four five six seven eight nine ten";
     const ir = markdownToIR(text);

@@ -1,3 +1,4 @@
+import { avoidTrailingHighSurrogateBreak } from "./chunk-text.js";
 // Markdown Core module implements render aware chunking behavior.
 import {
   chunkMarkdownIR,
@@ -127,10 +128,11 @@ function findLargestChunkTextLengthWithinRenderedLimit<TRendered>(
   // Rendered length is not guaranteed to be monotonic after escaping/link or
   // file-reference rewriting, so test exact candidates from longest to shortest.
   for (let candidateLength = currentTextLength - 1; candidateLength >= 1; candidateLength -= 1) {
-    const candidate = sliceMarkdownIR(chunk, 0, candidateLength);
+    const safeCandidateLength = avoidTrailingHighSurrogateBreak(chunk.text, 0, candidateLength);
+    const candidate = sliceMarkdownIR(chunk, 0, safeCandidateLength);
     const rendered = options.renderChunk(candidate);
     if (options.measureRendered(rendered) <= renderedLimit) {
-      return candidateLength;
+      return safeCandidateLength;
     }
   }
   return 0;
@@ -215,7 +217,7 @@ function findMarkdownIRPreservedSplitIndex(text: string, start: number, limit: n
   if (lastAnyWhitespaceBreak > start) {
     return resolveWhitespaceBreak(lastAnyWhitespaceBreak, lastAnyWhitespaceRunStart);
   }
-  return maxEnd;
+  return avoidTrailingHighSurrogateBreak(text, start, maxEnd);
 }

 function splitMarkdownIRPreserveWhitespace(ir: MarkdownIR, limit: number): MarkdownIR[] {
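The retry-split path can be traced with plain strings. In the sketch below, `avoidTrailingHighSurrogateBreak` is copied from the markdown-core hunk earlier in this diff, while `largestPrefixWithinRenderedLimit` is a hypothetical, string-only simplification of `findLargestChunkTextLengthWithinRenderedLimit`: it uses `slice` instead of `sliceMarkdownIR` and measures with `.length`. The inputs mirror the "retry split" test case.

```typescript
// Copied from the markdown-core hunk earlier in this diff.
function avoidTrailingHighSurrogateBreak(text: string, start: number, end: number): number {
  if (
    end >= text.length ||
    text.charCodeAt(end - 1) < 0xd800 ||
    text.charCodeAt(end - 1) > 0xdbff ||
    text.charCodeAt(end) < 0xdc00 ||
    text.charCodeAt(end) > 0xdfff
  ) {
    return end;
  }
  return end - 1 > start ? end - 1 : end + 1;
}

// Hypothetical simplification: tries prefixes from longest to shortest and
// returns the first surrogate-safe length whose rendering fits the limit.
function largestPrefixWithinRenderedLimit(
  text: string,
  renderedLimit: number,
  renderPrefix: (candidate: string) => string,
): number {
  for (let candidateLength = text.length - 1; candidateLength >= 1; candidateLength -= 1) {
    const safeLength = avoidTrailingHighSurrogateBreak(text, 0, candidateLength);
    if (renderPrefix(text.slice(0, safeLength)).length <= renderedLimit) {
      return safeLength;
    }
  }
  return 0;
}

const retryText = "A😀";
// As in the test: the whole text renders as "too long", anything shorter renders as itself.
const renderPrefix = (candidate: string) => (candidate === retryText ? "too long" : candidate);

const prefixLength = largestPrefixWithinRenderedLimit(retryText, 3, renderPrefix);
console.log(prefixLength); // 1
console.log([retryText.slice(0, prefixLength), retryText.slice(prefixLength)]); // ["A", "😀"]
```

Without the adjustment, the first candidate length would be 2, which cuts the emoji after its high surrogate; the helper moves that candidate back to 1 before anything is rendered or measured.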

@@ -604,6 +604,10 @@ describe("chunkMarkdownTextWithMode", () => {
     expect(chunks.every((chunk) => !/[\uD800-\uDBFF]$/u.test(chunk))).toBe(true);
     expect(chunks.every((chunk) => !/^[\uDC00-\uDFFF]/u.test(chunk))).toBe(true);
   });
+
+  it("keeps an astral character whole when a positive hard limit starts on its pair", () => {
+    expect(chunkMarkdownTextWithMode("A😀B", 1, "length")).toEqual(["A", "😀", "B"]);
+  });
 });

 describe("resolveChunkMode", () => {

@@ -16,7 +16,7 @@ export function avoidTrailingHighSurrogateBreak(text: string, start: number, end
     return end;
   }
   const adjusted = end - 1;
-  return adjusted > start ? adjusted : end;
+  return adjusted > start ? adjusted : end + 1;
 }

 export function chunkTextByBreakResolver(
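This last hunk changes a single return value, so a before/after comparison may help. Only the two `return adjusted > start ? ...` lines below come from the hunk. The guard in `splitsSurrogatePair` is an assumption modeled on the markdown-core helper shown earlier in this diff (this file's guard is outside the visible context), and the names `splitsSurrogatePair`, `breakEndBefore`, and `breakEndAfter` are hypothetical.

```typescript
// Assumption: this guard mirrors the markdown-core helper shown earlier in the diff;
// the real guard for this file is not visible in the hunk.
function splitsSurrogatePair(text: string, end: number): boolean {
  const high = text.charCodeAt(end - 1);
  const low = text.charCodeAt(end);
  return end < text.length && high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
}

// Behavior with the removed line.
function breakEndBefore(text: string, start: number, end: number): number {
  if (!splitsSurrogatePair(text, end)) {
    return end;
  }
  const adjusted = end - 1;
  return adjusted > start ? adjusted : end;
}

// Behavior with the added line.
function breakEndAfter(text: string, start: number, end: number): number {
  if (!splitsSurrogatePair(text, end)) {
    return end;
  }
  const adjusted = end - 1;
  return adjusted > start ? adjusted : end + 1;
}

// A chunk that starts on the emoji's high surrogate (index 1) with a one-unit limit.
const pairSample = "A😀B";
const endBefore = breakEndBefore(pairSample, 1, 2);
const endAfter = breakEndAfter(pairSample, 1, 2);

console.log(endBefore, endAfter); // 2 3
console.log(pairSample.slice(1, endBefore) === "\ud83d"); // true: a lone high surrogate
console.log(pairSample.slice(1, endAfter) === "😀"); // true: the whole pair
```

The old fallback made progress but still emitted half a pair whenever the chunk began on the high surrogate; extending to `end + 1` keeps the pair together.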