URL Regex Validation: what can go wrong?

NodeJS Security & NodeJS Secure Coding's Blog

2024-09-08 · via NodeJS Security & NodeJS Secure Coding's Blog

Many developers might rely on regular expressions to validate URLs they receive as user input from forms or other APIs, but this approach can lead to produce bad URL regex patterns that result in security vulnerabilities.

Surely it’s common to use regex to validate URLs such as users submitting their social media profiles, or providing images via URL links but let’s deep dive into a CVE that was identified for an npm library that was using a regex pattern to validate URLs and learn from this mistake.

URL Regex Validation

The node-forge npm package gets almost 20 million downloads a week and provides a native implementation for TLS and other cryptographic functions in JavaScript.

This npm package also includes a utility parseUrl function, as follows (in the vulnerable code version):

util.parseUrl = function(str) {
  var regex = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/g;
  regex.lastIndex = 0;
  var m = regex.exec(str);
  var url = (m === null) ? null : {
    full: str,
    scheme: m[1],
    host: m[2],
    port: m[3],
    path: m[4]
  };
  if(url) {
    url.fullHost = url.host;
    if(url.port) {
      if(url.port !== 80 && url.scheme === 'http') {
        url.fullHost += ':' + url.port;
      } else if(url.port !== 443 && url.scheme === 'https') {
        url.fullHost += ':' + url.port;
      }
    } else if(url.scheme === 'http') {
      url.port = 80;
    } else if(url.scheme === 'https') {
      url.port = 443;
    }
    url.full = url.scheme + '://' + url.fullHost;
  }
  return url;
};

What could go wrong with this URL regex validation code?

The Vulnerability: CVE-2022-0122

The function from the node-forge packaged called parseUrl makes use of a regex pattern that is vulnerable to a URL redirection. If an attacker provides a URL with a backslash after the protocol, such as https:/\/\/\, the parseUrl function will interpret the URL as a relative path and redirect the user to an untrusted site.

The vulnerable regex pattern used in the parseUrl function is ^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$ creates room for URL confusion and redirection to an untrusted site due to the backslash after the protocol in the URL provided by the attacker.

Based on the proof-of-concept provided by the researcher, we can explore a scenario where the parseUrl function is used to parse a URL provided by an attacker:

var forge = require("node-forge");
var url = forge.util.parseUrl("https:/\/\/\www.github.com/foo/bar");
console.log(url);

What do you expect the url output to be?

If we use the native new URL() function in JavaScript to parse the provided URL in the above POC example, we get the following output:

var url = new URL("https:/\/\/\www.github.com/foo/bar");
console.log(url);
// Output:
URL {
  href: 'https://www.github.com/foo/bar',
  origin: 'https://www.github.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'www.github.com',
  hostname: 'www.github.com',
  port: '',
  pathname: '/foo/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}

The new URL() function correctly parses the provided URL, however the parseUrl function in the node-forge npm package fails to parse the URL correctly and redirects the user to an untrusted site:

{
  full: 'https://',
  scheme: 'https',
  host: '',
  port: 443,
  path: '/www.github.com/foo/bar',
  fullHost: ''
}

In the above example path should be set to /foo/bar, however it is set to /www.github.com/foo/bar which is incorrect and leads to URL redirection to an untrusted site.

URL Regex Validation Impact

The impact of this vulnerability is that an attacker can provide a malicious URL with a backslash after the protocol, and result not just in an open redirect vulnerability but also have an impact such as phishing attacks, social engineering attacks, or further yet a Server-side Request Forgery (SSRF) attack if the URL is used to make requests to internal services and the developer relies on the parseUrl function to parse the URL and validate it.

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

NodeJS Security & NodeJS Secure Coding's Blog

URL Regex Validation

The Vulnerability: CVE-2022-0122

URL Regex Validation Impact