A quirk in the SUNBURST DGA algorithm

The Cloudflare Blog

The day my ping took countermeasures Announcing Claude Compliance API support with Cloudflare CASB Announcing Claude Managed Agents on Cloudflare Project Glasswing: what Mythos showed us Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse Browser Run: now running on Cloudflare Containers, it’s faster and more scalable When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug Building For The Future How Cloudflare responded to the “Copy Fail” Linux vulnerability When DNSSEC goes wrong: how we responded to the .de TLD outage Code Orange: Fail Small is complete. The result is a stronger Cloudflare network Introducing Dynamic Workflows: durable execution that follows the tenant Post-quantum encryption for Cloudflare IPsec is generally available Agents can now create Cloudflare accounts, buy domains, and deploy Shutdowns, power outages, and conflict: a review of Q1 2026 Internet disruptions Making Rust Workers reliable: panic and abort recovery in wasm‑bindgen Moving past bots vs. humans Building the agentic cloud: everything we launched during Agents Week 2026 The AI engineering stack we built internally — on the platform we ship Orchestrating AI Code Review at scale Introducing the Agent Readiness score. Check to see if your site is agent-ready Shared Dictionaries: compression that keeps up with the agentic web Redirects for AI Training enforces canonical content Unweight: how we compressed an LLM 22% without sacrificing quality Agents that remember: introducing Agent Memory Agents Week: network performance update Introducing Flagship: feature flags built for the age of AI Cloudflare’s AI Platform: an inference layer designed for agents Building the foundation for running extra-large language models AI Search: the search primitive for your agents Deploy Postgres and MySQL databases with PlanetScale + Workers Artifacts: versioned storage that speaks Git Email for agents - Cloudflare Email Service now in public beta Project Think: building the next generation of AI agents on Cloudflare Introducing Agent Lee - a new interface to the Cloudflare stack Register domains wherever you build: Cloudflare Registrar API now in beta Browser Run: give your agents a browser Rearchitecting the Workflows control plane for the agentic era Add voice to your agent Managed OAuth for Access: make internal apps agent-ready in one click Securing non-human identities: automated revocation, OAuth, and scoped permissions Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP Secure private networking for everyone: users, nodes, agents, Workers — introducing Cloudflare Mesh Building a CLI for all of Cloudflare Durable Objects in Dynamic Workers: Give each AI-generated app its own database Agents have their own computers with Sandboxes GA Dynamic, identity-aware, and secure Sandbox auth Welcome to Agents Week 500 Tbps of capacity: 16 years of scaling our global network From bytecode to bytes- automated magic packet generation Cloudflare targets 2029 for full post-quantum security How we built Organizations to help enterprises manage Cloudflare at scale Why we're rethinking cache for the AI era Our ongoing commitment to privacy for the 1.1.1.1 public DNS resolver Introducing EmDash — the spiritual successor to WordPress that solves plugin security Introducing Programmable Flow Protection: custom DDoS mitigation logic for Magic Transit customers Cloudflare Client-Side Security: smarter detection, now open to everyone How we use Abstract Syntax Trees (ASTs) to turn Workflows code into visual diagrams A one-line Kubernetes fix that saved 600 hours a year Sandboxing AI agents, 100x faster Inside Gen 13- how we built our most powerful server yet Launching Cloudflare’s Gen 13 servers- trading cache for cores for 2x edge compute performance Powering the agents: Workers AI now runs large models, starting with Kimi K2.5 Introducing Custom Regions for precision data control Standing up for the open Internet- why we appealed Italy’s Piracy Shield fine From legacy architecture to Cloudflare One Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans Slashing agent token costs by 98% with RFC 9457-compliant error responses AI Security for Apps is now generally available Building a security overview dashboard for actionable insights Investigating multi-vector attacks in Log Explorer Translating risk insights into actionable protection: leveling up security posture with Cloudflare and Mastercard Fixing request smuggling vulnerabilities in Pingora OSS deployments Active defense: introducing a stateful vulnerability scanner for APIs Complexity is a choice. SASE migrations shouldn’t take years. From the endpoint to the prompt: a unified data security vision in Cloudflare One Ending the "silent drop": how Dynamic Path MTU Discovery makes the Cloudflare One Client more resilient A QUICker SASE client: re-building Proxy Mode How Automatic Return Routing solves IP overlap Always-on detections: eliminating the WAF “log versus block” trade-off Mind the gap: new tools for continuous enforcement from boot to login Stop reacting to breaches and start preventing them with User Risk Scoring Defeating the deepfake: stopping laptop farms and insider threats Moving from license plates to badges: the Gateway Authorization Proxy Evolving Cloudflare’s Threat Intelligence Platform: actionable, scalable, and ETL-less Introducing the 2026 Cloudflare Threat Report See risk, fix risk: introducing Remediation in Cloudflare CASB How Cloudy translates complex security into human action From reactive to proactive: closing the phishing gap with LLMs Modernizing with agile SASE: a Cloudflare One blog takeover Beyond the blank slate: how Cloudflare accelerates your Zero Trust journey The truly programmable SASE platform Toxic combinations: when small signals add up to a security incident We deserve a better streams API for JavaScript The most-seen UI on the Internet? Redesigning Turnstile and Challenge Pages ASPA: making Internet routing more secure Bringing more transparency to post-quantum usage, encrypted messaging, and routing security How we rebuilt Next.js with AI in one week Cloudflare One is the first SASE offering modern post-quantum encryption across the full platform Cloudflare outage on February 20, 2026

Cloudflare Team · 2020-12-18 · via The Cloudflare Blog

On Wednesday, December 16, the RedDrip Team from QiAnXin Technology released their discoveries (tweet, github) regarding the random subdomains associated with the SUNBURST malware which was present in the SolarWinds Orion compromise. In studying queries performed by the malware, Cloudflare has uncovered additional details about how the Domain Generation Algorithm (DGA) encodes data and exfiltrates the compromised hostname to the command and control servers.

Background

The RedDrip team discovered that the DNS queries are created by combining the previously reverse-engineered unique guid (based on hashing of hostname and MAC address) with a payload that is a custom base 32 encoding of the hostname. The article they published includes screenshots of decompiled or reimplemented C# functions that are included in the compromised DLL. This background primer summarizes their work so far (which is published in Chinese).

RedDrip discovered that the DGA subdomain portion of the query is split into three parts:

<encoded_guid> + <byte> + <encoded_hostname>

An example malicious domain is:

7cbtailjomqle1pjvr2d32i2voe60ce2.appsync-api.us-east-1.avsvmcloud.com

Where the domain is split into the three parts as

Encoded guid (15 chars)	byte	Encoded hostname
7cbtailjomqle1p	j	vr2d32i2voe60ce2

The work from the RedDrip Team focused on the encoded hostname portion of the string, we have made additional insights related to the encoded hostname and encoded guid portions.

At a high level the encoded hostnames take one of two encoding schemes. If all of the characters in the hostname are contained in the set of domain name-safe characters "0123456789abcdefghijklmnopqrstuvwxyz-_." then the OrionImprovementBusinessLayer.CryptoHelper.Base64Decode algorithm, explained in the article, is used. If there are characters outside of that set in the hostname, then the OrionImprovementBusinessLayer.CryptoHelper.Base64Encode is used instead and ‘00’ is prepended to the encoding. This allows us to simply check if the first two characters of the encoded hostname are ‘00’ and know how the hostname is encoded.

These function names within the compromised DLL are meant to resemble the names of legitimate functions, but in fact perform the message encoding for the malware. The DLL function Base64Decode is meant to resemble the legitimate function name base64decode, but its purpose is actually to perform the encoding of the query (which is a variant of base32 encoding).

The RedDrip Team has posted Python code for encoding and decoding the queries, including identifying random characters inserted into the queries at regular character intervals.

One potential issue we encountered with their implementation is the inclusion of a check clause looking for a ‘0’ character in the encoded hostname (line 138 of the decoding script). This line causes the decoding algorithm to ignore any encoded hostnames that do not contain a ‘0’. We believe this was included because ‘0’ is the encoded value of a ‘0’, ‘.’, ‘-’ or ‘_’. Since fully qualified hostnames are comprised of multiple parts separated by ‘.’s, e.g. ‘example.com’, it makes sense to be expecting a ‘.’ in the unencoded hostname and therefore only consider encoded hostnames containing a ‘0’. However, this causes the decoder to ignore many of the recorded DGA domains.

As we explain below, we believe that long domains are split across multiple queries where the second half is much shorter and unlikely to include a ‘.’. For example ‘www2.example.c’ takes up one message, meaning that in order to transmit the entire domain ‘www2.example.c’ a second message containing just ‘om’ would also need to be sent. This second message does not contain a ‘.’ so its encoded form does not contain a ‘0’ and it is ignored in the RedDrip team’s implementation.

The quirk: hostnames are split across multiple queries

A list of observed queries performed by the malware was published publicly on GitHub. Applying the decoding script to this set of queries, we see some queries appear to be truncated, such as grupobazar.loca, but also some decoded hostnames are curiously short or incomplete, such as “com”, “.com”, or a single letter, such as “m”, or “l”.

When the hostname does not fit into the available payload section of the encoded query, it is split up across multiple queries. Queries are matched up by matching the GUID section after applying a byte-by-byte exclusive-or (xor).

Analysis of first 15 characters

Noticing that long domains are split across multiple requests led us to believe that the first 16 characters encoded information to associate multipart messages. This would allow the receiver on the other end to correctly re-assemble the messages and get the entire domain. The RedDrip team identified the first 15 bytes as a GUID, we focused on those bytes and will refer to them subsequently as the header.

We found the following queries that we believed to be matches without knowing yet the correct pairings between message 1 and message 2 (payload has been altered):

Part 1 - Both decode to “www2.example.c”r1q6arhpujcf6jb6qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.comr8stkst71ebqgj66qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.com

Part 2 - Both decode to “om”0oni12r13ficnkqb2h.appsync-api.us-west-2.avsvmcloud.comulfmcf44qd58t9e82h.appsync-api.us-west-2.avsvmcloud.com

This gives us a final combined payload of www2.example.com

This example gave us two sets of messages where we were confident the second part was associated with the first part, and allowed us to find the following relationship where message1 is the header of the first message and message2 is the header of the second:

_Base32Decode(message1) XOR KEY = Base32Decode(message2)_

The KEY is a single character. That character is xor’d with each byte of the Base32Decoded first header to produce the Base32Decoded second header. We do not currently know how to infer what character is used as the key, but we can still match messages together without that information. Since A XOR B = C where we know A and C but not B, we can instead use A XOR C = B. This means that in order to pair messages together we simply need to look for messages where XOR’ing them together results in a repeating character (the key).

Base32Decode(message1) XOR Base32Decode(message2) = KEY

Looking at the examples above this becomes

	Message 1	Message 2
Header	r1q6arhpujcf6jb	0oni12r13ficnkq
Base32Decode (binary)	101101000100110110111111011 010010000000011001010111111 01111000101001110100000101	110110010010000011010010000 001000110110110100111100100 00100011111111000000000100

We’ve truncated the results slightly, but below shows the two binary representations and the third line shows the result of the XOR.

101101000100110110111111011010010000000011001010111111011110001010011101110110010010000011010010000001000110110110100111100100001000111111110000011011010110110101101101011011010110110101101101011011010110110101101101

We can see the XOR result is the repeating sequence ‘01101101’meaning the original key was 0x6D or ‘m’.

We provide the following python code as an implementation for matching paired messages (Note: the decoding functions are those provided by the RedDrip team):

# string1 is the first 15 characters of the first message
# string2 is the first 15 characters of the second message
def is_match(string1, string2):
    encoded1 = Base32Decode(string1)
    encoded2 = Base32Decode(string2)
    xor_result = [chr(ord(a) ^ ord(b)) for a,b in zip(encoded1, encoded2)]
    match_char = xor_result[0]
    for character in xor_result[0:9]:
        if character != match_char:
            return False, None
    return True, "0x{:02X}".format(ord(match_char))

The following are additional headers which based on the payload content Cloudflare is confident are pairs (the payload has been redacted because it contains hostname information that is not yet publicly available):

Example 1:

vrffaikp47gnsd4a
aob0ceh5l8cr6mco

xorkey: 0x4E

Example 2:

vrffaikp47gnsd4a
aob0ceh5l8cr6mco

xorkey: 0x54

Example 3:

vvu7884g0o86pr4a
6gpt7s654cfn4h6h

xorkey: 0x2B

We hypothesize that the xorkey can be derived from the header bytes and/or padding byte of the two messages, though we have not yet determined the relationship.

Update (12/18/2020):

Erik Hjelmvik posted a blog explaining where the xor key is located. Based on his code, we provide a python implementation for converting the header (first 16 bytes) into the decoded GUID as a string. Messages can then be paired by matching GUID’s to reconstruct the full hostname.

def decrypt_secure_string(header):
    decoded = Base32Decode(header[0:16])
    xor_key = ord(decoded[0])
    decrypted = ["{0:02x}".format(ord(b) ^ xor_key) for b in decoded]
    return ''.join(decrypted[1:9])

Updated example:

	Message 1	Message 2
Header	r1q6arhpujcf6jb	0oni12r13ficnkq
Base32Decode Header (hex)	b44dbf6900cafde29d05	d920d2046da7908ff004
Base32Decode first byte (xor key)	0xb4	0xd9
XOR result (hex)	00f90bddb47e495629	00f90bddb47e495629

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

The Cloudflare Blog

Background

The quirk: hostnames are split across multiple queries

Analysis of first 15 characters

Update (12/18/2020):