
























On October 1, 2024, TIBCO Mashery, an enterprise API management platform leveraged by some of the world’s most recognizable brands, experienced a significant outage. At around 7:10 AM ET, users began encountering SSL connection errors that appeared straightforward at first glance.
Internet Sonar, one of the tools in our Internet Performance Monitoring (IPM) arsenal, successfully captured the incident. While other solutions may have missed it, Internet Sonar was able to pinpoint the issue because it monitors all the layers of the Internet Stack, including DNS, SSL, response times, and reachability from “eyeball” user networks. This comprehensive view revealed that the root cause wasn’t an SSL failure, but a DNS misconfiguration affecting access to key services.

The SSL error in the browser (as shown in the image below) shows that the certificate being served was issued for pantheonsite.io.

Looking at the details in the Catchpoint platform, we observed the same issue.
While attempting to connect to developer.mashery.com, DNS resolution succeeded, but the returned IP belonged to a server whose certificate did not identify the Mashery domain. In some regions, the connection was still working and returning the correct certificate.


A correct SSL handshake would have presented a certificate with mashery.com as the Common Name or Subject Alternative Name (CN/SAN), as we can see in the screenshot below.

Because the visible symptom was an SSL error, we ran an SSL test, which confirmed that the connection failed because the certificate's CN/SAN did not match the requested hostname.
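To make the CN/SAN mismatch concrete, here is a minimal Python sketch of the name check a TLS client performs. It is a simplification for illustration, not the exact logic of any browser or of the Catchpoint SSL test, and the certificate name lists in the usage note are hypothetical examples rather than the actual certificate contents observed during the incident.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Return True if one certificate name covers the requested hostname.

    Simplified rule: names are compared case-insensitively, and a
    wildcard is only honored as the entire left-most label.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # "*.example.com" -> ".example.com"
        if not hostname.endswith(suffix):
            return False
        # The wildcard must stand for exactly one non-empty label.
        label = hostname[: -len(suffix)]
        return bool(label) and "." not in label
    return pattern == hostname


def certificate_covers(names: list[str], hostname: str) -> bool:
    """Return True if any CN/SAN entry in `names` covers `hostname`."""
    return any(hostname_matches(name, hostname) for name in names)
```

With a hypothetical name list such as `["*.pantheonsite.io", "pantheonsite.io"]`, `certificate_covers(names, "developer.mashery.com")` is false, which is the situation users hit. With a hypothetical list such as `["*.mashery.com", "mashery.com"]`, the same call is true.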

Initially, users encountered SSL errors caused by connections being directed to a Pantheon IP (23.185.0.3) instead of the expected AWS ELB IPs (54.160.170.229, 54.235.15.197, and 44.211.103.199). This misrouting occurred when recursive resolvers queried the name servers "ns65.worldnic.com" and "ns66.worldnic.com". In contrast, alternative DNS resolvers like 8.8.8.8 (Google) and 1.1.1.1 (Cloudflare) correctly directed traffic to AWS.
We can see the same in this DNS Experience Test record below:

This chart shows how it should have been working (some cities where it was working correctly):

Query using the Google DNS resolver (8.8.8.8):

Crucially, the issue manifested differently across various geographical locations, likely due to geo-IP based configurations affecting how DNS records were served. This variability underlines the importance of a global monitoring strategy. Relying solely on a cloud instance would never have captured the full scope of the problem. It’s imperative to get close to eyeball networks to truly understand how users experience services across regions.
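The comparison described above, checking what each vantage point resolves against the addresses we expect, can be sketched in a few lines of Python. This is an illustrative sketch only: the expected AWS ELB addresses come from this incident, while the function names and the vantage-point labels in the usage note are hypothetical. Collecting the answers (for example, by querying different resolvers from different cities) is assumed to happen elsewhere.

```python
# Expected AWS ELB addresses for developer.mashery.com, as reported above.
EXPECTED_IPS = frozenset({"54.160.170.229", "54.235.15.197", "44.211.103.199"})


def classify_answer(ips, expected=EXPECTED_IPS):
    """Classify one vantage point's A-record answer against the expected set."""
    ips = set(ips)
    if not ips:
        return "no-answer"
    if ips <= expected:
        return "expected"      # every returned IP is one we expect
    if ips & expected:
        return "mixed"         # some expected, some not
    return "unexpected"        # none of the returned IPs are expected


def summarize(observations):
    """Group vantage points by classification.

    `observations` maps a vantage-point label to the IPs it resolved.
    Returns a dict of classification -> sorted list of labels.
    """
    summary = {}
    for vantage, ips in observations.items():
        summary.setdefault(classify_answer(ips), []).append(vantage)
    return {status: sorted(names) for status, names in summary.items()}
```

For example, `summarize({"resolver-a": ["23.185.0.3"], "resolver-b": ["54.160.170.229"]})` puts `resolver-a` under `"unexpected"` and `resolver-b` under `"expected"`. A single cloud vantage point that happened to behave like `resolver-b` would have reported nothing wrong.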
As the hours progressed, we witnessed DNS errors such as '101 Not Implemented,' 'Query Refused,' and 'Server Failure' from different parts of the world, indicating ongoing changes within the system. Catchpoint’s DNS monitoring captured these issues, and after almost 4.5 hours, the problem began to resolve as the correct name servers and A records propagated globally.
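When many vantage points report different failures at once, a simple tally by response code helps show how the incident is evolving. The sketch below uses the standard DNS RCODE values from RFC 1035 (2 for Server Failure, 4 for Not Implemented, 5 for Refused). It is an assumption-laden illustration: the function name is hypothetical, and the numeric codes shown in a monitoring platform's error strings may follow that platform's own numbering rather than these values.

```python
from collections import Counter

# Standard DNS response codes (RFC 1035, section 4.1.1).
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def tally_failures(rcodes):
    """Count non-success RCODEs seen across a batch of DNS responses."""
    counts = Counter(
        RCODE_NAMES.get(code, f"RCODE{code}")  # keep unknown codes visible
        for code in rcodes
        if code != 0  # NOERROR is not a failure
    )
    return dict(counts)
```

Running this over each measurement interval gives a rough view of which failure modes dominate over time, for instance whether refusals give way to server failures and then to clean answers as a fix propagates.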

For users relying on Mashery for seamless API management, this incident had serious consequences. Requests were inconsistently routed—some ending up at Fastly IPs instead of the intended AWS ELB—potentially leading to service disruptions. This highlights the fragile nature of Internet infrastructure, where a single DNS misconfiguration can immediately impact user experience and service availability.
What seemed like a simple SSL error at first quickly revealed a much bigger issue. The incident exposed critical weaknesses in DNS reliability, SSL configurations, and CDN performance. It’s yet another reminder that the Internet is deeply interconnected, and problems can appear in one region while other areas remain unaffected, making global visibility and proactive monitoring essential.
The Mashery outage reveals a crucial lesson: SSL errors can be just the tip of the iceberg. The real issue often lies deeper, like in this case, with a DNS misconfiguration. If DNS isn’t properly configured or monitored, the entire system can fail, and what seems like a simple SSL error can spiral into a much bigger problem.
This incident is a wake-up call. The interconnected nature of the Internet means that a single point of failure—like DNS—can disrupt services across the globe. Geographic differences only make it harder to detect and resolve these issues, which is why a global monitoring strategy is essential. To truly safeguard against the fragility of the Internet, you need full visibility into every layer of the Internet Stack, from DNS to SSL and beyond.
This incident underscores the necessity of monitoring every layer of the Internet Stack—DNS, SSL, CDN, and third-party services. By using robust IPM tools like Internet Sonar, companies can achieve resilience across all these dependencies.
Internet Sonar provides:


Today it was Mashery; tomorrow, it could be your service. The need for strong, continuous monitoring practices cannot be overstated.
Check out our demo hub to see Internet Sonar at work, or contact us to learn more.