BigScoots Network Packet Loss
Updates
The underlying issue was completely resolved overnight. Our monitoring is no longer detecting any problems, so if you continue to see any, please email us and/or BigScoots. Thanks!
BigScoots is working with Cloudflare to resolve the remaining routing issue on the small subset of sites still impacted by the original issue.
In the meantime, I’ve scheduled our monitoring email notifications to resume after 7am Pacific on Wednesday – my hope is that they’ll have this fully resolved long before then. (If you have any other concerns before then, please email us – we can also check the monitoring results directly, without relying on the email notifications.)
Not much else we can do at this point, so we’ll check back in the morning. Goodnight!
From BigScoots:
“It seems we are still experiencing routing issues within the US, this is #1 priority right now, we will also need to reload core configs on some networking gear as its currently running on rescue configs to bring this back online.”
The main network has been restored. Sites should be coming back online now, and hopefully will stay that way. Will continue to monitor.
BigScoots is continuing to work on the issue. In the meantime, here’s a quick recap of this evening’s events.
Shortly after 4pm Pacific this afternoon, a small subset of sites hosted by BigScoots started to report outages.
BigScoots responded quickly to diagnose and try to resolve the issue. After several hours of troubleshooting, it appeared to be related to Magic Transit (the firewall and DDoS protection from Cloudflare). The system was dropping packets when it shouldn’t have been. So while the servers were still working, traffic was only reaching them intermittently.
BigScoots began working directly with Cloudflare support. Only a small subset of sites continued to flip repeatedly between up and down.
Around 8:20pm, I disabled the email notifications from our uptime monitoring system, since it didn’t seem helpful to keep sending notifications for an ongoing issue.
A few minutes later, BigScoots initiated a reboot of some of their equipment in order to resolve the networking issue. This should have taken just a couple of minutes; however, the system has not yet come back online.
(This explains why, if we’re providing uptime monitoring for your site, you did not receive an email notification this evening after everything went down.)
BigScoots has a team onsite and is working as quickly as possible to restore service. We’ll continue to post updates here as soon as we have more information.
BigScoots needed to restart some equipment to apply changes. Unfortunately, the restart is taking longer than expected. BigScoots confirms they’re working as fast as they can to resolve all the issues.
We’ve stopped our downtime alert emails from going out to our clients for the next few hours. No point in tormenting everyone with more of those notifications right now!
An update from Scott at BigScoots:
“Sorry everyone - I know our network upgrade was something many were looking forward to, including us! (bigscoots). There is an isolated issue at the moment where Cloudflare is not seeing a small piece of our network and routing traffic appropriately. It isn’t affecting many, but for those that it is, the result is that you are getting enough packet loss to have your site appear as down.
“Been working hard since the second it happened, with CF and also internally. We expect a resolution very soon and a detailed follow up shortly after.”
Update from BigScoots:
“We’ve identified routing issues within the US still but most other GEO locations are fine, still working hard to fully resolve this incident.”
From BigScoots:
“Most services have been restored, we are still working to restore service to several VPS nodes.”
Update from BigScoots:
There is 1 route of 5 down affecting 1 VPS node (previously 3). If you are travelling in on that route, packet loss may cause it to look down. It will be resolved very shortly.
BigScoots is experiencing network packet loss on some services in their network. It appears to be impacting sites on 2 VPS nodes (out of approximately 200). They are currently investigating the cause and working on restoring full service.