2024-05-30 – Improving reliability on our infrastructure

Outage updates (archive)
  • May 30th, 14:30 CEST: Estimated start of the outage.

  • May 30th, 15:50 CEST: Equestria.dev starts investigating.

  • May 30th, 18:45 CEST: The cause is found to be a router outage.

  • May 31st, 19:35 CEST: Our ISP reports that the outage has been resolved, but our services are still unreachable.

  • June 1st, 14:50 CEST: We have restored our status page with details about the outage.

  • June 2nd, 9:30 CEST: The release of Delta 5.3.12 to production environments is delayed to June 4th due to the outage.

  • June 2nd, 19:50 CEST: Our ISP has announced that the outage should be resolved tomorrow at 18:00 CEST. An Equestria.dev team will be available on-site at 19:00 CEST.

  • June 2nd, 21:30 CEST: Update of our warrant canary has been delayed due to the outage.

  • June 3rd, 19:30 CEST: Our ISP has moved estimated resolution time to tomorrow morning. We will start deploying alternative options if the outage is still not resolved by 18:00 CEST.

  • June 4th, 18:30 CEST: The outage is still not resolved and we are now looking into alternative options to help recover some of our infrastructure. This may take a few days.

  • June 4th, 19:00 CEST: Starshine (our main website) is back online, now hosted on Vercel. We had to remove the "Status" page (it now links to status.equestria.dev instead) to make this possible.

  • June 4th, 20:15 CEST: We have regained access to dabssi and hudgens, although dabssi has no internet access. Both servers are behind a firewall, so they cannot be used in production, but this gives us access to precious resources.

  • June 4th, 21:00 CEST: We have recovered files from the Trusted Core Network, meaning we are now ready to move it to NL1. NL1 will be deployed tomorrow.

  • June 5th, 16:00 CEST: As announced yesterday, we are now deploying NL1. The priority is to make it host the Trusted Core Network.

  • June 5th, 18:00 CEST: Our Trusted Core Network has been restored on NL1, meaning we now have full regular access to servers. We will configure more applications on NL1 later.

  • June 7th, 19:00 CEST: Our FR1 router is now back and we are sending a team on-site to reconfigure the hardware that needs it.

  • June 7th, 21:45 CEST: The outage is now fully resolved (aside from koshy, which is considered low priority) and all services are available again. We will proceed with the GitHub migration in the upcoming weeks.

Errata: The outage is now over; thank you for bearing with us. We have decided to postpone discontinuing the DE1 datacenter as well as transferring hudgens to a virtual instance (a transfer which might be cancelled); other planned maintenance remains unchanged. We will discuss the migration to GitHub and, if it is approved, migrate our source code over the next few weeks.


We failed. Our entire infrastructure is down – once again.

Every now and then, the router in our FR1 datacenter goes down for a few days. This has happened at least twice before, and we now need to mitigate this issue more than ever.

Because our Trusted Core Network server is hosted in FR1, we do not have access to any of our servers; we rely entirely on automated maintenance tasks to keep alive whatever is still up. Here are the issues we intend to fix:

  • Trusted Core Network will be moved to NL1: We picked FR1 as our preferred datacenter for TCN because it is the one with the highest bandwidth. It has become obvious that, for such a critical piece of equipment, reliability must be prioritized over speed. We will therefore move TCN to the NL1 datacenter, managed by Scaleway, which delivers only 100 Mbps but is extremely reliable.

  • Source code will be moved to GitHub: Our GitLab instance, which is becoming increasingly hard to manage, will be discontinued, which will also free up resources on our servers. Active projects will be moved to GitHub, and archives will be moved to a dedicated server hosted using cgit. The package registry will most likely be discontinued for the time being.

  • Our new website (version 14) will be hosted on Vercel for maximum availability. Additionally, any software that is installed locally on user devices should continue to work normally.

We will keep making our FR1 datacenter as reliable as it can possibly be, and we will look for alternative network technologies we can use in case of a main router outage. Maintenance downtime might occur and will be announced on the home page.

Once again, we would like to express our sincerest apologies for the current incident, and we hope to retain your trust and support. We will keep everyone updated as the incident progresses, and we can confirm that this outage involves no data leak, security issue, or privacy issue.
