Over the past few weeks, Discord has had a number of outages. Two of those were particularly severe, on the 7th and 8th, resulting in hours of downtime each.
This is clearly unacceptable, and we’re sorry. We understand how important Discord is to each of you — whether you’re talking to friends, playing games together, or building your communities — and these kinds of interruptions are just not okay. We want you to be able to rely on us to be there for you, and we’ve let you down.
The TLDR is that we’ve grown a lot this past year (in terms of users and features), and we did not stay ahead of that growth in a couple of places that have been behind the recent outages.
To address this:
- As of last week, we’ve reprioritized most of Discord’s infrastructure engineering team to be part of a new Reliability Strike Team. Among them are some of our most senior engineers.
- The goal of this team is to focus on improving the stability and performance of our platform for the rest of the year and through 2020.
- The team has already worked through the weekend and shipped dozens of improvements, including adding more capacity to many of our systems and reducing load on our main user database by about 50%.
- We believe that, with these improvements already deployed and more to come this week, most of the recent incidents will not recur.
- The work does not stop here. We will continue to invest heavily in stability going forward.
Discord is supposed to just work — wherever you are, whatever you’re doing. That’s our goal, and we haven’t been hitting that. We’re sorry, and it’s our promise to you that we’re doing everything we can (and making progress!) towards fixing this.
I know this blog post is brief and light on detail (because we’ve been busy fixing things!) but stay tuned. We will write up and share the technical details of these recent outages and what we’ve been doing about them, and we will talk about our infrastructure plans for next year.
Thanks for reading.