How Discord handles push request bursts of over a million per minute with Elixir’s GenStage

Jesse Howarth

December 12, 2016

Discord has seen tremendous growth. To handle this growth, our engineering team has had the pleasure of figuring out how to scale the backend services.

One piece of technology we’ve seen great success with is Elixir’s GenStage.

The perfect storm: Overwatch and Pokémon GO

This past summer, our mobile push notification system started to struggle. /r/Overwatch’s Discord had just passed 25,000 concurrent users, Pokémon GO Discords were popping up left and right, and burst notifications became a real issue.

Burst notifications slowed the entire push notification system to a crawl and sometimes brought it to a halt. Push notifications would arrive late or not arrive at all.

GenStage to the rescue

After a bit of investigation, we determined the main bottleneck was sending push notifications to Google’s Firebase Cloud Messaging service.

We realized we could immediately improve the throughput by sending push requests to Firebase via XMPP rather than HTTP.

Firebase XMPP is a bit trickier than HTTP. Firebase requires that each XMPP connection has no more than 100 pending requests at a time. If you have 100 requests in flight, you must wait for Firebase to acknowledge a request before sending another.

Since only 100 requests can be pending at a time, we needed to design our new system such that XMPP connections do not get overloaded in burst situations.

At first glance, GenStage seemed like a perfect fit for our problem.

GenStage

So what’s GenStage?

GenStage is a new Elixir behaviour for exchanging events with back-pressure between Elixir processes. [0]

What does that mean, really? Basically, it gives you the tools needed to make sure no part of your system gets overloaded.

In practice, a system with GenStage behaviours normally has several stages.

Stages are computation steps that send and/or receive data from other stages.

When a stage sends data, it acts as a producer. When it receives data, it acts as a consumer. Stages may take both producer and consumer roles at once.

Besides taking both producer and consumer roles, a stage may be called “source” if it only produces items or called “sink” if it only consumes items. [1]
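
To make the idea of demand concrete, here is a minimal sketch using plain Elixir processes rather than GenStage itself. The DemandSketch module and its message shapes ({:ask, amount, consumer} and {:events, batch}) are hypothetical and only illustrate the principle: the producer never sends anything the consumer has not asked for.

```elixir
defmodule DemandSketch do
  # Hypothetical producer: holds a list of events and hands them out
  # only when a consumer explicitly asks for a given amount.
  def start_producer(events) do
    spawn(fn -> producer_loop(events) end)
  end

  defp producer_loop(events) do
    receive do
      {:ask, amount, consumer} ->
        # Send at most `amount` events, keep the rest for later demand.
        {batch, rest} = Enum.split(events, amount)
        send(consumer, {:events, batch})
        producer_loop(rest)
    end
  end

  # Hypothetical consumer side: ask for `amount` events and wait for them.
  def ask(producer, amount) do
    send(producer, {:ask, amount, self()})

    receive do
      {:events, batch} -> batch
    after
      1_000 -> {:error, :timeout}
    end
  end
end

producer = DemandSketch.start_producer(Enum.to_list(1..10))
first = DemandSketch.ask(producer, 3)
second = DemandSketch.ask(producer, 4)
IO.inspect({first, second})
# → {[1, 2, 3], [4, 5, 6, 7]}
```

GenStage wraps this kind of exchange in a behaviour with subscriptions, buffering, and dispatching, so the sketch should be read as an illustration of the principle and not as how GenStage is implemented.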

The approach

We split the system into two GenStage stages: one source and one sink.

Stage 1 — the Push Collector: The Push Collector is a producer that collects push requests. There is currently one Push Collector Erlang process per machine.
Stage 2 — the Pusher: The Pusher is a consumer that demands push requests from the Push Collector and pushes the requests to Firebase. It only demands 100 requests at a time to ensure it does not go over Firebase’s pending request limit. There are many Pusher Erlang processes per machine.

Back-pressure and load-shedding with GenStage

GenStage has two key features that aid us during bursts: back-pressure and load-shedding.

Back-pressure

In the Pusher, we use GenStage’s demand functionality to ask the Push Collector for the maximum number of requests the Pusher can handle. This ensures an upper bound on the number of push requests the Pusher has pending. When Firebase acknowledges a request, the Pusher demands more from the Push Collector.

The Pusher knows the exact amount the Firebase XMPP connection can handle and never demands too much. The Push Collector never sends a request to a Pusher unless the Pusher asks for one.
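
The bookkeeping behind this can be sketched as a small pure-Elixir model. PendingWindow is a hypothetical module, not part of our code or of GenStage. It assumes the Pusher asks for a full window of 100 on subscribe and for exactly one more request per acknowledgement, so the outstanding demand plus the pending requests always add up to 100.

```elixir
defmodule PendingWindow do
  # Hypothetical model of the Pusher's demand bookkeeping.
  # 100 is the per-connection pending limit described above.
  @max_pending 100

  defstruct pending: 0, asked: 0

  def new, do: %__MODULE__{}

  # On subscribe: ask the producer for a full window up front.
  def subscribe(%__MODULE__{} = window), do: %{window | asked: @max_pending}

  # The producer delivers `count` events. It may never deliver more
  # than was asked for, which is what bounds the pending requests.
  def deliver(%__MODULE__{asked: asked, pending: pending} = window, count)
      when count <= asked do
    %{window | asked: asked - count, pending: pending + count}
  end

  # One request was acknowledged: free a slot and ask for one more.
  def ack(%__MODULE__{asked: asked, pending: pending} = window)
      when pending > 0 do
    %{window | asked: asked + 1, pending: pending - 1}
  end
end

window =
  PendingWindow.new()
  |> PendingWindow.subscribe()
  |> PendingWindow.deliver(100)
  |> PendingWindow.ack()
  |> PendingWindow.ack()

IO.inspect({window.pending, window.asked})
# → {98, 2}
```

Because deliver/2 only matches when count is within the outstanding demand, the model cannot represent a state with more than 100 pending requests.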

Load-shedding

Since the Pushers put back-pressure on the Push Collector, we now have a potential bottleneck at the Push Collector. Super-duper huge bursts might overload the Push Collector.

GenStage has another built-in feature to handle this: buffered events.

In the Push Collector, we specify how many push requests to buffer. Normally the buffer is empty, but about once a month in catastrophic situations it comes in handy.

If there are way too many messages moving through the system and the buffer fills up, then the Push Collector will shed incoming push requests. This comes for free from GenStage by simply specifying the buffer_size option in the init function of the Push Collector.
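
As a rough illustration of what shedding means, here is a plain-Elixir bounded buffer. BoundedBuffer is a hypothetical module written for this example and is not how GenStage implements its buffer. It drops new arrivals once the buffer is full, matching the description above. GenStage also has a buffer_keep option that controls which end of a full buffer is kept.

```elixir
defmodule BoundedBuffer do
  # Hypothetical bounded FIFO buffer that sheds load when full.
  defstruct [:queue, :size, :max, :dropped]

  def new(max) when is_integer(max) and max >= 0 do
    %__MODULE__{queue: :queue.new(), size: 0, max: max, dropped: 0}
  end

  # Buffer is full: shed the incoming event and count the drop.
  def put(%__MODULE__{size: size, max: max} = buf, _event) when size >= max do
    %{buf | dropped: buf.dropped + 1}
  end

  # Buffer has room: append the event at the back of the queue.
  def put(%__MODULE__{queue: queue, size: size} = buf, event) do
    %{buf | queue: :queue.in(event, queue), size: size + 1}
  end

  # Take up to `demand` events in arrival order.
  def take(%__MODULE__{} = buf, demand) when demand >= 0 do
    take(buf, demand, [])
  end

  defp take(buf, 0, acc), do: {Enum.reverse(acc), buf}

  defp take(%__MODULE__{queue: queue, size: size} = buf, demand, acc) do
    case :queue.out(queue) do
      {{:value, event}, rest} ->
        take(%{buf | queue: rest, size: size - 1}, demand - 1, [event | acc])

      {:empty, _queue} ->
        {Enum.reverse(acc), buf}
    end
  end
end

# Five events arrive while no consumer is asking; only three fit.
buf = Enum.reduce(1..5, BoundedBuffer.new(3), &BoundedBuffer.put(&2, &1))
{events, buf} = BoundedBuffer.take(buf, 2)
IO.inspect({events, buf.size, buf.dropped})
# → {[1, 2], 1, 2}
```

The dropped counter is there only to make the shedding visible in the example.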

With these two features we are able to handle burst notifications.

The code (the important parts, at least)

Below is example code of how we set up our stages. For simplicity, we removed a lot of failure handling for when connections go down, Firebase returns errors, etc.

You can skip the code if you just want to view the results of the system.

Push Collector (the producer)

	defmodule GCM.PushCollector do
	  use GenStage

	  # Maximum number of push requests to buffer before shedding.
	  # The real value is not given here; 10_000 is a placeholder.
	  @max_buffer_size 10_000

	  # Client

	  def push(pid, push_requests) do
	    GenServer.cast(pid, {:push, push_requests})
	  end

	  # Server

	  def init(_args) do
	    # Run as producer and specify the max amount
	    # of push requests to buffer.
	    {:producer, :ok, buffer_size: @max_buffer_size}
	  end

	  def handle_cast({:push, push_requests}, state) do
	    # Dispatch the push_requests as events.
	    # These will be buffered if there are no consumers ready.
	    {:noreply, push_requests, state}
	  end

	  def handle_demand(_demand, state) do
	    # Do nothing. Events will be dispatched as-is.
	    {:noreply, [], state}
	  end
	end

Pusher (the consumer)

	defmodule GCM.Pusher do
	  use GenStage

	  # The maximum number of requests Firebase allows at once per XMPP connection
	  @max_demand 100

	  defstruct [
	    :producer,
	    :producer_from,
	    :fcm_conn_pid,
	    :pending_requests,
	    :next_id
	  ]

	  def start_link(producer, fcm_conn_pid, opts \\ []) do
	    GenStage.start_link(__MODULE__, {producer, fcm_conn_pid}, opts)
	  end

	  def init({producer, fcm_conn_pid}) do
	    state = %__MODULE__{
	      next_id: 1,
	      pending_requests: %{},
	      producer: producer,
	      fcm_conn_pid: fcm_conn_pid
	    }

	    send(self(), :init)
	    # Run as consumer
	    {:consumer, state}
	  end

	  def handle_info(:init, %{producer: producer} = state) do
	    # Subscribe to the Push Collector
	    GenStage.async_subscribe(self(), to: producer, cancel: :temporary)
	    {:noreply, [], state}
	  end

	  def handle_subscribe(:producer, _opts, from, state) do
	    # Start demanding requests now that we are subscribed
	    GenStage.ask(from, @max_demand)
	    # :manual means demand is only sent when we call GenStage.ask
	    {:manual, %{state | producer_from: from}}
	  end

	  def handle_events(push_requests, _from, state) do
	    # We got some push requests from the Push Collector.
	    # Let’s send them.
	    state = Enum.reduce(push_requests, state, &do_send/2)
	    {:noreply, [], state}
	  end

	  # Send the message to FCM, track as a pending request
	  defp do_send(push_request, %{fcm_conn_pid: fcm_conn_pid, pending_requests: pending_requests} = state) do
	    {message_id, state} = generate_id(state)
	    xml = PushRequest.to_xml(push_request, message_id)
	    :ok = FCM.Connection.send(fcm_conn_pid, xml)
	    pending_requests = Map.put(pending_requests, message_id, push_request)
	    %{state | pending_requests: pending_requests}
	  end

	  # FCM response handling.
	  # The code that receives FCM responses and calls this is omitted.
	  defp handle_response(%{message_id: message_id}, %{pending_requests: pending_requests, producer_from: producer_from} = state) do
	    {_push_request, pending_requests} = Map.pop(pending_requests, message_id)

	    # Since we finished a request, ask the Push Collector for more.
	    GenStage.ask(producer_from, 1)

	    %{state | pending_requests: pending_requests}
	  end

	  defp generate_id(%{next_id: next_id} = state) do
	    {to_string(next_id), %{state | next_id: next_id + 1}}
	  end
	end

An example incident

Below is a real incident the new system handled. The top graph is the number of push requests per second flowing through the system. The bottom graph is the number of push requests buffered by the Push Collector.

Note: These graphs are taken from the first incident after we deployed the new system. Our pushes per second have more than doubled since then. Also, we only send pushes when users are not active on the application.

Order of events:

~17:47:00 — The system is nominal.
~17:47:30 — We start receiving a burst of messages. The Push Collector has a small blip in its buffer count as the Pushers react. Shortly after, the buffer goes down for a little bit.
~17:48:50 — The Pushers cannot send messages to Firebase faster than they are coming in, so the buffer in the Push Collector starts filling up.
~17:50:00 — The Push Collector buffer starts peaking and sheds some requests.
~17:50:50 — The Push Collector buffer stops peaking.
~17:51:30 — The requests peak then slow down.
~17:52:30 — The system is completely back to normal.

During this entire incident there was no noticeable impact to the system or users. A few notifications were dropped, of course. Had they not been dropped, the system might never have recovered, or the Push Collector might have fallen over. We find this to be an acceptable compromise for something like notifications.

Elixir’s Success

At Discord we have been very happy using Elixir/Erlang as a core technology of our backend services. We are pleased to see additions such as GenStage that build on the rock solid technologies of Erlang/OTP.

We are looking for brave souls to help solve problems like these as Discord continues to grow. If you love games and these types of problems make you super excited, we’re hiring! Check out our available positions here.
