Meet DAVE: Discord’s New End-to-End Encryption for Audio & Video

Last year, we announced that we were experimenting with new encryption protocols and technologies for audio and video calls on Discord. After extensive experimenting, designing, developing, and auditing, we’re excited to announce Discord’s audio and video end-to-end encryption (“E2EE A/V” or “E2EE” for short), which we like to refer to as our DAVE protocol.

Discord is committed to protecting the privacy and data of the roughly 200 million people who use our platform every month. As we continue to be a place that helps our users deepen friendships around games and shared interests, we are thrilled to be launching more secure and private voice and video calls.

Today, we’ll start migrating voice and video in DMs, Group DMs, voice channels, and Go Live streams to use E2EE. You will be able to confirm when calls are end-to-end encrypted and perform verification of other members in those calls.

We’d like to explain why we’re bringing E2EE A/V to Discord, share our design and implementation goals, and provide a high-level technical overview of how the new protocol works.

What We’ve Been Up To

When it comes to building a secure and trusted E2EE A/V protocol, transparency is key. To support this, we’re releasing the DAVE protocol whitepaper (discord/dave-protocol) and the libraries our clients use to implement it (discord/libdave). Moving forward, any changes to either the protocol or our code will be reflected in those repositories.

In the past few months, we collaborated closely with Trail of Bits, a renowned independent cybersecurity firm, to conduct a thorough review of both the design and implementation of DAVE in our code base. With DAVE’s launch, Trail of Bits is publishing their findings from both the design review and implementation review.

Safety is intertwined with our product and policies. While audio and video will be end-to-end encrypted, messages on Discord will continue to follow our content moderation approach and are not end-to-end encrypted. The E2EE A/V protocol was designed from the outset to be compatible with additional safety features that support the E2EE experience.

Our Goals

The design and implementation of the DAVE protocol were informed by five key goals.

Truly private conversations

During E2EE A/V calls, no one but the participants can access the contents of ongoing audio and video conversations. Outsiders, including Discord itself, never know the media encryption keys.

E2EE media encryption keys are different for each call, and for each specific group within the call at a point in time. When participants join or leave a call, the keys change: new members cannot decrypt media that was sent before they joined, and departed members cannot decrypt media sent after they left.

An open and effective protocol

We want an E2EE A/V protocol that is publicly auditable and achieves the goals we set out for it. To that end, the protocol is detailed in our whitepaper and open-source library, uses industry-standard sub-protocols and cryptographic algorithms, and had its design and implementation externally audited by Trail of Bits.

Our first-party clients and the open-source library support out-of-band verifications of individual call participants and of the E2EE A/V call state as a whole.

Broad platform support

DAVE is compatible with all of our supported clients and nearly all of our voice and video spaces. Our latest desktop and mobile clients already support this upgrade, and we plan to extend support to the rest of our clients next year.

To transmit real-time audio and video, Discord uses WebRTC. When it comes to web clients, we are limited by the WebRTC API availability in browsers, which poses a unique challenge to supporting E2EE A/V. This is why DAVE leverages the WebRTC encoded transform API with a codec-aware send-side transform, which keeps encrypted frames compatible with WebRTC’s handling of Discord’s supported codecs.

Transparent to our users

Discord’s high-quality, robust, low-latency voice and video is not compromised by the introduction of E2EE A/V. Everyone should continue to experience Discord calls as they always have: chatting with friends without needing to think about the underlying technology and protocols.

As we begin rolling out DAVE, or when we make protocol updates in the future, we will automatically shift users to our new protocol versions. As people hop in and out of calls, the underlying protocol version can change, but members of the call will not notice a disruption in what they see or hear.

E2EE A/V will eventually become the default for voice and video in DMs, Group DMs, voice channels, and Go Live streams on Discord. We want to seamlessly enable E2EE A/V for all of Discord’s users and their many devices, without requiring them to manage identity keys or select a primary device.

Scalable and performant

We want all calls on Discord, no matter the number of participants, to be eligible for E2EE. Audio and video conversations shouldn’t be forced to “downgrade” to transport-only encryption because of their scale.

We want to deliver decoded media to call participants as quickly as possible. However, negotiating a shared key through a multi-party key exchange takes time. We aim for the “initial time-to-media” from E2EE to be as small as possible, in the ballpark of a few hundred milliseconds for a reasonably sized Discord call.

We evaluated multiple key exchange protocols before selecting Messaging Layer Security. We strongly believe that its group-based approach is a better fit for Discord’s scalability and performance requirements than pairwise alternatives.

How It Works

At a high level, below are the four main components that come together to form the DAVE protocol.

WebRTC Encoded Transforms

DAVE uses the WebRTC encoded transform API. This allows us to insert a frame transformation function on both the sending and receiving side, encrypting after encode on the send side and decrypting before decode on the receive side.

In this frame transformation, each frame is encrypted or decrypted with a per-sender symmetric key. This key is known to all participants of the audio and video session but crucially is unknown to any outsider who is not a member of the call, including Discord.

[Figure: the media frame pipeline from encode through decode, including the DAVE frame transformers.]

WebRTC requires that the transformed frame go through the WebRTC codec-specific packetizer after our encrypting frame transformer and then through the WebRTC codec-specific depacketizer before our decrypting frame transformer. This means any data that the packetizer or depacketizer expects to read must not be encrypted, and that special sequences reserved by codecs cannot appear unexpectedly in the transformed frame. As the WebRTC API is being updated to be more compatible with E2EE, we expect this requirement to eventually be removed. We look forward to the corresponding future DAVE protocol update, which will greatly simplify our frame transformers.

To address these current challenges appropriately for each supported codec, our send-side encrypting frame transformer is codec-aware. It is responsible for identifying the ranges of codec metadata that must remain unencrypted and for validating that the produced ciphertext does not contain any reserved sequences of bytes which would be problematic for the given codec.

This results in a transformed frame that can be correctly packetized and then depacketized by WebRTC and that arrives at the receiver’s decrypting frame transformer without modification.
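The codec-aware behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the libdave implementation: the HMAC-based keystream is a toy stand-in for a real authenticated cipher, the single reserved sequence (the H.264 Annex B start code prefix `00 00 01`) is just one example of a codec-reserved pattern, and the function names, the `clear_ranges` format, and the retry-with-next-nonce strategy are all assumptions made for the example.

```python
import hashlib
import hmac

# Assumption: one reserved sequence, the H.264 Annex B start code prefix.
RESERVED_SEQUENCES = (b"\x00\x00\x01",)


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256; a stand-in for a real AEAD cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def clear_mask(frame_len: int, clear_ranges) -> list:
    """True at every byte offset that must stay unencrypted."""
    mask = [False] * frame_len
    for offset, length in clear_ranges:
        for i in range(offset, min(offset + length, frame_len)):
            mask[i] = True
    return mask


def xor_encrypted_bytes(key: bytes, nonce: int, data: bytes, mask) -> bytes:
    """XOR the keystream over the non-clear bytes (encrypts and decrypts)."""
    positions = [i for i, is_clear in enumerate(mask) if not is_clear]
    out = bytearray(data)
    for i, k in zip(positions, keystream(key, nonce, len(positions))):
        out[i] ^= k
    return bytes(out)


def introduces_reserved(data: bytes, mask) -> bool:
    """True if a reserved sequence overlaps at least one encrypted byte."""
    for seq in RESERVED_SEQUENCES:
        start = data.find(seq)
        while start != -1:
            if not all(mask[start:start + len(seq)]):
                return True
            start = data.find(seq, start + 1)
    return False


def encrypt_frame(key, frame, clear_ranges, first_nonce=0, max_attempts=16):
    """Encrypt outside clear_ranges; retry with the next nonce if unsafe."""
    mask = clear_mask(len(frame), clear_ranges)
    for nonce in range(first_nonce, first_nonce + max_attempts):
        candidate = xor_encrypted_bytes(key, nonce, frame, mask)
        if not introduces_reserved(candidate, mask):
            return candidate, nonce
    raise RuntimeError("no nonce produced a packetizer-safe frame")


def decrypt_frame(key, nonce, data, clear_ranges):
    """Inverse of encrypt_frame for a known nonce and the same clear ranges."""
    mask = clear_mask(len(data), clear_ranges)
    return xor_encrypted_bytes(key, nonce, data, mask)
```

In this sketch the receiver is handed the nonce and the clear ranges directly; a real protocol would have to carry that information alongside the frame, and would authenticate the ciphertext as well.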

Key Exchange: Messaging Layer Security

The protocol uses Messaging Layer Security (MLS) for group key exchange. We selected MLS because it provides a scalable mechanism for groups to update shared keys.

With DAVE, each client is a member of an underlying MLS group from which it can extract a per-sender media encryption key known to all of the members of the group. Our existing voice gateway now additionally serves the role of MLS delivery service and external sender, routing messages amongst group members and proposing when group members should be added or removed.

When participants join or leave a voice or video session on Discord, the group moves to a new “epoch,” and all of the per-sender keys change. A new member of the group cannot decrypt any media sent in the previous epochs, and a leaving member of the group cannot decrypt any media sent in future epochs.
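To make the epoch behaviour concrete, here is a small sketch of per-sender keys that are all replaced whenever membership changes. It is not MLS: the fresh random secret per epoch stands in for the secret an MLS commit would produce, and the class, the derivation label, and the 16-byte key length are hypothetical choices for the example.

```python
import hashlib
import hmac
import secrets


def derive_sender_key(epoch_secret: bytes, sender_id: int) -> bytes:
    """Derive one sender's media key from the epoch secret (hypothetical label)."""
    info = b"hypothetical-sender-key" + sender_id.to_bytes(8, "little")
    return hmac.new(epoch_secret, info, hashlib.sha256).digest()[:16]


class GroupEpoch:
    """Tracks members and a per-epoch secret; a stand-in for an MLS group."""

    def __init__(self, members):
        self.epoch = 0
        self.members = set(members)
        self._secret = secrets.token_bytes(32)

    def change_membership(self, add=(), remove=()):
        """Any join or leave starts a new epoch with a fresh secret."""
        self.members |= set(add)
        self.members -= set(remove)
        self.epoch += 1
        # Assumption: a random secret replaces the MLS key schedule here.
        self._secret = secrets.token_bytes(32)

    def sender_keys(self):
        """Per-sender keys that every current member can compute."""
        return {m: derive_sender_key(self._secret, m) for m in self.members}
```

Because the new epoch secret is unrelated to the old one in this sketch, keys from one epoch say nothing about keys in another, which mirrors the property described above.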

Discord’s existing transport encryption for audio and video between the client and our selective forwarding unit (SFU) is retained, ensuring only audio and video from authenticated call participants is forwarded. While the SFU still processes all packets for the call, audio or video data inside each packet is end-to-end encrypted and undecryptable by the SFU.

Participants of the MLS group for a given epoch can compare an exported secret called the “epoch authenticator.” Discord clients display the call’s epoch authenticator as a string of numbers referred to as the Voice Privacy Code. Each Go Live stream associated with the call displays its epoch authenticator as a Stream Privacy Code.

The epoch authenticator is different for each epoch and changes whenever participants join or leave calls. By comparing these codes out-of-band, participants of the MLS group can verify that they all have the same MLS group state and that no one is being impersonated.
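As an illustration of how an exported secret could be turned into a code that people can read aloud and compare, consider the sketch below. The grouping (six groups of five digits, each taken from five bytes of the secret) is an assumption for the example and may not match the format Discord clients actually display; the whitepaper is the authority on that.

```python
def privacy_code(authenticator: bytes, groups: int = 6, digits: int = 5) -> str:
    """Render a secret as space-separated digit groups (assumed format)."""
    chunk = 5  # bytes consumed per group; an arbitrary choice for this sketch
    if len(authenticator) < groups * chunk:
        raise ValueError("authenticator is too short for the requested code")
    parts = []
    for g in range(groups):
        value = int.from_bytes(authenticator[g * chunk:(g + 1) * chunk], "big")
        # Reduce to a fixed number of decimal digits, keeping leading zeros.
        parts.append(str(value % 10 ** digits).zfill(digits))
    return " ".join(parts)
```

Two participants whose clients hold the same epoch authenticator would compute the same string, so a mismatch when reading the codes to each other signals that their group states differ.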

Identity Key Pairs and User Verification

We selected an MLS ciphersuite with Elliptic Curve Digital Signature Algorithm (ECDSA) signature keys, for compatibility with WebCrypto and to enable a future improvement to our persistent key storage: non-extractable keys provided by Trusted Platform Modules.

Each call participant generates an ECDSA P-256 identity key pair and shares the public key with other call members before joining the underlying MLS group. Each device generates its own key pair, and there is no synchronization of private keys between a user’s devices.

During a call, each pair of users can perform an out-of-band comparison of their Verification Code to ensure that the other participant is the person they expect and not an impersonating attacker.
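A pairwise code of this kind needs to come out the same on both devices regardless of who computes it. The sketch below shows one way to get that property by sorting the two fingerprints before hashing. The hash construction, the digit format, and the use of raw bytes as stand-ins for ECDSA public keys are assumptions for illustration, not the derivation DAVE specifies.

```python
import hashlib


def fingerprint(user_id: int, public_key: bytes) -> bytes:
    """Bind a user ID to an identity public key (hypothetical construction)."""
    return hashlib.sha256(user_id.to_bytes(8, "big") + public_key).digest()


def verification_code(user_a: int, key_a: bytes, user_b: int, key_b: bytes) -> str:
    """Order-independent code for a pair of (user, identity key) bindings."""
    # Sorting makes the result identical whichever side runs the computation.
    first, second = sorted([fingerprint(user_a, key_a), fingerprint(user_b, key_b)])
    digest = hashlib.sha256(first + second).digest()
    groups = [int.from_bytes(digest[i:i + 5], "big") % 100000 for i in range(0, 30, 5)]
    return " ".join(f"{g:05d}" for g in groups)
```

If an attacker substituted their own identity key for one participant, the two sides would hash different inputs and the codes read out over another channel would not match.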

By default, identity key pairs are ephemeral and re-generated for each call. This means that the pairwise Verification Code changes for a pair of users across different calls or when somebody re-joins the same call (e.g., leaving a voice channel and rejoining a few minutes later).

Each user can choose to use a persistent identity key pair for each device they use to communicate on Discord, meaning that they will always present the same identity public key across all E2EE A/V calls with that device. This allows others to store a persistent verification for them and see them as verified across multiple E2EE A/V calls.

While the persistent identity key pair provides for a better verification experience, it necessarily shows other participants that you’re using the same device across multiple E2EE A/V calls. By making this opt-in we believe we struck the right balance: implementing a reasonable default privacy approach for Discord users while still offering the kind of E2EE verification experience that security-conscious users might expect.

Protocol Version and Group Transitions

For a call to use E2EE, every member of the call must support the E2EE protocol. During the rollout phase, a single non-supporting member being present forces the call to transport-only encryption. The call will automatically “upgrade” to E2EE if that member disconnects. We’ve built new user experience flows to show when calls are end-to-end encrypted and when they are not.
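The rollout rule above amounts to taking the lowest capability present in the call. A minimal sketch, assuming each member reports the highest protocol version it supports, that `0` means transport-only encryption, and that supporting a version implies supporting all earlier ones (none of which is stated in this post):

```python
from typing import Mapping

TRANSPORT_ONLY = 0  # assumed sentinel for "no E2EE support"


def call_protocol_version(supported: Mapping[str, int]) -> int:
    """Highest version every member supports; 0 means transport-only."""
    if not supported:
        return TRANSPORT_ONLY
    return min(supported.values())


def is_end_to_end_encrypted(supported: Mapping[str, int]) -> bool:
    """A call is E2EE only when every member supports some E2EE version."""
    return call_protocol_version(supported) > TRANSPORT_ONLY
```

With this rule, removing the one member that reports `0` from the mapping is enough for the computed version to rise again, which corresponds to the automatic “upgrade” described above.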

The voice gateway negotiates protocol and MLS epoch transitions between the members of a given call, to ensure an uninterrupted audio and video experience. We worked to make these transitions feel completely seamless: whether it’s a call transitioning in and out of E2EE, changing E2EE protocol versions, or adding and removing participants.

When a protocol or member change is required, the voice gateway announces this to all participants and coordinates any MLS group initialization, cleanup, or change. Clients report to the voice gateway once they are ready to complete a given transition, and the voice gateway announces that the transition can be executed once all members are ready.
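The announce, ready, execute handshake can be modelled as a small piece of bookkeeping on the gateway side. The class and method names below are hypothetical and do not correspond to actual voice gateway opcodes; the sketch only captures the rule that execution is announced once, after every member has reported ready.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    """Gateway-side state for one announced transition (illustrative only)."""
    transition_id: int
    members: frozenset
    ready: set = field(default_factory=set)
    executed: bool = False

    def mark_ready(self, member: str) -> bool:
        """Record a ready report; True exactly when execution should be announced."""
        if member not in self.members:
            raise KeyError(f"{member} is not part of transition {self.transition_id}")
        self.ready.add(member)
        if not self.executed and self.ready == self.members:
            self.executed = True
            return True
        return False
```

A real gateway would also need timeouts and handling for members who disconnect mid-transition, which this sketch leaves out.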

When the transition is executed, call members start sending media for the new group’s protocol context. During these time-bounded transition phases, call members can temporarily process audio and video for either the previous or the current group’s protocol context. This ensures that the stream of audio and video received does not have any interruptions while the protocol context of the group changes underneath.
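On the receiving side, tolerating both contexts for a bounded time can be pictured as keeping the previous key around until a deadline. In the sketch below the retention window, the idea that each frame can be matched to an epoch number, and all names are assumptions for illustration; the clock is passed in explicitly so the behaviour is easy to follow.

```python
class ReceiverContexts:
    """Holds the current key plus the previous one for a limited time."""

    def __init__(self, retention_seconds: float = 10.0):
        self.retention = retention_seconds  # assumed window length
        self.current = None   # (epoch, key)
        self.previous = None  # (epoch, key, expires_at)

    def execute_transition(self, epoch: int, key: bytes, now: float) -> None:
        """Switch to the new context, retaining the old one until it expires."""
        if self.current is not None:
            old_epoch, old_key = self.current
            self.previous = (old_epoch, old_key, now + self.retention)
        self.current = (epoch, key)

    def key_for(self, epoch: int, now: float):
        """Key to use for a frame from the given epoch, or None if unusable."""
        if self.current is not None and self.current[0] == epoch:
            return self.current[1]
        if (self.previous is not None and self.previous[0] == epoch
                and now < self.previous[2]):
            return self.previous[1]
        return None
```

Frames that were already in flight when the transition executed can still be decrypted with the retained key, while anything arriving for the old context after the window closes is rejected.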

For a much more technical and detailed breakdown of the protocol, you might enjoy diving into the protocol whitepaper.

Creating a Turnkey User Experience

On the surface, voice and video on Discord remain the same great experience that millions of concurrent users rely on every day. To support the implementation of E2EE A/V, we’re rolling out user interface changes to view when voice and video calls are end-to-end encrypted and to help call members use Verification Codes to perform out-of-band verifications of members in E2EE calls. Check out our new help center article to learn more.

[Figure: the Discord call details view with a new Privacy tab indicating that the call is end-to-end encrypted.]

Opening the details view for your E2EE audio or video call shows a new Privacy tab. This tab contains a Voice Privacy Code, which displays an exported secret from the underlying MLS group: the “epoch authenticator” we mentioned earlier. It will change as users join and leave the call, and it can be compared out-of-band to ensure that nobody in the call is being impersonated.

[Figure: the Discord view of a user’s Verification Codes for a call.]

You can also view a pairwise Verification Code for each of the other users in your audio and video call, and can undergo an out-of-band verification process to confirm that the other user is who you expect. The successful completion of this process will locally store the public identity key for the verified user. Whether the stored verification of another device persists across calls depends on whether the identity key presented by that device is “persistent.”

What Happens Next?

We know that this is a significant change for our external developer community, and from the start, we’ve prioritized making this transition as easy as possible. For more information, the protocol whitepaper, open-source library, and our updated voice websocket documentation are all now available.

If you’d like to review the protocol and provide your feedback, we recommend diving into the protocol whitepaper and the Trail of Bits design review and implementation review. In addition, our HackerOne program now also includes monetary rewards for successful vulnerability reports related to the DAVE protocol.

We understand that not everyone uses Discord in the same way, and people have different expectations of privacy in each space they’re in. As we continue to work to protect the privacy of our users, we’ll also keep investing heavily in safety features, technologies, and systems that put users in control of their experience.

Finally, while we’ve spent the last year testing and refining the DAVE protocol, we’ll continue to update our documentation as we expand support to other surfaces and identify opportunities for improvement.