December 15, 2023

How We Leverage Machine Learning to Fight Child Sexual Abuse Material

In November, Discord and other leading tech companies joined together to launch Lantern, a pioneering initiative to thwart child predators who try to elude detection on our platforms. Our goal is simple: We are working to protect children on Discord and the internet more broadly.

Discord has a zero-tolerance policy for child sexual abuse material (CSAM). This work is a priority for us. We’ve invested in hiring and partnering with industry-leading child safety experts and in developing innovative tools that make it harder to distribute CSAM. We’ve recently developed a new model for detecting novel forms of CSAM. We use this model on Discord as another layer of detection so we can stop bad actors. We have made this model open source, and we look forward to further developing it and sharing it with other companies, including those in the Tech Coalition. Bad actors aren’t beholden to one platform: when evading detection on one platform becomes too difficult, they can move on to another and try there. We want to help put an end to that.

Today, we’re at a significant moment where advancements in machine learning, paired with industry collaboration, are strengthening our collective efforts to combat this threat.

We’re bridging a key gap in CSAM detection

To understand how the tech industry is working to solve these challenges, it’s helpful to understand the technology itself.

The industry standard for CSAM detection is a tool called PhotoDNA, which was developed by Microsoft and donated to the National Center for Missing & Exploited Children (NCMEC)—the nationally recognized hub for reporting suspected online child sexual exploitation in the US.

When imagery depicting child sexual abuse is reported to NCMEC, PhotoDNA is used to create a unique digital signature (or “hash”) of that image. That hash—and not the illegal imagery itself—is stored in a database that serves as a reference point for online detection efforts.

At Discord, we proactively scan images using hashing algorithms (including PhotoDNA) to detect known CSAM. For example, if someone uploads an image that matches against known CSAM that has been previously reported, we automatically detect it, remove it, and flag it for internal review. When we confirm that someone attempted to share illegal imagery, we remove the content, permanently ban them from our platform, and report them to NCMEC, who subsequently works with law enforcement.
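To make the idea of matching against known hashes concrete, here is a minimal sketch. It is not PhotoDNA, which is proprietary and not reproduced here. It assumes each image has already been reduced to a 64-bit perceptual hash, and it treats two hashes as a match when only a few bits differ. The function names, the toy hash values, and the `max_distance` threshold are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def find_known_match(
    upload_hash: int,
    known_hashes: Iterable[int],
    max_distance: int = 4,  # hypothetical tolerance, not a real PhotoDNA setting
) -> Optional[int]:
    """Return the closest known hash within max_distance bits, or None."""
    best = None
    best_distance = max_distance + 1
    for known in known_hashes:
        distance = hamming_distance(upload_hash, known)
        if distance < best_distance:
            best, best_distance = known, distance
    return best


# Toy values standing in for a database of previously reported hashes.
KNOWN_HASHES = {0xF0F0F0F0F0F0F0F0, 0x123456789ABCDEF0}

exact = find_known_match(0xF0F0F0F0F0F0F0F0, KNOWN_HASHES)  # identical hash
near = find_known_match(0xF0F0F0F0F0F0F0F1, KNOWN_HASHES)   # one bit differs
miss = find_known_match(0x0000000000000000, KNOWN_HASHES)   # unrelated hash
```

In this sketch, `exact` and `near` both resolve to the first known hash, while `miss` is `None`. A production system would use an indexed lookup rather than a linear scan, but the principle is the same: only material that resembles something already in the database can be found this way.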

There are more than 6.3 million hashes of CSAM in NCMEC’s hash database and more than 29 million hashes of CSAM contained in databases from around the world. That gives the tech industry a strong foundation for stopping the spread of this harmful material on our platforms. But there’s one key limitation: hash matching only detects exact or similar copies of images that are already logged in a database. Predatory actors will try to exploit this weakness with unknown CSAM that has yet to be seen or logged. The presence of unknown material indicates either a likelihood of active abuse or that the images were synthetically generated using AI.

Machine learning helps us stop CSAM at scale

Unknown CSAM has continually plagued our industry, and our team felt it was important to advance our own detection capabilities. We started with an open-source model that can analyze images to understand what they are depicting. Our engineers experimented and found that this model could not only determine how similar one image is to another but also, thanks to its semantic understanding of images, show promising results for novel CSAM detection.
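One common way to measure how similar two images are with a model like this is to compare the numeric vectors (embeddings) the model produces for them. The sketch below illustrates that general technique with cosine similarity. It is an assumption for illustration, not a description of the model we use. The toy three-dimensional vectors, the function names, and the `threshold` value are all hypothetical.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def needs_human_review(
    embedding: Sequence[float],
    reference_embeddings: Iterable[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, chosen for illustration
) -> bool:
    """Flag an image for review if it is close to any reference embedding."""
    return any(
        cosine_similarity(embedding, reference) >= threshold
        for reference in reference_embeddings
    )


# Toy vectors standing in for real image embeddings.
references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flagged = needs_human_review([0.99, 0.05, 0.0], references)
not_flagged = needs_human_review([0.0, 0.0, 1.0], references)
```

Unlike a hash lookup, this kind of comparison can surface an image that has never been seen before, as long as the model places it near related material. A score above the threshold is only a signal for review, not a verdict.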

When we find a match, we follow the same process that we use for PhotoDNA matches: A human reviews it to verify whether it is an instance of CSAM. If it is, it’s reported to NCMEC and added to the PhotoDNA database. From that point, the newly verified image can be detected automatically across platforms. This speeds up the review process, expands the database of known CSAM, and makes everyone better and faster at blocking CSAM over time. This is a major advancement in the effort to detect, report, and stop the spread of CSAM.
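The feedback loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our production workflow. The `ReviewPipeline` class, its method names, and the in-memory `reports` list (a stand-in for filing a report) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReviewPipeline:
    """Toy model of the loop: review, report, add to the known set."""

    known_hashes: Set[int] = field(default_factory=set)
    reports: List[int] = field(default_factory=list)  # stand-in for real reports

    def is_known(self, image_hash: int) -> bool:
        """True if this hash can already be detected automatically."""
        return image_hash in self.known_hashes

    def record_review(self, image_hash: int, confirmed: bool) -> None:
        """Apply a human reviewer's decision to a flagged image."""
        if not confirmed:
            return  # reviewer rejected the flag, so nothing is reported
        self.reports.append(image_hash)    # report the confirmed item
        self.known_hashes.add(image_hash)  # future uploads now match directly


pipeline = ReviewPipeline()
candidate = 0xABCDEF  # toy hash of an image flagged by the model

before = pipeline.is_known(candidate)
pipeline.record_review(candidate, confirmed=True)
after = pipeline.is_known(candidate)
```

The point of the sketch is the last step: once a reviewer confirms an item, its hash joins the known set, so the same image no longer depends on the model to be caught.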

At a recent Tech Coalition hackathon, engineers from Discord and other tech companies collaborated on ways we can work together to identify new forms of CSAM that evade existing detection methods. During this collaboration, we worked to make the AI-powered detection mechanism we built open source. At Discord, we believe safety shouldn’t be proprietary or a competitive advantage. Our hope is that by making this technology open source, we can bring new capabilities and protections to all tech platforms, especially newer companies that don’t yet have established trust and safety teams.

This is part of our work alongside the Lantern program, which is committed to stopping people from evading CSAM detection across platforms. Together, we’re working to increase prevention and detection methods, accelerate identification processes, share information on predatory tactics, and strengthen our collective reporting to authorities.

Ultimately, this is an ideal use of machine learning technology. It’s important to note, though, that human expertise and judgment are still involved. Our expert staff continues to review anything that’s not an exact match with PhotoDNA, because the stakes are high and we need to get it right. What matters most is that the combined efforts of machine learning and human expertise are making it harder than ever for predatory actors to evade detection.

We’re working to build a safer internet. It’s in our DNA.

Safety is a top priority at Discord. It’s reflected in our personnel: roughly 15% of our staff is dedicated to safety. These teams work to ensure our platform, features, and policies result in a positive and safe place for our users to hang out and build community.

We’re stronger together

The safety landscape is continually shifting. There are always new challenges emerging, and no single entity can tackle them alone.

Our industry can lead by sharing tools, intelligence, and best practices. We also need to work collaboratively with policymakers, advocacy groups, and academics to fully understand emerging threats and respond to them. Initiatives like Lantern and organizations like the Tech Coalition represent the kind of collaborative efforts that make it possible for us all to stay ahead of bad actors and work together toward a safe online experience for everyone.