React is a programming model that makes it easy to build scalable and performant apps. We use React at Discord.
We love React.
However, back in November 2018, on a what we’ll say was a rainy day for the sake of setting a mood, we employed a seemingly routine React upgrade.
This upgrade led us on an unexpected journey through our codebase, the Chrome memory inspector, and the React source code.
So what happened?
We tried to upgrade React from 16.5 to 16.7. After we shipped the upgrade, we started getting reports of users experiencing Discord using a lot of memory — something somewhere was leaking.
At that time, we briefly investigated and identified an issue with our React upgrade, which we quickly reverted.
Upgrading to React 16.7 wasn’t our priority at the time; it wasn’t a huge deal for us to revert. The memory leak was no longer and Discord carried on.
When React 16.8 was released with hooks, Discord engineers wanted to use those hooks. We love hooks. We really wanted this upgrade.
However, we never figured out what went wrong with the failed 16.7 upgrade attempt. If the problem returned, we would start leaking again. For us to upgrade to 16.8 in good conscience, we needed to solve the memory leak.
Here is where we get into the nitty gritty.
Tracking down the memory leak
Although it wasn’t always consistent, we could observe the memory leak by watching the V8 memory inspector while switching back and forth between two Discord channels. This is a route transition and causes many React components to unmount and mount.
A lot of time was spent in V8’s memory inspector. Memory cannot be freed as long as any “live” object holds a reference to it. In the above screenshot, you can see the leaking memory from a channel navigation.
The biggest offenders are React Fiber nodes (represented as minified Vr in the screenshot) and detached DOM elements. While we expected to find the answer at the end of the chain, it was often things like V8 internals that we didn’t understand.
We tried bisecting the app by commenting out swaths of the app to see if we could isolate the issue. We upgraded legacy code and removed the use of deprecated lifecycle methods. We learned some best practices, including not referencing DOM elements directly in props and state. We audited our older dependencies like Animated and TransitionGroup.
We also dug into the React source. The memory leak coincided with the introduction of React’s Fiber architecture. We observed that the memory retain graph usually consisted of long nextEffect chains and pointers from DOM nodes to Fiber nodes and vice versa.
If you’re curious, this is a good article that goes in depth on the Fiber implementation. We opened a pull request which cleared the nextEffect and stateNode references on unmount. This fork mitigated but didn’t completely fix our memory issue.
At one point, we discovered weird behavior with react-async-component where it seemed to re-render every nested child component on update. We fixed it by replacing it with React.lazy.
We tested the change and it seemed like the memory leak was gone. We felt we could safely ship 16.8.
So we rolled out to production and started to use hooks internally. Delicious.
This seems familiar…
Unfortunately for us, but thrillingly for you, the reader of this blog post, the story doesn’t end there.
A few reports of memory usage started trickling in again after the upgrade.
Although harder to reproduce, we were convinced the memory issue was still alive after our fix.
Discord using a lot of memory is unacceptable. The last thing we want is Discord causing a frame rate drop. We needed to find a fix immediately or roll back the React 16.8 upgrade. We had started using hooks so we wanted to avoid rolling back if possible, but couldn’t let this upgrade continue to hurt users.
At Discord, we employ a number of principles to solve problems. One is timeboxing — allocating a fixed amount of time for an activity. Although the React upgrade started as a simple task, it turned into one of unknown scope and energy.
Timeboxing is a great risk management tool. Curiosity was a driving motivator in this investigation. Timeboxing serves as a counterbalance and avoids following a hard problem down the rabbit hole. Or growing leaking memory hole. Whatever image serves you best.
We were near the end of our timebox. Continuing to poke around the codebase and upgrading components were unlikely to solve the problem. Instead, we revisited our React fork and added a patch to clear the DOM-to-fiber references on unmount.
This is most definitely a hack but it seemed to fix the memory leak. Phew.
We tested more thoroughly and asked some of the most active Discord Bug Hunters to go gangbusters on a custom build. The before-and-after was obvious.
While we would love to continue the investigation, we had to move on. Finding the root cause of the memory leak continues to be an unbounded activity.
A forked React, although not perfect, allows us to stay on 16.8, use hooks, and not compromise Discord’s performance.
The React team has been responsive in addressing these issues. Although there are a number of people reporting React memory issues, many have been unable to identify the root issue and create isolated repros. Sebastian from the React core team has opened an umbrella Github issue to track memory problems.
A little later, we were able to create a sandbox repro for leaking effect pointers. We’re happy to say that a portion of our fork (clearing nextEffect pointers) has made it upstream and should be a part of the React 16.9 release. The other changes require more investigation.
Some findings we found include:
- It is best practice not to reference DOM nodes and React instances in props or state. Use a container (e.g. ref) instead. Here is a repro of the issue: https://codesandbox.io/s/compassionate-fermat-3cq5b
- It’s important to fix leaks however small as they can potentially cascade with the current React Fiber implementation. Here is an example leak we found and fixed in react-dnd.
- Although not specific to React, we observed that video elements were being leaked in Chrome. We solved this by wrapping all video elements in a custom component. We should be able to remove this soon as a fix is in Chromium master.
We still don’t know exactly why we experienced memory leaks — we’ll hopefully answer that another day. With our fork, we’re confident that React will not cause the memory leaks we saw.
Discord will keep performing as it should and we get to keep using hooks ❤