Despite being a voice and text chat app, Discord sees over a hundred million images passing through its tubes every day. While we wish it was as simple as sending them out across the tubes to your friends, delivering these images creates some pretty large technical problems. Linking directly to images would leak users’ IP addresses to image hosts, and large images use up lots of bandwidth. To circumvent these problems, Discord needs an intermediary service to fetch images on behalf of users and then resize them to reduce bandwidth usage.
Enter the Image Proxy
To handle this job, we created a Python service creatively named Image Proxy. It fetched images from remote URLs and used the pillow-simd package to do the heavy lifting of image resizing. Pillow-simd is wonderfully fast, using x86 SSE instructions to accelerate resizing where it can. The Image Proxy would receive an HTTP request containing a URL, fetch and resize the image, and finally respond with the result.
On top of this, we set up a caching layer that kept resized images in memory and responded directly from cache when it could. An HAProxy layer routed requests to the Nginx caching layer based on a URL hash, and the cache performed request coalescing to minimize the number of resize transformations required. This combination of cache and proxy was enough to scale our image proxy well into millions of users.
Still, as Discord grew, the Image Proxy started to show signs of strain. The biggest problem was that the Image Proxy did not distribute its workload evenly, which hampered its throughput. Image proxying requests saw a wide variance in response times, with some taking multiple seconds to complete. We likely could have addressed this behavior in the Image Proxy itself, but we had been experimenting with Go, and this seemed like a good place to try it out.
And Then Came Media Proxy
It’s not an easy decision to rewrite a service that’s already working. Thankfully, Image Proxy was relatively simple, and comparing results between it and the replacement service would be straightforward. In addition to serving requests faster, the new service would also get new features, including the ability to get the first frame from .mp4 and .webm videos — hence, Media Proxy.
We began by benchmarking existing image resizing packages for Go and quickly discovered something disheartening. While Go is generally a faster language than Python, none of the resizing packages we could find consistently beat pillow-simd in performance. Most of the work done by the Image Proxy was image transcoding and resizing, so a slower resizer would be a significant bottleneck in Media Proxy. Go might be a little faster at handling HTTP, but any gain there would be lost to the extra time spent resizing.
We decided to double down and put together our own image resizing package for Go. We had seen some promise when we benchmarked one Go package that wrapped OpenCV, but it didn't support all of the features we wanted. We created our own Go image resizer, named Lilliput, which has its own Cgo wrapper on top of OpenCV. Lilliput was built with careful consideration toward not creating garbage in Go. Its OpenCV wrapper does almost everything we want, though we still had to fork OpenCV slightly before we were happy with it. In particular, we wanted to be able to inspect image headers before deciding whether to start decompressing, so that we could immediately refuse to resize any images that are too large.
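To make that header check concrete, here is a minimal sketch of how a caller can use the Lilliput API we ended up releasing. The import path, the 8192-pixel working buffer, the pixel limit, and the JPEG output settings are illustrative values rather than the production configuration.

```go
package main

import (
	"errors"

	"github.com/discord/lilliput"
)

const maxPixels = 8192 * 8192 // illustrative limit, not the production value

// resizeToJpeg inspects the image header before decoding, refuses
// anything too large, and then resizes into a caller-provided buffer.
func resizeToJpeg(input, output []byte, width, height int) ([]byte, error) {
	decoder, err := lilliput.NewDecoder(input)
	if err != nil {
		return nil, err
	}
	defer decoder.Close()

	// Check the header before committing to a full decompress.
	header, err := decoder.Header()
	if err != nil {
		return nil, err
	}
	if header.Width()*header.Height() > maxPixels {
		return nil, errors.New("image too large to resize")
	}

	ops := lilliput.NewImageOps(8192) // max dimension of the working pixel buffer
	defer ops.Close()

	opts := &lilliput.ImageOptions{
		FileType:             ".jpeg",
		Width:                width,
		Height:               height,
		ResizeMethod:         lilliput.ImageOpsFit,
		NormalizeOrientation: true,
		EncodeOptions:        map[int]int{lilliput.JpegQuality: 85},
	}
	return ops.Transform(decoder, opts, output)
}
```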
Lilliput uses existing, mature C libraries for image compression and decompression (e.g. libjpeg-turbo for JPEG, libpng for PNG) together with OpenCV's fast vectorized resizing code. We added fasthttp to handle our concurrent HTTP client and server requirements. With this combination, we finally had a service that consistently beat the Image Proxy in synthetic benchmarks. Comparing Lilliput to pillow-simd, we found that Lilliput performed as well as or better in the use cases we care about.
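As a rough illustration of the serving side, here is a fasthttp handler sketch in the spirit of Media Proxy. The query parameter name, the error handling, and the echo-back placeholder where resizing would happen are our own simplifications, not the production code.

```go
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	handler := func(ctx *fasthttp.RequestCtx) {
		url := string(ctx.QueryArgs().Peek("url"))
		if url == "" {
			ctx.Error("missing url parameter", fasthttp.StatusBadRequest)
			return
		}

		// Fetch the source image using fasthttp's client.
		status, body, err := fasthttp.Get(nil, url)
		if err != nil || status != fasthttp.StatusOK {
			ctx.Error("failed to fetch image", fasthttp.StatusBadGateway)
			return
		}

		// In the real service, the fetched bytes would be handed to
		// Lilliput for resizing (see the earlier sketch); here we
		// simply echo them back.
		ctx.SetContentType("application/octet-stream")
		ctx.SetBody(body)
	}

	log.Fatal(fasthttp.ListenAndServe(":8080", handler))
}
```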
The early code was not without issues. Initially, Media Proxy would leak 16 bytes on each request. This is a small enough loss that it takes quite a while to manifest, especially when testing at a small scale. Compounding the issue, Media Proxy keeps large static pixel buffers around for resizing purposes. It uses two of these buffers per CPU, so on a 32-core host its initial memory usage is several gigabytes. During testing, it took hours for Media Proxy to exhaust the system's memory and need a restart, long enough that it was hard to tell whether we actually had a memory leak or whether runtime usage was simply putting us over the limit.
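For a sense of how those static buffers can be managed, here is a simplified sketch of a preallocated pool sized at two buffers per CPU. The 64MB buffer size and the channel-based pool are illustrative, not Media Proxy's actual implementation.

```go
package main

import (
	"runtime"
)

// bufferPool hands out preallocated pixel buffers so the hot path
// never allocates (and never creates garbage for the GC to chase).
type bufferPool struct {
	buffers chan []byte
}

// newBufferPool preallocates count buffers of size bytes each.
// Media Proxy keeps two such buffers per CPU; the 64MB size below is
// illustrative, not the production figure.
func newBufferPool(count, size int) *bufferPool {
	p := &bufferPool{buffers: make(chan []byte, count)}
	for i := 0; i < count; i++ {
		p.buffers <- make([]byte, size)
	}
	return p
}

// get blocks until a buffer is free, which also caps concurrency at
// the number of buffers.
func (p *bufferPool) get() []byte  { return <-p.buffers }
func (p *bufferPool) put(b []byte) { p.buffers <- b }

func main() {
	pool := newBufferPool(2*runtime.NumCPU(), 64<<20)
	buf := pool.get()
	defer pool.put(buf)
	_ = buf // resize into buf here
}
```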
Eventually, we concluded that there must indeed be some kind of memory leak. We weren’t sure whether the leak was present in Go or C++, and reviewing the code failed to turn up the source of the leak. Fortunately, Xcode ships with an outstanding memory profiler — the Leaks tool in Instruments. This tool revealed the size of the leak and approximately where it was occurring. This was enough of a hint that further review allowed us to identify and fix the leak.
We encountered another showstopper bug in Media Proxy. Sometimes it would respond with strangely corrupted images where half of the image would be correct and the other half would appear “glitched”. We initially suspected that we might have been decoding partially retrieved images or somehow calling OpenCV incorrectly. This bug occurred infrequently and was hard to diagnose.
To deal with this, we developed a high-throughput request simulator that served image URLs pointing back at an HTTP server inside the simulator itself, so that it acted as both the requesting client and the hosting server. The simulator randomly delayed its responses in order to provoke the image-corrupting behavior from Media Proxy. With a reliable reproduction, we were able to isolate components in Media Proxy until we discovered a race condition on the output buffer that held the resized image: we had been writing one image to this buffer and then writing another to the same buffer before the first had finished being sent back over the network. The glitched images we had seen were actually two JPEGs written on top of one another.
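In essence, the bug looked something like the simplified sketch below; the names and structure are ours, not the production code. A shared output buffer was reused for the next image while the previous response was still being written from it.

```go
package main

import (
	"net"
)

var sharedOutput = make([]byte, 1<<20) // one buffer reused across requests

// racy writes the resized image into the shared buffer and sends it
// asynchronously; the next call can overwrite the buffer mid-send,
// interleaving two JPEGs in the same bytes.
func racy(conn net.Conn, resized []byte) {
	n := copy(sharedOutput, resized)
	go conn.Write(sharedOutput[:n]) // still reading from sharedOutput...
	// ...while the next request copies new bytes into it.
}

// fixed hands each response its own copy (or, equivalently, returns
// the buffer to a pool only after the write has completed).
func fixed(conn net.Conn, resized []byte) {
	out := append([]byte(nil), resized...)
	go conn.Write(out)
}

func main() {}
```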
Another way to discover bugs in a complex system is fuzzing, a technique that generates random inputs and feeds them into the system. Fuzzing can make a system misbehave or crash, and since our service needs to be resilient against arbitrary inputs, we decided to use it during testing. AFL is an exceptionally good fuzzer, so we picked it and ran it against Lilliput, which revealed several crashes caused by uninitialized variables.
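For illustration, a go-fuzz style harness over Lilliput might look like the sketch below. This is not the AFL setup we actually ran, just the same idea expressed in Go: feed arbitrary bytes through the full decode, resize, and encode pipeline and expect it to reject bad input without crashing.

```go
package lilliput_fuzz

import "github.com/discord/lilliput"

// Fuzz is a go-fuzz style entry point (illustrative, not our actual
// AFL harness). The fuzzer feeds arbitrary bytes to the decoder, which
// must reject them cleanly rather than crash.
func Fuzz(data []byte) int {
	decoder, err := lilliput.NewDecoder(data)
	if err != nil {
		return 0 // uninteresting input
	}
	defer decoder.Close()

	if _, err := decoder.Header(); err != nil {
		return 0
	}

	ops := lilliput.NewImageOps(1024)
	defer ops.Close()

	opts := &lilliput.ImageOptions{
		FileType:     ".png",
		Width:        64,
		Height:       64,
		ResizeMethod: lilliput.ImageOpsFit,
	}
	out := make([]byte, 1<<20)
	if _, err := ops.Transform(decoder, opts, out); err != nil {
		return 0
	}
	return 1 // interesting: the full pipeline succeeded
}
```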
After fixing the above bugs, we were confident enough to ship Media Proxy to production and were happy to find that our work had paid off. Media Proxy needed 60% fewer server instances to handle as many requests as Image Proxy while completing requests with much less variance in latency. Profiling shows that more than 90% of CPU time in this new service is spent performing image decompression, resizing, and compression. These libraries are already highly optimized, suggesting further gains would not be easily achievable. Additionally, the service creates almost no garbage at runtime.
Today, Media Proxy operates with a median per-image resize time of 25ms and a median total response latency of 85ms. It resizes more than 150 million images every day. Media Proxy runs on an autoscaled GCE instance group of n1-standard-16 hosts, peaking at 12 instances on a typical day.
Putting the Media in Media Proxy
After we had static images working, we wanted to support animated GIF resizing as well, which OpenCV would not handle for us. We decided to add another Cgo wrapper to Lilliput, this time on top of giflib, so that it could resize full GIFs as well as output the first frame as a PNG.
Resizing GIFs turned out to be somewhat challenging as the GIF standard specifies per-frame palettes of 256 colors, but the resizer operates in RGB space. We decided to preserve each frame’s palette rather than attempting to recompute new palettes. In order to convert RGB back into palette indices, we gave Lilliput a simple lookup table that crushes some of the RGB bits and uses the result as a key into a palette index table. This performs well and preserves the original colors, though it does mean that Lilliput can only create a GIF from a source GIF.
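The sketch below shows roughly how such a lookup table can be built. The five-bit crush and the nearest-color construction are illustrative choices; Lilliput's exact scheme may differ.

```go
package main

// Crush each 8-bit channel down to 5 bits and use the packed result as
// a key into a table that maps back to the nearest palette index.

type rgb struct{ r, g, b uint8 }

// crush packs an RGB triple into a 15-bit key (5 bits per channel).
func crush(c rgb) uint16 {
	return uint16(c.r>>3)<<10 | uint16(c.g>>3)<<5 | uint16(c.b>>3)
}

// buildLookup precomputes, for every crushed key, the index of the
// closest color in the frame's palette.
func buildLookup(palette []rgb) [1 << 15]uint8 {
	var table [1 << 15]uint8
	for key := 0; key < len(table); key++ {
		// Reconstruct the representative color for this key.
		c := rgb{
			r: uint8(key>>10) << 3,
			g: uint8(key>>5&0x1f) << 3,
			b: uint8(key&0x1f) << 3,
		}
		best, bestDist := 0, 1<<30
		for i, p := range palette {
			dr, dg, db := int(c.r)-int(p.r), int(c.g)-int(p.g), int(c.b)-int(p.b)
			if d := dr*dr + dg*dg + db*db; d < bestDist {
				best, bestDist = i, d
			}
		}
		table[key] = uint8(best)
	}
	return table
}

func main() {
	palette := []rgb{{0, 0, 0}, {255, 255, 255}} // tiny example palette
	table := buildLookup(palette)
	_ = table[crush(rgb{200, 210, 190})] // maps an RGB pixel back to a palette index
}
```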
We also patched giflib so that it would be easier to decode just a single frame at a time. This allows us to decode one frame, resize it, and then encode and compress it before moving on to the next, reducing the memory footprint of the GIF resizer. This does add some complexity to Lilliput as it must preserve some GIF state from frame to frame, but having more predictable memory usage in Media Proxy seems like a clear advantage.
Lilliput’s giflib wrapper fixed a number of issues we had previously seen in Image Proxy’s GIF resizing as giflib gave us full control of the image resizing process. A significant number of our Nitro users had uploaded animated GIF avatars which would have glitches or transparency errors when resized by the Image Proxy but which worked perfectly through Media Proxy. In general, we found that image resizers had problems with some aspects of the GIF format and produced visual glitches for frames with transparency or partial frames. Creating our own wrapper allows us to address these issues as we encounter them.
Finally, we gave Lilliput a Cgo wrapper around libavcodec so that it could freeze the first frame of MP4 and WEBM videos. This functionality will allow Media Proxy to generate previews of user-posted videos so that users can decide whether they want to play them. Freezing the first frame of videos was one of the remaining blockers for adding an in-client video player for videos in message attachments and links.
More Open Source
Now that we’re satisfied with Media Proxy, we’re releasing Lilliput under the MIT license. We hope this package will be useful to anybody who needs a performant image resizing service, and that this post will help others build new Go packages.
We are hiring, so come join us if this type of stuff tickles your fancy.