How Facebook Cut Down the Crashes in Its iOS App

Facebook used a process-of-elimination approach, including internal tooling and migration to new technologies, to reduce the crashes in its iOS app.

The engineering team at Facebook is constantly tuning its systems and apps to run more smoothly and better suit users, particularly those on mobile devices.

To that end, the team set out to make the Facebook iOS app more reliable by reducing how often it crashes.

“In the past, most of the crashes have been due to programmatic errors, and they always came with a stack trace that blamed the line in the code that caused the crash and always offered a hint as to what the issue might be,” said a post on the Facebook engineering blog written by Ali Ansari and Grzegorz Pstrucha, two engineers at Facebook.

According to the post, Facebook witnessed a drop in its measured crash rate but also noticed from App Store reviews that the community was still frustrated with the app crashing.

“We dug into the user reports and began to theorize that out-of-memory events (OOMs) might be happening,” the post said. “OOMs occur when the system runs low on memory and the OS kills the app to reclaim memory. It can happen whether the app is in the foreground or the background. We refer to these internally as FOOMs and BOOMs, respectively — it's just a bit more fun to say that the app went BOOM!”

Facebook eventually fixed the problem with a combination of internal tooling, migration to the newest iOS technologies and some additional cleverness that helped the company accurately measure crash and reliability problems in the first place.

To get a handle on how often its app was being terminated by OOM events, Facebook started counting all the known paths through which the application could be terminated and then logging them, the blog said. The question the team asked was “What can cause the application to start up?” It came up with six reasons why the app could need to start up:

* The app was upgraded.

* The app called exit or abort.

* The app crashed.

* A user swiped up to force-exit the application.

* The device restarted (which includes an OS upgrade).

* The app ran out of memory (an OOM) in the background or the foreground.

“By process of elimination, looking for instances that didn't fall into the other cases, we could then figure out when an OOM had occurred,” the post said. “We also kept track of when the app backgrounded and foregrounded so that we could accurately break down OOMs into BOOMs and FOOMs, respectively.”
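
The process of elimination described above can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python, not Facebook's implementation: the `PreviousSession` record, its field names and the `classify_launch` function are all hypothetical, and the sketch simply assumes that the previous run managed to persist these flags before it ended. How each signal (for example, a user force-quit) is actually detected on iOS is outside what the post excerpted here describes.

```python
from dataclasses import dataclass
from enum import Enum


class LaunchReason(Enum):
    APP_UPGRADE = "app upgrade"
    EXPLICIT_EXIT = "exit or abort"
    CRASH = "crash"
    USER_FORCE_QUIT = "user force-quit"
    DEVICE_RESTART = "device restart"
    FOREGROUND_OOM = "FOOM"
    BACKGROUND_OOM = "BOOM"


@dataclass
class PreviousSession:
    """State saved by the previous run. All field names are hypothetical."""
    app_version: str
    os_version: str
    boot_time: float          # device boot timestamp seen by the previous run
    exited_explicitly: bool   # the app called exit or abort
    crash_reported: bool      # a crash report exists for the previous run
    user_force_quit: bool     # the user swiped the app away
    was_in_background: bool   # last recorded foreground/background state


def classify_launch(prev, app_version, os_version, boot_time):
    """Rule out every known cause of a fresh start; what remains is an OOM."""
    if prev.app_version != app_version:
        return LaunchReason.APP_UPGRADE
    if prev.exited_explicitly:
        return LaunchReason.EXPLICIT_EXIT
    if prev.crash_reported:
        return LaunchReason.CRASH
    if prev.user_force_quit:
        return LaunchReason.USER_FORCE_QUIT
    # An OS upgrade implies a restart, so both count as a device restart.
    if prev.os_version != os_version or prev.boot_time != boot_time:
        return LaunchReason.DEVICE_RESTART
    # No known cause matched: attribute the termination to an OOM and use
    # the last recorded app state to split it into BOOM or FOOM.
    if prev.was_in_background:
        return LaunchReason.BACKGROUND_OOM
    return LaunchReason.FOREGROUND_OOM


if __name__ == "__main__":
    previous = PreviousSession(
        app_version="1.0", os_version="9.0", boot_time=100.0,
        exited_explicitly=False, crash_reported=False,
        user_force_quit=False, was_in_background=False,
    )
    print(classify_launch(previous, "1.0", "9.0", 100.0).value)
```

Because an OOM is whatever is left after the other five causes are excluded, the count is only as accurate as the detection of those other causes; any termination path that goes unrecorded would be misattributed to an OOM.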

The logging showed that there was a higher rate of OOMs on devices with less memory, “which was expected and reassuring since the application process was more likely to be evicted on a constrained-memory device,” the post said.

The team’s first effort to reduce the number of OOMs was to shrink the app’s memory footprint by proactively deallocating memory as soon as it was no longer needed.

Fixing memory leaks led to some reduction in the OOM rate, but not the significant reduction the team was hoping for. The engineers then turned to profiling. “Next, we dived into the memory profiler in Apple's Instruments application and noticed a repeated pattern of UIWebView allocating a lot of memory once the application opened any Web page. We also found that the memory was often not reclaimed, even after the user navigated away from the page and the web view was closed.”