Parsing JSON Safely
February, 2021
In this article we’re going to look at how to make sure that your app can handle an unexpected server response correctly. We’ll start with a simple, type-safe way to extract data from a payload, and later we’ll use the Objective-C runtime to instantiate our data models from JSON automatically.
But first...
A word on defensive programming
I don’t want to focus only on preventing specific crashes; I also want to talk about the mindset of defensive programming.
The stability and security of your app depend on how defensively you code against different kinds of potential bugs or vulnerabilities.
Like a lot of things in life, defensiveness is a trade-off. You don’t want your app crashing all the time because you made a lot of optimistic assumptions, but you also don’t want to be overly paranoid and spend a lot of time protecting against situations that almost never happen.
Some different defensive programming scenarios:
- Should you add assertions to your code to ensure that required conditions are met? Yes. Assertions are cheap and effective at finding “this should never happen” problems.
- Should you design your APIs to be simple, to have sensible defaults, and to disallow invalid usage? Sure. The earlier you can catch developer mistakes, the better. Assume no-one will read the documentation!
- Should you validate user input before processing it? Definitely. Keep in mind that the vast majority of bad input will be typos or other innocent mistakes, so you don’t always have to be ultra-strict. Checking that a phone number has an area code is fine, but trying to validate email addresses with some hugely complicated regex is only going to block some real emails eventually.
- How about defending against a malicious user with a jailbroken device? Usually not worth it. With physical access to unlocked hardware, all bets are off security-wise. There’s very little you can do to protect against this, so it’s rarely worth going to extreme lengths to do so. Just sticking with Apple’s security best practices is enough.
- What about sanity checks on the OS, like making sure alloc/init doesn’t return nil? Nope. If memory allocation fails when creating a new object, then the device is out of memory or something else is very wrong, and your app is about to crash anyway. There’s next to nothing you can do to recover from this, so why spend time protecting against it?
- Should you ensure that the server is sending a response in the right format? Absolutely! Even if you control the server that your app talks to, you still shouldn’t blindly trust that it will always send what you expect. Server code and databases can change, and the developers making those changes might not be aware of what format your app expects to receive.
Let’s first look at an example of handling server responses in an unsafe manner, and then some ways we can improve on it.
Unsafe payload processing
Let’s say you’re implementing a login method in your app. The user enters their credentials, which are then sent to your server for validation. If successful, the server responds with a payload like this:
{
"id": 123,
"name": "Joe User",
"is_premium": true,
"lang_skills": "objc,swift,python"
}
Seems pretty straightforward to parse — we might use something like this:
NSUInteger userId = [payload[@"id"] unsignedIntegerValue];
NSString *name = payload[@"name"];
BOOL isPremium = [payload[@"is_premium"] boolValue];
NSArray *langs = [payload[@"lang_skills"] componentsSeparatedByString:@","];
However, there are several ways this code could realistically fail at some point:
- The id field in your user database changes from a number to a string, maybe to use UUIDs instead. NSString doesn’t have a method called unsignedIntegerValue... crash! 💥
- The server is rewritten in a new language which treats false as null. The is_premium field now sometimes contains NSNull, which of course crashes when you try to call boolValue on it.
- A backend developer changes the lang_skills field from a comma-separated string into an array of strings. Now the call to componentsSeparatedByString: on an NSArray will crash.
Even worse, maybe the scripting language used on your backend doesn’t use strict typing and sends all kinds of unexpected values, such as 0 or null instead of empty strings.
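To make these scenarios concrete, here is a small sketch of what NSJSONSerialization hands back after such backend changes. The payload and the ChangedLoginPayload function are made up for illustration: a JSON null arrives as NSNull and a JSON array as NSArray, and neither responds to the selectors the parsing code above sends.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Parses a JSON string into Foundation objects, or returns nil if it is malformed.
static id ParseJSONString(NSString *json) {
    NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];
    return [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];
}

// Hypothetical login payload after the backend changes described above:
// "id" is now a string, "is_premium" is null and "lang_skills" is an array.
static NSDictionary *ChangedLoginPayload(void) {
    return ParseJSONString(@"{\"id\": \"6f1c2a\", \"is_premium\": null, "
                           @"\"lang_skills\": [\"objc\", \"swift\"]}");
}
```

Sending boolValue to that NSNull, or componentsSeparatedByString: to that NSArray, raises an unrecognized-selector exception, which is exactly the kind of crash described in the list.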
What’s the big deal?
The failure scenarios above might seem contrived and unlikely — why would they change the data format without telling us? — but these are not just hypothetical. They may not happen often, but when they do they can potentially have a huge impact.
In May 2020, Facebook made a small change to the backend data format for their mobile SDK. Subsequently, their SDK caused many major apps to start crashing immediately on launch. The developers of these crippled apps, which have hundreds of millions of users combined, could do nothing until Facebook fixed the issue.
The code in the Facebook SDK that parsed the payload looked like this:
if (restrictiveParams[eventName][@"is_deprecated_event"]) {
...
}
The flawed assumption was that restrictiveParams[eventName] would always be a dictionary, so that using a subscript on it would be fine. The problem happened when that particular field was changed to a boolean on the server...
-[__NSCFBoolean objectForKeyedSubscript:]: unrecognized selector
Oops.
This is definitely not defensive programming, and is frankly inexcusable for code this critical.
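Actually, there are two bugs on this single line! It checks whether the is_deprecated_event value is nil or not, instead of testing whether it is true or false as a boolean value. Since an NSNumber holding false is still a non-nil object, an explicit false would pass the check too.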
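For comparison, here is a sketch of a more defensive version of that check. It is not Facebook’s actual fix: the IsDeprecatedEvent function is hypothetical, and it assumes the flag is meant to be a JSON boolean. It verifies that each level is a dictionary before subscripting it, and reads the flag as a boolean value rather than testing it for nil.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical helper: returns YES only if restrictiveParams[eventName] is a
// dictionary whose "is_deprecated_event" value is an NSNumber that is true.
static BOOL IsDeprecatedEvent(id restrictiveParams, NSString *eventName) {
    if (![restrictiveParams isKindOfClass:[NSDictionary class]]) {
        return NO;
    }
    id eventParams = ((NSDictionary *)restrictiveParams)[eventName];
    if (![eventParams isKindOfClass:[NSDictionary class]]) {
        return NO; // e.g. the server sent a boolean or null here instead
    }
    id flag = ((NSDictionary *)eventParams)[@"is_deprecated_event"];
    // Check the value, not just its presence: @NO is a non-nil object.
    return [flag isKindOfClass:[NSNumber class]] && [flag boolValue];
}
```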
The app that I work on also used to be plagued with these malformed payload problems, which caused millions of easily preventable crashes over a few years.
We can do better than this when parsing our payloads.
Update
Since I started writing this article, the same issue happened again in July 2020, where an unexpected payload caused millions more crashes in the Facebook SDK! 😳
This time, it was an object that was assumed to be an NSDictionary but was actually NSNull, which of course crashed when count was called on it, almost exactly like one of the examples above. The GitHub issue with details of the second incident is listed in the links at the end of this article.
Simple type checking
So how do we prevent this happening in our code? Well, we can just check the type of each field before using it:
NSUInteger userId = 0;
id userIdVal = payload[@"id"];
if ([userIdVal isKindOfClass:[NSNumber class]]) {
userId = [userIdVal unsignedIntegerValue];
} else {
// report an error
}
But that would get tedious very quickly. Can we centralise this type checking logic? Why not encapsulate it in NSDictionary itself?
Typed dictionary values
Normally, [NSDictionary objectForKey:] returns an id or nil, regardless of what type of object is stored with that key. We can add some methods to NSDictionary in a category which will only return an object if it is of a specific type, or nil otherwise.
We’ll start with two methods to read strings and numbers:
@interface NSDictionary (TypedValues)
- (nullable NSString *)stringForKey:(NSString *)key;
- (nullable NSNumber *)numberForKey:(NSString *)key;
@end
@implementation NSDictionary (TypedValues)
- (NSString *)stringForKey:(NSString *)key {
id val = self[key];
return [val isKindOfClass:NSString.class]? val : nil;
}
- (NSNumber *)numberForKey:(NSString *)key {
id val = self[key];
return [val isKindOfClass:NSNumber.class]? val : nil;
}
@end
Note that dictionary keys can be any object that conforms to NSCopying, but we’ll only accept strings because that’s all that JSON supports.
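The same pattern extends to JSON’s two container types, which is exactly what the Facebook check above needed. A minimal sketch, assuming a separate, hypothetical TypedCollections category so it doesn’t clash with TypedValues:

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical companion to the TypedValues category, covering JSON's
// container types. Each method returns nil unless the value has that class.
@interface NSDictionary (TypedCollections)
- (NSDictionary *)dictionaryForKey:(NSString *)key;
- (NSArray *)arrayForKey:(NSString *)key;
@end

@implementation NSDictionary (TypedCollections)
- (NSDictionary *)dictionaryForKey:(NSString *)key {
    id val = self[key];
    return [val isKindOfClass:NSDictionary.class] ? val : nil;
}
- (NSArray *)arrayForKey:(NSString *)key {
    id val = self[key];
    return [val isKindOfClass:NSArray.class] ? val : nil;
}
@end
```

Because messaging nil returns nil in Objective-C, lookups like these can be chained, and a wrong type at any level simply produces nil rather than a crash.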
Now we can parse the payload like this:
NSUInteger userId = [payload numberForKey:@"id"].unsignedIntegerValue;
NSString *name = [payload stringForKey:@"name"];
BOOL isPremium = [payload numberForKey:@"is_premium"].boolValue;
NSArray *langs = [[payload stringForKey:@"lang_skills"] componentsSeparatedByString:@","];
Just using these simple helper methods has made our JSON parsing much safer. Processing this response payload can now practically never crash our app, no matter what corrupted data the server throws at us (assuming that you checked that payload was actually a dictionary first).
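That check on the top-level object is easy to forget, since valid JSON can just as well have an array at its root. A minimal sketch, assuming the response body arrives as NSData (the DictionaryFromJSONData name is hypothetical):

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical helper: deserializes response data and returns it only if the
// top-level JSON value is an object (dictionary). Returns nil for missing
// data, malformed JSON, or valid JSON whose root is not an object.
static NSDictionary *DictionaryFromJSONData(NSData *data) {
    if (data == nil) {
        return nil;
    }
    NSError *error = nil;
    id json = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (json == nil) {
        NSLog(@"Invalid JSON: %@", error);
        return nil;
    }
    return [json isKindOfClass:[NSDictionary class]] ? json : nil;
}
```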
You still have to ensure that you didn’t get any unexpected nil values, but at least you’ve got the ability to check that and act on it, instead of assuming it’ll be fine.
Strict versus lenient parsing
The way we’ve implemented the helper methods is quite strict when it comes to data types: the numberForKey: method will only return a value if it is represented in the JSON as a number.
But what if the server starts returning numeric user IDs encoded in strings? For example, “123” instead of 123. Is that reason enough to fail the parsing, or could we be a little more lenient? For numbers, we could attempt to extract a numeric value from the payload, regardless of the actual format, like so:
- (NSInteger)integerForKey:(NSString *)key {
return [self numberForKey:key].integerValue ?:
[self stringForKey:key].integerValue;
}
This method will first test if the JSON value is a number, otherwise check for a string and parse that as an integer. If neither is true, zero will be returned, which might be a reasonable default for missing or invalid data in that field.
Sometimes, however, this “looser” parsing can have unexpected side effects. Say the user ID field contained the value “123ABC”. You would probably want your app to raise an error because this format is unknown, but integerForKey: would happily return 123 and discard the rest of the string.
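If you prefer strict parsing for string values, one option is NSScanner, which can tell you whether the whole string was consumed. This is only a sketch; the StrictIntegerFromString name is hypothetical, and note that NSScanner skips leading whitespace by default:

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical strict alternative to integerForKey: for string values.
// Returns YES and writes *outValue only if the whole string is an integer.
static BOOL StrictIntegerFromString(id string, NSInteger *outValue) {
    if (![string isKindOfClass:[NSString class]]) {
        return NO;
    }
    NSScanner *scanner = [NSScanner scannerWithString:string];
    NSInteger value = 0;
    if (![scanner scanInteger:&value] || ![scanner isAtEnd]) {
        return NO; // not a number at all, or trailing characters like "ABC"
    }
    if (outValue != NULL) {
        *outValue = value;
    }
    return YES;
}
```

Unlike integerValue, which returns 123 for “123ABC”, this rejects the string, so the caller can report an error instead of silently using a truncated value.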
So there’s no single right answer when choosing between strict and lenient parsing; it depends on your particular situation. It might be that strict parsing is better for you because it could highlight problems earlier, instead of silently handling them.
Note that Swift’s JSONDecoder is very strict and will fail unless it finds exactly the data type it expects. If our payload above contained "is_premium": 1, decoding would throw a DecodingError in Swift, because JSONDecoder only accepts true or false for Bool fields.
Automated safe parsing
Typed dictionary access is a good, simple solution for a small app, or when you can ensure that payloads will only ever be used via these methods. For the app I work on, using typed access resulted in a huge drop in the number of crashes. But people aren’t perfect and habits are hard to break, and over time more unsafe parsing crept back into the code.
I figured that the risk could be eliminated entirely if the network layer prevented any access to the raw JSON payloads.
What if, instead of manually having to parse the response JSON ourselves, we declared the payload format as a data model? So for the login example above, we would have:
@interface LoginResult: JSONModel
@property (readonly) NSUInteger userId;
@property (readonly) NSString *name;
@property (readonly) BOOL isPremium;
@property (readonly) NSString *languageSkills;
@end
The JSONModel superclass should be able to automatically instantiate any subclass from a JSON payload for us. Our network API would change from something like this:
- (void)request:(NSURL *)url
completion:(void(^)(id result))completion;
To instead return a JSONModel subclass which we specify:
- (void)request:(NSURL *)url
asType:(Class)resultType // must be a subclass of JSONModel
completion:(void(^)(__kindof JSONModel *result))completion;
By doing this we’ll have completely wiped out payload crashes, since it will no longer be possible to access the raw JSON in an unsafe way.
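The “must be a subclass of JSONModel” comment is only documentation, so the network layer might also enforce it at runtime. The sketch below is an assumption about how that step could look: it uses a minimal stand-in JSONModel (the real parsedFromJSON: is built in the next section) and a hypothetical ModelFromResponseData function that the request method would call before invoking its completion block.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Minimal stand-in for the article's JSONModel superclass. The real
// parsedFromJSON: introspects properties; this one only checks the root type.
@interface JSONModel : NSObject
+ (instancetype)parsedFromJSON:(id)jsonObj;
@end

@implementation JSONModel
+ (instancetype)parsedFromJSON:(id)jsonObj {
    return [jsonObj isKindOfClass:[NSDictionary class]] ? [self new] : nil;
}
@end

// Hypothetical network-layer step: converts raw response data into a model,
// so completion blocks never receive the raw JSON object.
static id ModelFromResponseData(NSData *data, Class resultType) {
    // Enforce the "must be a subclass of JSONModel" rule at runtime.
    if (![resultType isSubclassOfClass:[JSONModel class]]) {
        NSLog(@"%@ is not a JSONModel subclass", NSStringFromClass(resultType));
        return nil;
    }
    if (data == nil) {
        return nil;
    }
    id json = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];
    return [resultType parsedFromJSON:json];
}
```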
So how can we automatically create an object from a JSON structure? Well as you might have guessed, we can do that using the Objective-C runtime to introspect the properties of the class. 👍
Implementing JSONModel
We’ll be “filling in” a new JSONModel subclass instance with the JSON data using these steps:
- Introspect the subclass to find out which properties it has
- For each property, check if the JSON payload has a field with the same name
- Check that the property type matches the JSON value
- If the name and type match, set the property value
The introspection step would normally be the trickiest part, but luckily we already solved that in Inspecting Objective-C Properties 😄
We’ll be using the ClassProperty class from that article, and also TypeEncoding from Type Encodings Explained, so make sure you understand how they work first.
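If you haven’t read those articles, the core of the introspection step can be sketched directly with the runtime functions class_copyPropertyList, class_getProperty and property_copyAttributeValue. The helper and class names below are hypothetical, and this version only recovers the class of object-typed properties, which the runtime encodes as @"NSString" for an NSString * property:

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#include <assert.h>
#include <stdlib.h>

// Returns the names of the properties declared directly on cls
// (superclass properties are not included).
static NSArray *PropertyNames(Class cls) {
    unsigned int count = 0;
    objc_property_t *props = class_copyPropertyList(cls, &count);
    NSMutableArray *names = [NSMutableArray arrayWithCapacity:count];
    for (unsigned int i = 0; i < count; i++) {
        [names addObject:@(property_getName(props[i]))];
    }
    free(props); // the list is malloc'd by the runtime
    return names;
}

// Returns the class of an object-typed property, or Nil for scalars and `id`.
static Class PropertyClass(Class cls, NSString *name) {
    objc_property_t prop = class_getProperty(cls, name.UTF8String);
    if (prop == NULL) {
        return Nil;
    }
    char *type = property_copyAttributeValue(prop, "T"); // e.g. @"NSString" or Q
    if (type == NULL) {
        return Nil;
    }
    NSString *encoding = @(type);
    free(type);
    if (encoding.length > 3 && [encoding hasPrefix:@"@\""] && [encoding hasSuffix:@"\""]) {
        NSRange nameRange = NSMakeRange(2, encoding.length - 3);
        return NSClassFromString([encoding substringWithRange:nameRange]);
    }
    return Nil;
}

// Hypothetical model used to exercise the helpers.
@interface ExampleUser : NSObject
@property (readonly) NSUInteger userId;
@property (readonly) NSString *name;
@end

@implementation ExampleUser
@end
```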
Our JSONModel class will need a method to create an instance from a JSON dictionary. It should return nil if the JSON object does not contain all the values of the correct types.
@implementation JSONModel
+ (instancetype)parsedFromJSON:(id)jsonObj {
if ([jsonObj isKindOfClass:NSDictionary.class] == NO) {
return nil;
}
NSDictionary *jsonDict = (NSDictionary *)jsonObj;
// Create a new instance of the subclass to fill in
JSONModel *obj = [self new];
// Introspect our properties
NSArray<ClassProperty *> *props = [self classProperties];
That’s step 1 done. Now we have to iterate over each property and see if there’s a corresponding JSON value with the same type. NSJSONSerialization can only produce a small number of types, so we don’t need to handle tons of cases like we did in Implementing KVC.
The possible types in parsed JSON are NSDictionary, NSArray, NSString, NSNumber and NSNull. For now, we’ll just check if the value type and our property type are the exact same kind of class, and if so, use KVC to set the value:
for (ClassProperty *prop in props) {
id jsonVal = jsonDict[prop.name];
if (jsonVal == nil) {
NSLog(@"No value found for property %@", prop.name);
return nil;
}
if (prop.type.classType && [jsonVal isKindOfClass:prop.type.classType]) {
[obj setValue:jsonVal forKey:prop.name];
} else {
// The value exists but has the wrong type, so the payload is invalid
NSLog(@"Couldn't load value for %@", prop.name);
return nil;
}
}
return obj;
}
That’s basically all there is to automatic JSON parsing! But we’re not quite there yet. If you tried running this with our example login result payload above with [LoginResult parsedFromJSON:], it wouldn’t work.
This is because userId and isPremium are not class types but scalars (NSUInteger and BOOL), so they would fail when we test for prop.type.classType. We’ll need to handle scalar types specially.
Supporting scalar types
We can check if a property has a simple numeric type by testing isIntType (which includes BOOL) or isFloatType on prop.type. However, this value would be parsed from the JSON payload as an NSNumber, so how can we convert that to, say, NSUInteger userId? Do we need to handle each of the dozen scalar types after all?
Well, KVC handles unboxing of scalar values automatically, so again we’re in luck! We can simply call setValue:forKey: with the NSNumber and it will convert it to the correct scalar type for us.
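Here is a small sketch of that unboxing in isolation, using a hypothetical ScalarExample class: the same setValue:forKey: call fills an NSUInteger, a BOOL and a double from NSNumber values.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical model with scalar properties, used to show KVC unboxing.
@interface ScalarExample : NSObject
@property NSUInteger userId;
@property BOOL isPremium;
@property double score;
@end

@implementation ScalarExample
@end

// setValue:forKey: unwraps each NSNumber into the property's scalar type
// before calling the setter, so one code path handles every numeric type.
static ScalarExample *ScalarExampleFromNumbers(NSNumber *userId, NSNumber *isPremium,
                                               NSNumber *score) {
    ScalarExample *obj = [ScalarExample new];
    [obj setValue:userId forKey:@"userId"];
    [obj setValue:isPremium forKey:@"isPremium"];
    [obj setValue:score forKey:@"score"];
    return obj;
}
```

The type check on jsonVal still matters, though: passing nil for a scalar key calls setNilValueForKey:, which raises NSInvalidArgumentException by default, and an NSNull can’t be unboxed at all.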
Let’s add another case into that type check before the else:
} else if ((prop.type.isIntType || prop.type.isFloatType)
&& [jsonVal isKindOfClass:NSNumber.class]) {
[obj setValue:jsonVal forKey:prop.name];
} else {
// ...
So now we’re able to handle JSON values of any class or scalar type.
The LoginResult object is still not being parsed correctly, though. The error message is “No value found for property userId”.
Oh wait... the field name in the JSON payload is “id”, not “userId” 😅
We’ll need to support properties that have different names to the JSON keys.
Mapping JSON fields to properties
Custom conversion between payload keys and model fields is quite common, since JSON keys typically use snake_case and ObjC properties are camelCase. Swift’s Codable provides this functionality explicitly via the CodingKeys enum, or automatically via JSONDecoder’s keyDecodingStrategy option (.convertFromSnakeCase). We can do it with a simple string-to-string mapping.
This mapping will be specific to each JSONModel subclass, so we could implement it using a class method which can be overridden:
+ (NSString *)jsonKeyForProperty:(NSString *)propName {
// don't do any mapping by default
return propName;
}
Now we can change this line from above:
// id jsonVal = jsonDict[prop.name];
NSString *jsonKey = [self jsonKeyForProperty:prop.name];
id jsonVal = jsonDict[jsonKey];
In our LoginResult class, we can specify the mapping like so:
@implementation LoginResult
+ (NSString *)jsonKeyForProperty:(NSString *)propName {
NSDictionary *mapping = @{
@"userId": @"id",
@"isPremium": @"is_premium",
@"languageSkills": @"lang_skills",
};
return mapping[propName] ?: [super jsonKeyForProperty:propName];
}
@end
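Writing a mapping for every property gets repetitive. As a possible default for jsonKeyForProperty:, the reverse of JSONDecoder’s .convertFromSnakeCase could be approximated by converting camelCase property names to snake_case. A sketch with a hypothetical function name; it doesn’t treat acronyms such as URL specially:

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical default mapping: "isPremium" -> "is_premium".
// Each uppercase letter becomes "_" plus its lowercase form (except at the start).
static NSString *SnakeCaseKeyFromPropertyName(NSString *propName) {
    NSCharacterSet *upper = [NSCharacterSet uppercaseLetterCharacterSet];
    NSMutableString *result = [NSMutableString stringWithCapacity:propName.length + 4];
    for (NSUInteger i = 0; i < propName.length; i++) {
        unichar c = [propName characterAtIndex:i];
        NSString *ch = [NSString stringWithCharacters:&c length:1];
        if ([upper characterIsMember:c]) {
            if (i > 0) {
                [result appendString:@"_"];
            }
            [result appendString:ch.lowercaseString];
        } else {
            [result appendString:ch];
        }
    }
    return result;
}
```

For LoginResult this alone wouldn’t be enough, since the payload uses id and lang_skills rather than user_id and language_skills, so the explicit mapping would still be needed for those keys.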
Now calling [LoginResult parsedFromJSON:] will return a fully populated LoginResult object!
NSDictionary *payload = @{
@"id": @123,
@"name": @"Joe User",
@"is_premium": @YES,
@"lang_skills": @"objc,swift,python",
};
LoginResult *result = [LoginResult parsedFromJSON:payload];
for (ClassProperty *prop in LoginResult.classProperties) {
NSLog(@"%@ = %@", prop.name, [result valueForKey:prop.name]);
}
// Prints:
//
// userId = 123
// name = Joe User
// isPremium = 1
// languageSkills = objc,swift,python
Setting readonly properties
Sharp-eyed readers might be asking here “hang on, the properties in LoginResult are readonly, how can setting their values like this work??” 👀
Good question! As it turns out, KVC has another handy trick, which is to set the ivar directly if no setter method exists:
If no simple accessor is found, and if the class method accessInstanceVariablesDirectly returns YES, look for an instance variable with a name like _<key>, _is<Key>, <key>, or is<Key>, in that order. If found, set the variable directly with the input value (or unwrapped value) and finish.
— Key-Value Coding Programming Guide
This means that when we call setValue:forKey: on a read-only property, KVC will notice there is no matching setter, and put the value straight into the ivar backing the property instead. So it turns out that “read only” doesn’t mean much, and you can write to any readonly property backed by an instance variable from outside a class 😱
Of course, you can break all kinds of things when you use a dynamic language like Objective-C if you really want to. If you want to prevent this particular trick, you can override accessInstanceVariablesDirectly to return NO.
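The sketch below shows both behaviours side by side with two hypothetical classes: KVC writes to the readonly property through its _userId ivar, until accessInstanceVariablesDirectly returns NO, at which point setValue:forKey: raises instead. Bear in mind that a JSONModel subclass doing this would no longer be populated by parsedFromJSON:, since it relies on the same fallback.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Readonly property: there is no setter, but KVC falls back to the _userId ivar.
@interface OpenModel : NSObject
@property (readonly) NSUInteger userId;
@end

@implementation OpenModel
@end

// Same property, but direct ivar access is switched off, so KVC finds no way
// to set the value and calls setValue:forUndefinedKey:, which raises.
@interface SealedModel : NSObject
@property (readonly) NSUInteger userId;
@end

@implementation SealedModel
+ (BOOL)accessInstanceVariablesDirectly {
    return NO;
}
@end

// Returns YES if setValue:forKey: succeeded, NO if KVC raised an exception.
static BOOL TrySetValue(id obj, id value, NSString *key) {
    @try {
        [obj setValue:value forKey:key];
        return YES;
    } @catch (NSException *exception) {
        NSLog(@"%@: %@", exception.name, exception.reason);
        return NO;
    }
}
```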
Wrapping up
In this article we’ve implemented a basic JSON parser using runtime reflection, but there are other handy features you could add:
- Support for lenient decoding, such as extracting numbers from strings, or skipping a property if a matching value is not found.
- Other data types such as NSURL or NSDate can be automatically parsed from strings and numbers in the JSON.
- Safe parsing of typed arrays, for example testing that an NSArray<NSString *> property actually contains only strings. This one is a little trickier, because details of generics are not available to the runtime, so you’d have to specify what type an array is meant to contain in the model.
- Support for nested objects. If the type of a property happens to be a subclass of JSONModel, we should automatically populate that from a nested JSON dictionary.
- Error handling for different situations, like when a JSON value is found but it’s the wrong type.
- More robust property setting, like using try/catch around setValue:forKey:.
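Of these, typed arrays are perhaps the simplest to sketch. The helper below is hypothetical: it returns the array only if every element is of the expected class, and a model would still need some way to declare that class, for example an overridable class method similar to jsonKeyForProperty:.

```objc
#import <Foundation/Foundation.h>
#include <assert.h>

// Hypothetical helper for typed arrays: returns value only if it really is an
// NSArray and every element is an instance of elementClass (or a subclass).
static NSArray *ArrayOfClass(id value, Class elementClass) {
    if (![value isKindOfClass:[NSArray class]]) {
        return nil;
    }
    for (id element in (NSArray *)value) {
        if (![element isKindOfClass:elementClass]) {
            return nil; // one wrong element invalidates the whole array
        }
    }
    return value;
}
```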
If any Facebook engineers are reading, feel free to use this in your SDKs… in fact we’d really appreciate it if you did 😂
Useful links
- A Bugsnag article about the Facebook SDK crash
- The buggy Facebook SDK source
- The GitHub ticket about the SDK crash (edit: the first SDK crash!)
- The GitHub ticket about the second SDK crash
- Accessor patterns in Apple’s KVC Programming Guide
The source code from this post can be found on GitHub. If you spot a bug, please create an issue there.
Any comments or questions about this post? ✉️ nick @ this domain.
— Nick Randall