How to discover if a c-string can be encoded to NSString with a given encoding

Time:10-21

I am trying to implement code that converts const char * to NSString. I would like to try multiple encodings in a specified order until I find one that works. Unfortunately, all the initWith... methods on NSString say that the results are undefined if the encoding doesn't work.

In particular, I would sometimes like to try NSMacOSRomanStringEncoding first, but it never seems to fail: when the bytes are in some other encoding, it just produces gobbledygook. Is there some kind of check I can perform ahead of time? (Like canBeConvertedToEncoding, but in the other direction?)

CodePudding user response:

Instead of trying encodings one by one until you find a match, consider letting NSString do the detection with +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:]. Given the string data and some options, it may be able to detect the encoding for you and return it, along with the actual decoded string.

Specifically for your use-case, since you have a list of encodings you'd like to try, the encodingOptions parameter will allow you to pass those encodings in using the NSStringEncodingDetectionSuggestedEncodingsKey.

So, given a C string and some possible encoding options, you might be able to do something like:

#import <Foundation/Foundation.h>

NSString *decodeCString(const char *source, NSArray<NSNumber *> *encodings) {
    NSData * const cStringData = [NSData dataWithBytesNoCopy:(void *)source length:strlen(source) freeWhenDone:NO];
    
    NSString *result = nil;
    BOOL usedLossyConversion = NO;
    NSStringEncoding determinedEncoding = [NSString stringEncodingForData:cStringData
                                                          encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                                                                            NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
                                                          convertedString:&result
                                                      usedLossyConversion:&usedLossyConversion];
    
    /* `determinedEncoding` is 0 if no encoding could be determined.
       Decide whether to do anything with `usedLossyConversion` and `determinedEncoding`. */
    return result;
}

Example usage:

NSString *result = decodeCString("Hello, world!", @[@(NSShiftJISStringEncoding), @(NSMacOSRomanStringEncoding), @(NSASCIIStringEncoding)]);
NSLog(@"%@", result); // => "Hello, world!"

If you don't strictly need detection to be limited to the encodings in your list, you can drop the NSStringEncodingDetectionUseOnlySuggestedEncodingsKey option.
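
The comment inside decodeCString leaves open what to do with usedLossyConversion and determinedEncoding. One possible policy, as a minimal sketch: treat the decode as failed when no encoding was determined (the method returns 0 in that case) or when the conversion was lossy. The name decodeCStringStrict is hypothetical and only for illustration; whether a lossy result should be rejected depends on your use case.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical variant of decodeCString (illustrative name, not an Apple API).
// Returns nil unless an encoding was determined and the conversion was lossless.
static NSString *decodeCStringStrict(const char *source, NSArray<NSNumber *> *encodings) {
    // Wrap the bytes without copying; `source` must stay valid during the call.
    NSData *data = [NSData dataWithBytesNoCopy:(void *)source
                                        length:strlen(source)
                                  freeWhenDone:NO];

    NSString *result = nil;
    BOOL usedLossyConversion = NO;
    NSStringEncoding determinedEncoding =
        [NSString stringEncodingForData:data
                        encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: encodings,
                                          NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES}
                        convertedString:&result
                    usedLossyConversion:&usedLossyConversion];

    // A return value of 0 means no encoding could be determined.
    if (determinedEncoding == 0 || usedLossyConversion) {
        return nil;
    }
    return result;
}
```

Callers can then treat nil as "none of the suggested encodings decoded this cleanly" and fall back to whatever default makes sense for them.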


One thing to note about the encoding array you pass in: the documentation doesn't promise that the suggested encodings are attempted in order, but spelunking through the disassembly of the (current) method implementation shows that the array is enumerated using fast enumeration (i.e., in order). This could change in the future, or may have been different in the past. If in-order evaluation is a hard requirement for you, you could theoretically work around it by calling stringEncodingForData:encodingOptions:convertedString:usedLossyConversion: repeatedly, one encoding at a time, in order. That would likely be very expensive, though, given the complexity of this method.
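
A rough sketch of that one-encoding-at-a-time workaround might look like the following. The helper name decodeCStringInOrder is hypothetical, and the sketch assumes you want to reject lossy conversions, which it does by setting NSStringEncodingDetectionAllowLossyKey to @NO (that key defaults to YES when absent). It does not solve the problem of permissive encodings such as NSMacOSRomanStringEncoding accepting almost any byte sequence, so their position in the array still matters.

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper (illustrative name, not an Apple API): tries each
// encoding strictly in array order and returns the first lossless decode.
static NSString *decodeCStringInOrder(const char *source, NSArray<NSNumber *> *encodings) {
    // Wrap the bytes without copying; `source` must stay valid during the call.
    NSData *data = [NSData dataWithBytesNoCopy:(void *)source
                                        length:strlen(source)
                                  freeWhenDone:NO];

    for (NSNumber *candidate in encodings) {
        NSString *decoded = nil;
        BOOL usedLossyConversion = NO;
        // Offer exactly one encoding per call so the order is under our control.
        NSStringEncoding found =
            [NSString stringEncodingForData:data
                            encodingOptions:@{NSStringEncodingDetectionSuggestedEncodingsKey: @[candidate],
                                              NSStringEncodingDetectionUseOnlySuggestedEncodingsKey: @YES,
                                              NSStringEncodingDetectionAllowLossyKey: @NO}
                            convertedString:&decoded
                        usedLossyConversion:&usedLossyConversion];

        // 0 means this candidate was rejected; move on to the next one.
        if (found != 0 && decoded != nil && !usedLossyConversion) {
            return decoded;
        }
    }
    return nil; // No candidate produced a lossless decode.
}
```

This makes one full detection call per candidate, so it is only worth considering when the ordering guarantee really matters more than the cost.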
