Superscript of a String

I have the string "28th and 8th. I will be working on this task" and I want to superscript the "th". I wrote a function, but it only works if the last three characters of the first date and the second date are different. If they are the same, my function goes wrong.

var exampletString = "28th and 8th. I will be working on this task"
var convertedAttrString = NSMutableAttributedString(string: exampletString)
func applySuperscriptAttributes(substring: String, originalString: String, convertedString: NSMutableAttributedString){
    if let subRange = originalString.range(of: substring){
        let convertedRange = NSRange(subRange, in: originalString)
        convertedString.setAttributes([.font: UIFont.systemFont(ofSize: 10, weight: .regular)], range: NSRange(location: convertedRange.location + convertedRange.length - 2, length: 2))
    }
}
applySuperscriptAttributes(substring: "28th", originalString: exampletString, convertedString: convertedAttrString)
applySuperscriptAttributes(substring: "8th", originalString: exampletString, convertedString: convertedAttrString)
// The result is "28th and 8th ..." with the "th" of "8th" not superscripted

CodePudding user response：

You are using the method String.range(of:). That method finds the FIRST occurrence of the substring in the target string.

The string "8th" appears as part of the first string "28th", which has already been converted to superscript.
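To see this concretely, here is a small check (Foundation only; the variable names are mine, not from your code) that prints where range(of:) finds "8th" in your string:

```swift
import Foundation

let sentence = "28th and 8th. I will be working on this task"

// range(of:) returns only the first match, which lies inside "28th".
let firstMatch = sentence.range(of: "8th").map { NSRange($0, in: sentence) }

if let firstMatch = firstMatch {
    print("Found at location \(firstMatch.location), length \(firstMatch.length)")
    // prints "Found at location 1, length 3"
}
```

Location 1 is the "8" inside "28th", so your second call restyles the same two characters as the first call and never reaches the standalone "8th".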

You will need to write code that intelligently parses your string by words, looking for the word "8th", not a string of characters "8th" that could be in the middle of another word.

Edit:

There are actually lots of gotchas, edge cases, and tricky situations to deal with.

The first, which hung you up, is that you want to make sure your search string doesn't appear in the middle of a word.

@Bram's answer using regular expressions is one way to solve this problem, or at least part of this problem.

The regular expression bit \d+ stands for "one or more digits." Then Bram inserted your suffix, "th", after that. So that regex will match a sequence of one or more digits followed by a suffix. Because the \d+ part matches any number of digits, it matches 8, 28, or 20000008. It doesn't get fooled by numbers whose last digit is an 8, like your code does.

However, Bram's regular expression does not make sure that your number plus suffix is not in the middle of a larger word. If you had a sentence "Words words x8thz words words 8th", it would happily match the "8th" in the middle of the "word" "x8thz".

You could easily add to the regular expression so that it only detected 8th and other "digits followed by a suffix" cases if they were enclosed in white space, but then what if the very first or last part of your string is a match? The string "5th of his name" has a 5th in it, and it has white space after it, but not before it.
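One possible way around the start-of-string and end-of-string problem is the regex word-boundary anchor \b instead of explicit white space. The sketch below is my own illustration (the function name ordinalSuffixRanges is hypothetical), and it assumes the simple \b definition is good enough for your text: letters, digits, and underscore count as word characters, everything else as a boundary. It does not settle the harder punctuation questions.

```swift
import Foundation

/// Hypothetical helper: returns the NSRange of the suffix in every
/// "digits + suffix" run that stands alone as a word.
func ordinalSuffixRanges(in text: String, suffix: String) -> [NSRange] {
    // Escape the suffix so it is treated literally inside the pattern.
    let escaped = NSRegularExpression.escapedPattern(for: suffix)
    // \b = word boundary, \d+ = one or more digits, (…) = capture the suffix.
    let pattern = "\\b\\d+(\(escaped))\\b"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    // Build the search range in UTF-16 units, which is what NSRange expects.
    let whole = NSRange(text.startIndex..., in: text)
    // Capture group 1 is just the suffix, not the digits before it.
    return regex.matches(in: text, range: whole).map { $0.range(at: 1) }
}

let ranges = ordinalSuffixRanges(in: "5th of his name, x8thz, 28th and 8th.", suffix: "th")
print(ranges.map { $0.location })
// prints "[1, 26, 34]"
```

The "5th" at the very start is found, and the "8th" inside "x8thz" is skipped because there is no word boundary between "x" and "8".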

Even trickier is punctuation. Is punctuation part of a word? Is it white-space? The answer is "it's complicated." Example: Is ' a single quote, wrapping a word or phrase, or is it an apostrophe that is part of the word?

Composing a regular expression to handle all of those cases is hard. Very hard.

The Foundation framework has a (rather old) string parsing function, enumerateSubstrings(in:options:using:), that lets you step through substrings of a larger string. It has an options parameter that lets you enumerate your string in various ways, including .byWords. The .byWords option is quite smart, and handles very complex cases when figuring out what is part of a word and what is whitespace/punctuation.

However, it is a function of the old Foundation class NSString, and is written in Objective-C. That makes it a bit of a pain to use. You have to cast your String to an NSString to use it. Worse, it takes a closure that handles each substring that's enumerated, and one of the parameters to that closure is an Objective-C boolean passed by reference. You have to set that parameter to true to get the enumeration to stop. Swift maps that parameter to an unsafe mutable pointer to a bool, or UnsafeMutablePointer<ObjCBool>. Those are a pain to deal with.

Here is some sample code that finds occurrences of "8th" in a sentence using enumerateSubstrings(in:options:using:):

extension String {
    func nsRangeOfWord(_ word: String, printWords: Bool = false) -> NSRange? {
        // NSRange is measured in UTF-16 code units, so use utf16.count rather than count
        let range = NSRange(location: 0, length: utf16.count)

        var result: NSRange? = nil
        (self as NSString).enumerateSubstrings(in: range, options: .byWords) { (substring, wordRange, _, stop) -> () in
            if let aWord = substring {
                if printWords {
                    print(aWord)
                }
                if aWord == word {
                    result = wordRange
                    stop.pointee = true
                }
            }
        }
        return result
    }
}

func searchForWord(_ word: String, inSentence sentence: String) {
    print("Searching string '\(sentence)' for word '\(word)'")
    if let foundRange = sentence.nsRangeOfWord(word) {
        print("Found String '" + (sentence as NSString).substring(with: foundRange) + "' at range \(foundRange)")
    } else {
        print("Word '\(word)' Not found in string '\(sentence)'")
    }

}
let simpleSentence = "28th and 8th. I will be working on this task"
let trickySentence = "28th and x8thm 8th. I will be working on this task"
let complexSentence = "The dog said 'The crux of the biscuit is the apostrophe.' And the man said 'You can't say that! You isn't! You... doesn't. You wouldn't. You couldnt; you shouldn't! 18th and 8th.'"

searchForWord("8th", inSentence: simpleSentence)
searchForWord("8th", inSentence: trickySentence)
searchForWord("8th", inSentence: complexSentence)

The String extension nsRangeOfWord(_:printWords:):

func nsRangeOfWord(_ word: String, printWords: Bool = false) -> NSRange?

Attempts to find the first occurrence of your word in the target String, as a separate word. If it finds your word, it returns the NSRange in the string where it is found.

It is not fooled by either "2008th" or "8thing".

It is also smart enough to treat punctuation as a word delimiter, and detect the difference between ' when it's used as quotes vs when it's used as an apostrophe.

In the fragment "The dog said 'The..." the dog's quote is enclosed in single-quotes. The bit 'The is the word "the" with a preceding single quote. That ' is not part of the word. However, in the word isn't, the ' is an apostrophe that IS part of the word.

However, unlike the regex function, this code doesn't automatically find any string of digits followed by a suffix like "th". It would take more work to do that.
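As a rough idea of what that extra work could look like, here is a sketch that keeps the .byWords enumeration but accepts any word made of ASCII digits followed by the suffix. The method name ordinalSuffixNSRanges(suffix:) is hypothetical, and the code assumes .byWords reports things like "28th" as a single word, as it does for the sentences above.

```swift
import Foundation

extension String {
    /// Hypothetical helper: NSRanges of `suffix` in every word that is
    /// one or more ASCII digits followed by `suffix` (e.g. "28th").
    func ordinalSuffixNSRanges(suffix: String) -> [NSRange] {
        let ns = self as NSString
        let suffixLength = (suffix as NSString).length
        var results: [NSRange] = []
        ns.enumerateSubstrings(in: NSRange(location: 0, length: ns.length),
                               options: .byWords) { word, wordRange, _, _ in
            // Skip words that do not end with the suffix.
            guard let word = word, word.hasSuffix(suffix) else { return }
            // Everything before the suffix must be a non-empty run of digits.
            let digits = word.dropLast(suffix.count)
            guard !digits.isEmpty,
                  digits.allSatisfy({ $0.isASCII && $0.isNumber }) else { return }
            // Record only the trailing suffix part of the word's range.
            results.append(NSRange(location: wordRange.location + wordRange.length - suffixLength,
                                   length: suffixLength))
        }
        return results
    }
}

let suffixRanges = "28th and 8th. I will be working on this task".ordinalSuffixNSRanges(suffix: "th")
print(suffixRanges.map { $0.location })
```

Unlike nsRangeOfWord, this never sets stop.pointee, so it collects every match instead of stopping at the first one.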

A more modern option built into iOS/macOS is the Natural Language framework. It is much more powerful than either RegEx or the NSString function enumerateSubstrings(in:options:using:), and includes the ability to break up natural language into words, or "tokenize" the text.

Natural language code very similar to the nsRangeOfWord() function above looks like this:

import NaturalLanguage

extension String {
    func nsRange(from range: Range<String.Index>) -> NSRange {
        // NSRange counts UTF-16 code units, not Characters, so let Foundation do the conversion
        return NSRange(range, in: self)
    }

    func rangeOfWord(_ word: String, printWords: Bool = false) ->  Range<String.Index>? {

        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = self
        var result:  Range<String.Index>? = nil

        print("Searching string \"\(self)\"")
        tokenizer.enumerateTokens(in: self.startIndex..<self.endIndex) { tokenRange, _ in
            let aWord = self[tokenRange]
            if printWords {
                print("\"\(aWord)\"")
            }
            if aWord == word {
                result = tokenRange
                return false
            }
            return true
        }
        return result
    }
}

func naturalLanguageSearchForWord(_ word: String, inSentence sentence: String, printWords:Bool = false) {
    print("Using natural language to search for word `\(word)`\nin sentence '\(sentence)'")
    if let wordRange = sentence.rangeOfWord(word, printWords: printWords) {
        let nsRange: NSRange = sentence.nsRange(from: wordRange)
        print("Found word \(sentence[wordRange]) at offset \(nsRange.location), length \(nsRange.length)")
    } else {
        print("not found")
    }

}

naturalLanguageSearchForWord("8th", inSentence: simpleSentence)
naturalLanguageSearchForWord("8th", inSentence: trickySentence)
naturalLanguageSearchForWord("8th", inSentence: complexSentence)

It yields the same results as the code based on enumerateSubstrings(in:options:using:).

The Natural Language tokenizer returns ranges as String ranges (type Range<String.Index>). That is the "Swifty" way to deal with ranges in String objects, and for purely Swift code that uses Strings, it is preferred. However, your code to build and modify NSMutableAttributedString uses old Objective-C Foundation functions that want NSRanges. I therefore created an extension to String that will convert a String range to an NSRange.
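One thing to watch when converting: NSRange counts UTF-16 code units, while String distances count Characters, and the two disagree as soon as the text contains something like an emoji. The small check below is my own example (not part of the code above) and uses Foundation's NSRange(_:in:) initializer, which does the conversion in UTF-16 units:

```swift
import Foundation

let text = "🎉 8th"
let wordRange = text.range(of: "8th")!

// Offset counted in Characters: the emoji is one Character.
let characterOffset = text.distance(from: text.startIndex, to: wordRange.lowerBound)

// Offset counted in UTF-16 code units: the emoji takes two units.
let utf16Range = NSRange(wordRange, in: text)

print(characterOffset, utf16Range.location)
// prints "2 3"
```

For plain ASCII strings like your example sentence the two offsets are identical, so the difference only shows up with characters outside the Basic Multilingual Plane or with combining sequences.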

CodePudding user response：

You can use regex to solve this issue.

func applySuperscriptAttributes(substring: String, originalString: String, convertedString: NSMutableAttributedString) {
    // \d+ matches one or more digits directly before the suffix
    let regex = try! NSRegularExpression(pattern: "\\d+\(substring)")
    // NSRange is measured in UTF-16 code units, so use utf16.count rather than count
    let matches = regex.matches(in: originalString, range: NSRange(location: 0, length: originalString.utf16.count))

    matches.forEach { match in
        let convertedRange = match.range
        convertedString.setAttributes([.font: UIFont.systemFont(ofSize: 10, weight: .regular)], range: NSRange(location: convertedRange.location + convertedRange.length - substring.utf16.count, length: substring.utf16.count))
    }
}

It also supports the other ordinal suffixes (1st, 2nd, 3rd):

let st = NSMutableAttributedString(string: "The 1st stast 181st")
applySuperscriptAttributes(substring: "st", originalString: st.string, convertedString: st)
let nd = NSMutableAttributedString(string: "The 2nd ndand 181nd")
applySuperscriptAttributes(substring: "nd", originalString: nd.string, convertedString: nd)
let rd = NSMutableAttributedString(string: "The 3rd rdard 181rd")
applySuperscriptAttributes(substring: "rd", originalString: rd.string, convertedString: rd)
let th = NSMutableAttributedString(string: "The 4th thath 28th")
applySuperscriptAttributes(substring: "th", originalString: th.string, convertedString: th)
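If you would rather handle all four suffixes in one pass, a variation along these lines might work. This is only a sketch: the function name superscriptOrdinals, the 0.6 font scale, and the 0.35 baseline factor are my own assumptions, not values from the question. It also adds .baselineOffset, since a smaller font alone shrinks the text without raising it.

```swift
import UIKit

/// Hypothetical helper: shrinks and raises the st/nd/rd/th suffix
/// of every standalone "digits + suffix" word in `text`.
func superscriptOrdinals(in text: String,
                         baseFont: UIFont = .systemFont(ofSize: 17)) -> NSAttributedString {
    let result = NSMutableAttributedString(string: text, attributes: [.font: baseFont])
    // \b keeps matches on word boundaries; group 1 captures only the suffix.
    guard let regex = try? NSRegularExpression(pattern: "\\b\\d+(st|nd|rd|th)\\b") else {
        return result
    }
    let smallFont = baseFont.withSize(baseFont.pointSize * 0.6)      // assumed scale
    let raise = baseFont.pointSize * 0.35                            // assumed offset
    let whole = NSRange(location: 0, length: result.length)
    for match in regex.matches(in: text, range: whole) {
        // addAttributes merges with existing attributes instead of replacing them.
        result.addAttributes([.font: smallFont, .baselineOffset: raise],
                             range: match.range(at: 1))
    }
    return result
}

let styled = superscriptOrdinals(in: "28th and 8th. I will be working on this task")
```

The returned attributed string can be assigned to a label's attributedText. Using addAttributes rather than setAttributes keeps any color or paragraph style already on the string.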