I have a fairly complex string that looks something like this:
{infobox Country
|country = France
|map = <map lat='47' lng='1.5' zoom='5' view='0' height='320' country='France'/>
|language = French (regional languages: Alsatian, Occitan, Breton, Corsican, Basque, Catalan, ...)
|capital = [[Paris]]
|pop = 65,8 million
|currency = Euro (€)
|hitch = <rating country='fr' />
|BW = FR
}
What I want to do is extract the data, ideally into a dictionary where, say, "country" is a key and "France" is its value, and likewise for every other key.
How can I approach this? The only idea I have is string slicing and juggling indexes, but that feels like a terrible approach. There has to be a better way. Can I model this the way a JSON file can be modeled?
CodePudding user response:
If what you have is structured text with a formal definition of what it means, then what you're looking at is a language, and it's just a case of writing a parser for that language. A parser is a function that converts the input string into a value of some type T, or describes the error(s). There are lots of different ways to write parsers (mapping from strings to other data types is extremely common), but perhaps the best one is parser combinators, a topic that has been covered by both objc.io and pointfree.co.
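To make the idea concrete, here is a minimal sketch in the spirit of parser combinators: tiny parsers that each consume a prefix of the input, composed into a parser for one "|key = value" line. The names Parser, literal, take(until:) and field are made up for this illustration and are not the API of objc.io, pointfree.co or any library. For brevity it returns nil on failure instead of describing the error.

```swift
import Foundation

// A parser consumes a prefix of the input and returns a value, or nil on failure.
// (Hypothetical type for illustration only.)
struct Parser<T> {
    let run: (inout Substring) -> T?
}

// Consumes an exact literal, e.g. "|" or "=".
func literal(_ s: String) -> Parser<Void> {
    Parser<Void> { input in
        guard input.hasPrefix(s) else { return nil }
        input.removeFirst(s.count)
        return ()
    }
}

// Consumes characters up to (not including) the first one for which `stop` is true.
func take(until stop: @escaping (Character) -> Bool) -> Parser<Substring> {
    Parser<Substring> { input in
        let result = input.prefix { !stop($0) }
        input.removeFirst(result.count)
        return result
    }
}

// Composes the small parsers into one for a line such as "|capital = [[Paris]]".
let field = Parser<(key: String, value: String)> { input in
    var rest = input
    guard literal("|").run(&rest) != nil,
          let key = take(until: { $0 == "=" || $0.isNewline }).run(&rest),
          literal("=").run(&rest) != nil,
          let value = take(until: { $0.isNewline }).run(&rest)
    else { return nil }
    input = rest  // commit the consumed input only on success
    return (key: key.trimmingCharacters(in: .whitespaces),
            value: value.trimmingCharacters(in: .whitespaces))
}

var remaining: Substring = "|capital = [[Paris]]\n|BW = FR\n"
if let first = field.run(&remaining) {
    print(first.key, first.value)
}
```

A real combinator library would add things like map, zip and "zero or more" so that the whole infobox can be described declaratively; this sketch only shows the basic shape.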
CodePudding user response:
I would first split it on the "|" and then create key/value pairs using the first "=" as the delimiter.
import Foundation // needed for trimmingCharacters(in:)
// Drop the surrounding {} and split the text on |
let rows = input.dropFirst().dropLast().split(separator: "|")
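The first element ("infobox Country") is a special case, since it is the title rather than a key/value pair:
// Trim the newline that follows the title
let title = rows.first?.trimmingCharacters(in: .whitespacesAndNewlines)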
Then use reduce to create the dictionary:
let dictionary = rows.dropFirst().reduce(into: [String:String]()) {
    // Skip rows without "="; otherwise split on the first one only
    guard let index = $1.firstIndex(of: "=") else { return }
    let key = String($1[$1.startIndex..<index]).trimmingCharacters(in: .whitespaces)
    let value = String($1[$1.index(after: index)..<$1.endIndex]).trimmingCharacters(in: .whitespacesAndNewlines)
    $0[key] = value
}
print(title ?? "")
print(dictionary)
This prints the following (the order of the dictionary entries is not guaranteed):
infobox Country
["capital": "[[Paris]]", "language": "French (regional languages: Alsatian, Occitan, Breton, Corsican, Basque, Catalan, ...)", "map": "<map lat='47' lng='1.5' zoom='5' view='0' height='320' country='France'/>", "BW": "FR", "pop": "65,8 million", "country": "France", "currency": "Euro (€)", "hitch": "<rating country='fr' />"]
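One caveat with splitting on "|": a value that itself contains "|" (for example a wiki link like [[Paris|city]]) would be cut in two. If every field starts on its own line, as in the question, a line-based variant avoids that. The following is only a sketch under that assumption; parseInfobox is a made-up helper name.

```swift
import Foundation

// Hypothetical helper: parses "{title\n|key = value\n...}" one line at a time.
// Assumes each field sits on its own line and starts with "|".
func parseInfobox(_ text: String) -> (title: String, fields: [String: String]) {
    var body = Substring(text.trimmingCharacters(in: .whitespacesAndNewlines))
    // Drop the surrounding braces if present
    if body.hasPrefix("{") { body.removeFirst() }
    if body.hasSuffix("}") { body.removeLast() }

    var title = ""
    var fields: [String: String] = [:]
    for line in body.split(whereSeparator: { $0.isNewline }) {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if trimmed.hasPrefix("|"), let eq = trimmed.firstIndex(of: "=") {
            // Key is between the leading "|" and the first "="; value is the rest
            let key = trimmed[trimmed.index(after: trimmed.startIndex)..<eq]
            let value = trimmed[trimmed.index(after: eq)...]
            fields[key.trimmingCharacters(in: .whitespaces)] =
                value.trimmingCharacters(in: .whitespaces)
        } else if title.isEmpty, !trimmed.hasPrefix("|") {
            // The first non-field line is taken as the title
            title = trimmed
        }
    }
    return (title: title, fields: fields)
}

let example = parseInfobox("{infobox Country\n|capital = [[Paris|city]]\n|BW = FR\n}")
print(example.title)
print(example.fields)
```

Values that span several lines would still need a real parser, which is where the first answer's suggestion comes in.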