Python - extract "fields" from the string without specific separator

I am putting three fields, author, epoch_date and text, into one string, textData. They are separated by newlines. Of course, text can contain multiple lines or special characters (#, spaces, etc.).

textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"

Now I am looking for a convenient way to extract that data into separate fields. I do not see how to do it with splitlines(), since it gives me 5 lines instead of 3 because, as mentioned, the text field can contain multiple lines.

for line in textData.splitlines():
    print(line)

Also, if there is a better way to define textData, I can modify that part as well. Note: Python 2.7 must be used, unfortunately.

Thanks

CodePudding user response：

Instead of splitlines, which splits on every line break, split on \n manually with the maxsplit argument set to 2, so that only the first two newlines are used. Then split each part on : with maxsplit set to 1, since there could be a : in the comment.

>>> textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"
>>> [part.split(":", 1) for part in textData.split("\n", 2)]
[['author', 'john'], ['epoch_date', '53636352'], ['text', 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end']]
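
If a mapping is more convenient than a list of pairs, the same two splits can feed dict(). This is only a minimal sketch: the name parse_fields and its field_count parameter are made up for illustration, and it assumes each of the first three lines contains a colon. It should run on Python 2.7 as well as Python 3.

```python
def parse_fields(text_data, field_count=3):
    # Split on only the first field_count - 1 newlines, so the last
    # field keeps any newlines it contains.
    parts = text_data.split("\n", field_count - 1)
    # Split each part on the first colon only; later colons stay in the value.
    return dict(part.split(":", 1) for part in parts)


text_data = ("author:john\n"
             "epoch_date:53636352\n"
             "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end")
fields = parse_fields(text_data)
print(fields["author"])
print(fields["text"])
```

The values stay strings, so epoch_date would still need int(fields["epoch_date"]) if a number is wanted.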

Unlike splitlines, this does not handle all the different kinds of line endings (such as \r\n), but if your data always comes from the same source, this might not be a problem (or it can be solved in preprocessing with replace or similar).
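
The preprocessing with replace could look roughly like this. It is a sketch under the assumption that only \r\n and bare \r need to be handled; normalize_newlines is a hypothetical helper name, not something from the standard library.

```python
def normalize_newlines(text):
    # Replace Windows (\r\n) endings first, then any remaining lone \r,
    # so that a later split("\n", 2) sees only one kind of line break.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "author:john\r\nepoch_date:53636352\r\ntext:first \r\n second"
parts = normalize_newlines(raw).split("\n", 2)
print(parts)
```

Note that this also rewrites the line endings inside the text field, which may or may not be what you want.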

Alternatively, if you actually control the data but have to convert it to a single string at some point and then parse it back, consider changing the format, e.g. to JSON:

>>> import json
>>> d = {'text': 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', 'epoch_date': '53636352', 'author': 'john'}
>>> s = json.dumps(d)
>>> s
'{"text": "comment on_the_first_line \\n line_in_the_middle \\n line_at_the_end", "epoch_date": "53636352", "author": "john"}'
>>> json.loads(s)
{u'text': u'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', u'epoch_date': u'53636352', u'author': u'john'}