Python - extract "fields" from the string without specific separator

Time:07-15

I am putting three fields (author, epoch_date and text) into one string, textData. The fields are separated by newlines. Of course, text can contain multiple lines or special characters (#, blank spaces, etc.).

textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"

Now I am looking for a convenient way to extract that data into separate fields. I do not see how to do it with splitlines(), since it gives me 5 lines instead of 3: as mentioned, the text field can contain multiple lines.

for line in textData.splitlines():
    print(line)

Also, if there is a better way to define textData, I am able to change that part as well. Note: Python 2.7 must be used, unfortunately.

Thanks

CodePudding user response:

Instead of splitlines, which splits at every line break, split on \n yourself with the maxsplit parameter so that only the first two line breaks are used. Then split each part on :, again with maxsplit, since the comment itself could contain a :.

>>> textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"
>>> [part.split(":", 1) for part in textData.split("\n", 2)]
[['author', 'john'], ['epoch_date', '53636352'], ['text', 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end']]
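
If a dictionary is more convenient than a list of pairs, the same two splits can be wrapped in a small helper. This is only a sketch: the name parse_fields is made up for illustration, and it assumes the string always holds exactly the three fields in the order shown, each starting with "name:". It should behave the same on Python 2.7 and Python 3.

```python
def parse_fields(text_data):
    """Return a dict mapping field name -> value (hypothetical helper)."""
    # Split on the first two newlines only, so any newlines inside
    # the last field (text) are left untouched.
    parts = text_data.split("\n", 2)
    # Split each part on its first colon only, so colons inside a
    # value are preserved. dict() accepts the resulting 2-item lists.
    return dict(part.split(":", 1) for part in parts)


text_data = ("author:john\n"
             "epoch_date:53636352\n"
             "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end")
fields = parse_fields(text_data)
```

After this, fields["author"], fields["epoch_date"] and fields["text"] hold the three values. Note that epoch_date is still a string; convert it with int() if a number is needed.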

Unlike splitlines, this does not handle all the different line-ending conventions (\r\n, \r and so on). If your data always comes from the same source, this might not be a problem, and otherwise it can be solved in preprocessing with replace or similar.
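
One possible way to do that preprocessing is sketched below. The helper name normalize_newlines and the sample string are invented for illustration, and it assumes that only \r\n and bare \r need to be mapped to \n. Be aware that this also rewrites line endings inside the text field.

```python
def normalize_newlines(text_data):
    """Map Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    # Replace "\r\n" first; otherwise each pair would turn into
    # two newlines when the bare "\r" is replaced.
    return text_data.replace("\r\n", "\n").replace("\r", "\n")


raw = "author:john\r\nepoch_date:53636352\r\ntext:first line\r\nsecond line"
clean = normalize_newlines(raw)
# Same two-step split as above, now safe for the normalized input.
pairs = [part.split(":", 1) for part in clean.split("\n", 2)]
```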


Alternatively, if you actually control the data, but have to convert it to a single string at some point, and then parse it back, consider changing the format, e.g. using JSON:

>>> import json
>>> d = {'text': 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', 'epoch_date': '53636352', 'author': 'john'}
>>> s = json.dumps(d)
>>> s
'{"text": "comment on_the_first_line \\n line_in_the_middle \\n line_at_the_end", "epoch_date": "53636352", "author": "john"}'
>>> json.loads(s)
{u'text': u'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', u'epoch_date': u'53636352', u'author': u'john'}