I am putting three fields (author, epoch_date and text) into one string, textData. They are separated by newlines. Of course, text can contain multiple lines or special characters (#, spaces, etc.).
textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"
Now I am looking for a convenient way to extract that data into separate fields. I do not see how to do it with splitlines(), since it gives me 5 lines instead of 3 because, as mentioned, the text field can contain multiple lines.
for line in textData.splitlines():
    print(line)
Also, if there is a better way to define the textData field, I can modify that part as well.
Note: Python 2.7 must be used, unfortunately.
Thanks
CodePudding user response:
Instead of splitlines, which splits on every line break, split manually on "\n" with the additional maxsplit parameter, then split each part on ":" (again with maxsplit), since there could be a ":" in the comment.
>>> textData="author:john" "\n" "epoch_date:53636352" "\n" "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end"
>>> [part.split(":", 1) for part in textData.split("\n", 2)]
[['author', 'john'], ['epoch_date', '53636352'], ['text', 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end']]
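If a dictionary is more convenient than a list of pairs, the same two splits can be wrapped in a small helper. This is only a sketch: the names parse_record and text_data are made up for illustration, and it assumes the first two fields never contain a newline and every part contains at least one ":". It runs unchanged on Python 2.7 and 3.

```python
def parse_record(text_data):
    """Parse three 'key:value' fields; the last one may span several lines."""
    # Split on the first two newlines only, so the multi-line text stays whole.
    parts = text_data.split("\n", 2)
    # Split each part on the first colon only, so colons in the comment survive.
    return dict(part.split(":", 1) for part in parts)


text_data = ("author:john\n"
             "epoch_date:53636352\n"
             "text:comment on_the_first_line \n line_in_the_middle \n line_at_the_end")

record = parse_record(text_data)
print(record["author"])
print(record["epoch_date"])
print(record["text"])
```

The values stay strings, so epoch_date would still need int(record["epoch_date"]) if a number is required.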
Unlike splitlines, this does not handle all the different line-ending conventions, but if your data always comes from the same source, this might not be a problem (or it can be solved in preprocessing with replace or similar).
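The preprocessing step could look roughly like this. It is a sketch that assumes the only other line endings are "\r\n" and a bare "\r"; the helper name normalize_newlines and the sample string raw are hypothetical.

```python
def normalize_newlines(text_data):
    # Replace Windows-style "\r\n" first, then any remaining bare "\r".
    return text_data.replace("\r\n", "\n").replace("\r", "\n")


raw = "author:john\r\nepoch_date:53636352\r\ntext:first \r\n second"

# After normalizing, the same maxsplit approach applies.
fields = [part.split(":", 1) for part in normalize_newlines(raw).split("\n", 2)]
print(fields)
```

Note that this also rewrites line endings inside the text field, which may or may not be what you want.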
Alternatively, if you actually control the data, but have to convert it to a single string at some point, and then parse it back, consider changing the format, e.g. using JSON:
>>> import json
>>> d = {'text': 'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', 'epoch_date': '53636352', 'author': 'john'}
>>> s = json.dumps(d)
>>> s
'{"text": "comment on_the_first_line \\n line_in_the_middle \\n line_at_the_end", "epoch_date": "53636352", "author": "john"}'
>>> json.loads(s)
{u'text': u'comment on_the_first_line \n line_in_the_middle \n line_at_the_end', u'epoch_date': u'53636352', u'author': u'john'}