I'm trying to split a text wherever a line contains only a period.
txt = "\ra. skin lateral biopsy:\r -positive for disease \r.\rb. skin medial biopsy:\r -negative for disease \r. \rc. skin floor biopsy:\r -negative for disease"
The expected result is:
["a. skin lateral biopsy: -positive for disease", "b. skin medial biopsy: -negative for disease", "c. skin floor biopsy: -negative for disease"]
I tried
re.split('^\.', txt)
but it does not work. I don't understand why the regex is not picking up the lines that start with a period.
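The behaviour can be checked directly. Without flags, ^ anchors only at the very start of the string. With re.MULTILINE it also matches right after each "\n", but Python's re module does not treat a lone "\r" as a line break, so neither variant splits this text. The sketch below assumes that converting every "\r" to "\n" is acceptable for this data. It then collapses the leftover whitespace in each chunk, which is an extra cleanup step not taken from the question.

```python
import re

txt = "\ra. skin lateral biopsy:\r -positive for disease \r.\rb. skin medial biopsy:\r -negative for disease \r. \rc. skin floor biopsy:\r -negative for disease"

# Without flags, ^ matches only at the very start of the string, which is
# "\r", not ".", so the string comes back unsplit.
no_flags = re.split(r"^\.", txt)

# With re.MULTILINE, ^ also matches right after each "\n", but a bare "\r"
# is not a line break for the re module, so there is still no split.
multiline = re.split(r"^\.", txt, flags=re.MULTILINE)

print(no_flags == [txt], multiline == [txt])  # True True

# Assumption: turning "\r" into "\n" is harmless for this data. After that,
# ^ can see the lines that begin with a period.
normalised = txt.replace("\r", "\n")
parts = re.split(r"^\.", normalised, flags=re.MULTILINE)

# Collapse newlines and stray spaces inside each chunk to single spaces.
result = [" ".join(p.split()) for p in parts]
print(result)
```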
Answer:
First I would split the text at every carriage return followed by a period, and then clean up each piece in a loop:
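txt = "\ra. skin lateral biopsy:\r -positive for disease \r.\rb. skin medial biopsy:\r -negative for disease \r. \rc. skin floor biopsy:\r -negative for disease"
arr = txt.split("\r.")  # break wherever a carriage return is followed by a period
for j in arr:
    j = j.replace("\r", "")  # drop the remaining carriage returns
    j = j.strip()            # trim the surrounding spaces
    print(j)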
Answer:
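Split at every \r that is followed by a lowercase letter (the start of the next item), then remove each leftover \r together with a literal period that may follow it:
import re
result = [re.sub(r'\r\.?', '', e).strip() for e in re.split(r'\r(?=[a-z])', txt) if e]
print(result)
['a. skin lateral biopsy: -positive for disease', 'b. skin medial biopsy: -negative for disease', 'c. skin floor biopsy: -negative for disease']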
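If the only goal is to treat period-only lines as separators, a regex may not be needed at all. Below is a minimal sketch built on str.splitlines(), which recognises "\r", "\n" and "\r\n" as line boundaries. This differs from the ^ anchor in the re module, which only responds to "\n". The helper name split_on_period_lines is hypothetical and does not come from either answer. The sketch also assumes that lines of the same item should be joined with a single space.

```python
def split_on_period_lines(text):
    """Hypothetical helper: group lines into records, treating any line that
    is just "." (ignoring surrounding spaces) as a separator."""
    records, current = [], []
    # str.splitlines() treats "\r", "\n" and "\r\n" all as line boundaries.
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == ".":
            records.append(current)  # a separator closes the current record
            current = []
        elif stripped:
            current.append(stripped)  # keep content lines, skip blank ones
    records.append(current)  # the last record has no trailing separator
    # Join each record's lines with single spaces and drop empty records.
    return [" ".join(r) for r in records if r]


txt = "\ra. skin lateral biopsy:\r -positive for disease \r.\rb. skin medial biopsy:\r -negative for disease \r. \rc. skin floor biopsy:\r -negative for disease"
print(split_on_period_lines(txt))
```

Because splitlines() handles every common line ending, the same helper should also work if the text later arrives with "\n" or "\r\n" endings.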