I have a file ("text.txt") with about 5,000,000 lines. I'm trying to split a line so that:
2 : : PUNCT sent _ _ _ _ _3-4 L'Algebra _ _ _ _ _ _ _ _
becomes these two lines:
2 : : PUNCT sent _ _ _ _ _
3-4 L'Algebra _ _ _ _ _ _ _ _
So essentially I want to turn a single line into two lines and write the result to another file. Every line that has to be split contains the character "_" (underscore) immediately followed by a number, or by number "-" number. I want to split the line into two lines right after that "_" (underscore).
If I try to split the line with this function:
lines = re.split(r"_\d", line)
and then write the list lines to a file, I get this:
2 : : PUNCT sent _ _ _ _ _
-4 L'Algebra _ _ _ _ _ _ _ _
How can I do this correctly? Can anyone help me, please?
CodePudding user response:
Try:
>>> re.split(r"_(?=\d+)", line)
['2 : : PUNCT sent _ _ _ _ ',
"3-4 L'Algebra _ _ _ _ _ _ _ _"]