Split a string from a textfile correctly

Time:04-20

I have a file ("text.txt") with about 5000000 lines. I'm trying to split a line so that this:

2   :   :   PUNCT   sent    _   _   _   _   _3-4    L'Algebra   _   _   _   _   _   _   _   _

becomes these two lines:

2   :   :   PUNCT   sent    _   _   _   _   _
3-4 L'Algebra   _   _   _   _   _   _   _   _

So essentially I want to turn a single line into two lines and write the result to another file. Every line that has to be split contains the character "_" (underscore) immediately followed by a number, or by number "-" number. I want to break the line in two right after that "_" (underscore).

If I try to split the line with this call:

lines = re.split(r"_\d", line)

and then write the list lines to a file, I get this:

2   :   :   PUNCT   sent    _   _   _   _   _
-4 L'Algebra   _   _   _   _   _   _   _   _

How can I do this correctly? Can anyone help me, please?

CodePudding user response:

Try:

>>> re.split(r"_(?=\d)", line)

['2   :   :   PUNCT   sent    _   _   _   _   ',
 "3-4    L'Algebra   _   _   _   _   _   _   _   _"]
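
Note that in the result above the first piece has lost its trailing "_", because re.split discards whatever text the pattern matches. A minimal sketch of one way to keep it, assuming Python 3.7 or newer (where re.split accepts zero-width matches), is to split on the empty position between the underscore and the digit. The names split_record and split_file are hypothetical, UTF-8 encoding is an assumption, and the sketch assumes that an underscore directly followed by a digit only occurs where two lines were fused together. The file is read line by line, so the ~5000000 lines are never held in memory at once.

```python
import re

# Zero-width split point: just after an underscore and just before a digit.
# Nothing is consumed, so both the "_" and the digit survive the split.
# Splitting on empty matches requires Python 3.7 or newer.
SPLIT_POINT = re.compile(r"(?<=_)(?=\d)")


def split_record(line):
    """Return the pieces of one line, cut at every underscore-digit boundary."""
    return SPLIT_POINT.split(line)


def split_file(src_path, dst_path):
    """Stream src_path line by line and write the split pieces to dst_path."""
    # Assumption: both files are UTF-8 encoded.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            for piece in split_record(line.rstrip("\n")):
                dst.write(piece + "\n")
```

It could be called as split_file("text.txt", "output.txt"), where "output.txt" is a placeholder name for the destination file. Lines that contain no underscore-digit boundary are copied through unchanged.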