Extract part of a text and split into two columns

Time:08-31

I am trying to extract parts of the following sentences (all of my rows follow a similar pattern):

Text
19 hours ago — Catch up on key developments an...
8 hour ago — Catch up on key developments an...
10 minutes ago — Catch up on key developments an...
1 day ago — Catch up on key developments an...

I would like to split the Text column into two columns, one with the part before the — and one with the part after it:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...

I did this:

df[['Text1', 'Text2']] = df['Text'].str.extract(r"(\d  \w , \d{5})?\s*\—?\s*(.*)", expand=True)

However, it does not seem to work. If you have experience with re, could you please point out the mistake and suggest a solution? Thanks

CodePudding user response:

You can use the pandas.Series.str.split function:

df['Text'].str.split(' — ', n=1, expand=True)

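n=1 limits the operation to a single split, so the output has at most two columns even if the separator appears again later in the text. expand=True returns the pieces as separate DataFrame columns instead of a single column of lists.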
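As a minimal sketch of this approach, the snippet below rebuilds the four sample rows from the question and assigns the split result to two new columns. The column names Text1 and Text2 come from the question; the separator is assumed to be an em dash with one space on each side, as in the sample rows.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# Split once on " — " (assumed separator: em dash with surrounding spaces)
# and spread the two pieces over two new columns
df[["Text1", "Text2"]] = df["Text"].str.split(" — ", n=1, expand=True)

print(df[["Text1", "Text2"]])
```

If some rows use a different dash or different spacing, the separator string would need to be adjusted accordingly.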

CodePudding user response:

You can use split rather than a regex.

df[['Text1', 'Text2']] = df['Text'].str.split(' — ', n=1, expand=True)

Output:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...
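
If you would rather keep the regex approach from the question, here is a hedged sketch of one pattern that fits the sample rows. The pattern in the question expects a comma and a five-digit number (`, \d{5}`), neither of which appears in these rows. Because that group is optional, it most likely matches nothing and the second group captures the whole string. The pattern below is an assumption based only on the four sample rows (a number, a unit word, then "ago"), not a general solution.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# Group 1: a number, a unit word, and "ago" (e.g. "19 hours ago")
# Group 2: everything after the em dash
pattern = r"^(\d+\s+\w+\s+ago)\s*—\s*(.*)$"

# Rows that do not match the pattern would get NaN in both columns
df[["Text1", "Text2"]] = df["Text"].str.extract(pattern, expand=True)

print(df[["Text1", "Text2"]])
```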