Extract part of a text and split into two columns

I am trying to extract part of each of the following strings (all my rows follow a similar pattern):

Text
19 hours ago — Catch up on key developments an...
8 hour ago — Catch up on key developments an...
10 minutes ago — Catch up on key developments an...
1 day ago — Catch up on key developments an...

I would like to split the Text column into two columns, one with the part before the em dash (—) and one with the part after it:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...

I did this:

df[['Text1', 'Text2']] = df['Text'].str.extract(r"(\d  \w , \d{5})?\s*\—?\s*(.*)", expand=True)

However, it does not seem to work. If you have experience with re, could you please point out the mistake and suggest a solution? Thanks.

CodePudding user response：

You can use the pandas.Series.str.split function:

df['Text'].str.split(' — ', n=1, expand=True)

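You need n=1 so that the string is split only at the first separator, which gives at most two parts per row. You also need expand=True so that the parts are returned as separate DataFrame columns instead of a list in each row.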
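A self-contained sketch of this approach, using the four sample rows from the question. It assumes the separator is always an em dash (U+2014) with a single space on each side, as in the sample data.

```python
import pandas as pd

# Sample rows copied from the question; "Text" is the column name used there.
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# Split once on the em dash and its surrounding spaces.
# n=1 stops after the first separator, so any later dash stays in Text2.
# expand=True returns a DataFrame whose columns can be assigned directly.
df[["Text1", "Text2"]] = df["Text"].str.split(" — ", n=1, expand=True)

print(df[["Text1", "Text2"]])
```

If the spacing around the dash varies in the real data, calling .str.strip() on each new column afterwards is one way to tidy the result.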

CodePudding user response：

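You can use split rather than a regex. Note that the separator in your data is an em dash (—), not a hyphen (-), so it has to be passed exactly as it appears in the text:

df[['Text1', 'Text2']] = df['Text'].str.split(' — ', n=1, expand=True)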

Output:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...
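As for the regex in the question: its first group requires a comma and five digits, which never occur in strings like "19 hours ago". Because that group is optional, it is simply skipped and the final (.*) captures the whole string, so Text1 comes back empty. If you prefer to stay with str.extract, the following is a minimal sketch of one pattern that should work, assuming every row contains exactly the "time — text" layout shown in the question. The pattern itself is an illustration, not the only possible fix.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# Group 1: everything before the first em dash (non-greedy, so it stops early).
# Group 2: everything after the dash; whitespace around the dash is dropped.
pattern = r"^(.*?)\s*—\s*(.*)$"
df[["Text1", "Text2"]] = df["Text"].str.extract(pattern, expand=True)

print(df[["Text1", "Text2"]])
```

Unlike str.split, rows that contain no em dash do not match this pattern at all, so both new columns are NaN for those rows.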