I am trying to extract part of each of the following strings (all my rows follow a similar pattern):
Text
19 hours ago — Catch up on key developments an...
8 hour ago — Catch up on key developments an...
10 minutes ago — Catch up on key developments an...
1 day ago — Catch up on key developments an...
I would like to split the Text column into two columns, holding the parts before and after the —:
Text1 Text2
19 hours ago Catch up on key developments an...
8 hour ago Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago Catch up on key developments an...
I did this:
df[['Text1', 'Text2']] = df['Text'].str.extract(r"(\d \w , \d{5})?\s*\—?\s*(.*)", expand=True)
However, it does not seem to work.
If you have experience with re, could you please point out the mistake and suggest a solution? Thanks
CodePudding user response:
You can use the pandas.Series.str.split function:
df['Text'].str.split(' — ', n=1, expand=True)
You need n=1 to split only at the first separator, so each row yields at most two parts. You also need expand=True so that the parts come back as separate DataFrame columns instead of a list in a single column.
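To make the call above easy to try, here is a minimal, self-contained sketch. The sample DataFrame is rebuilt from the rows shown in the question, and it assumes the separator is always an em dash with one space on each side; real data may need a more tolerant pattern.

```python
import pandas as pd

# Sample data rebuilt from the rows in the question (an assumption about
# what the real DataFrame looks like).
df = pd.DataFrame({"Text": [
    "19 hours ago — Catch up on key developments an...",
    "8 hour ago — Catch up on key developments an...",
    "10 minutes ago — Catch up on key developments an...",
    "1 day ago — Catch up on key developments an...",
]})

# Split once on " — " and spread the two parts over two new columns.
df[["Text1", "Text2"]] = df["Text"].str.split(" — ", n=1, expand=True)

print(df[["Text1", "Text2"]])
```

Including the surrounding spaces in the separator keeps stray whitespace out of both new columns.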
CodePudding user response:
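You can use split rather than a regex. Note that the separator has to be the same — character that appears in the data, not a plain hyphen:
df[['Text1', 'Text2']] = df['Text'].str.split(' — ', n=1, expand=True)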
Output:
Text1 Text2
19 hours ago Catch up on key developments an...
8 hour ago Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago Catch up on key developments an...
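As for the mistake in the original pattern: its first group, (\d \w , \d{5}), demands one digit, a space, one word character, a space, a comma, a space and five digits, which never occurs in these rows. Because the group is optional, the regex skips it and (.*) swallows the whole string. The sketch below illustrates this with the standard re module on a single row; the replacement pattern is only one possible fix, not the only correct one.

```python
import re

row = "19 hours ago — Catch up on key developments an..."

# Pattern from the question: the optional first group never matches here.
original = re.compile(r"(\d \w , \d{5})?\s*\—?\s*(.*)")

# One possible fix: lazily capture everything before the em dash,
# then everything after it, ignoring whitespace around the dash.
fixed = re.compile(r"^(.*?)\s*—\s*(.*)$")

orig_groups = original.search(row).groups()
fixed_groups = fixed.search(row).groups()

print(orig_groups)   # first group is None, second holds the whole row
print(fixed_groups)  # ('19 hours ago', 'Catch up on key developments an...')
```

The same fixed pattern should also work with df['Text'].str.extract(...), since it has exactly two capture groups.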