I want to extract some data from each row and store it as new columns in an existing or new dataframe, without repeating the same re.match operation for every column.
Here's how one entry of the dataframe looks:
00:00 Someones_name: some text goes here
And I have a regex that successfully captures the 3 groups I need:
re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)
The problem is how to get matched_part[1], [2], and [3] without running the match again for every new column.
The solution that I don't want is:
new_df['time'] = old_df['text'].apply(function1)
new_df['name'] = old_df['text'].apply(function2)
new_df['text'] = old_df['text'].apply(function3)
def function1(x):
    return re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)[1]
CodePudding user response:
You can use str.extract with your pattern; it matches each row once and returns one column per capture group:
df[['time','name', 'text']] = df['col1'].str.extract(r"^(\d{2}:\d{2}) (.*): (.*)$")
print(df)
# col1 time name \
# 0 00:00 Someones_name: some text goes here 00:00 Someones_name
# text
# 0 some text goes here
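If you would rather not list the column names by hand, named groups should work too: str.extract uses the group names as column names. Here is a minimal sketch, assuming the source column is called 'text' as in your question; the sample rows and the variable names old_df and new_df are made up for illustration. Rows that do not match the pattern come back as NaN in every extracted column.

```python
import pandas as pd

# Hypothetical sample data; the column name 'text' follows the question.
old_df = pd.DataFrame({
    "text": [
        "00:00 Someones_name: some text goes here",
        "12:34 Another_name: more text",
        "a line that does not match",
    ]
})

# Same regex as in the question, with each group given a name.
pattern = r"^(?P<time>\d{2}:\d{2}) (?P<name>.*): (?P<text>.*)$"

# One regex pass per row; the named groups become the column names.
new_df = old_df["text"].str.extract(pattern)
print(new_df)
```

To add the columns to the existing dataframe instead, something like old_df.join(new_df.add_prefix("parsed_")) avoids clashing with the existing 'text' column; the prefix is just an illustrative choice.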