I want to add two columns, df['B'] and df['C'], derived from df['A'].
My code is below:
df['B'] = df['A'].str.split("_").str[0]
df['C'] = df['A'].str.split("_").str[1]
But Python's split function is slower than I expected, so this takes too long as the dataframe gets bigger. I want to find a more efficient way.
Is it possible to call split only once?
df[['B','C']] = df['A'].split("_")
(This code is just an example.)
Or is there a smarter way?
Thanks.
CodePudding user response:
split in pandas has an expand option, which you could use as follows. I have not tested the speed.
df = df.join(df['A'].str.split('_', expand = True).rename(columns = {0 : 'B', 1: 'C'}))
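For anyone who wants to try this, here is a minimal self-contained sketch of the line above. The three-row frame and its values are made up for illustration; only the column names A, B and C come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df['A'] is assumed to hold "left_right" strings.
df = pd.DataFrame({"A": ["a_1", "b_2", "c_3"]})

# Split once; expand=True returns a DataFrame with integer column labels 0 and 1.
parts = df["A"].str.split("_", expand=True).rename(columns={0: "B", 1: "C"})

# join aligns on the index and returns a new frame, so assign the result back.
df = df.join(parts)
print(df)
```

Note that the split pieces stay strings, so C would need an explicit conversion if numbers are wanted.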
CodePudding user response:
You could use expand to create several columns at a time, and then join to add all those columns:
df = df.join(df.A.str.split('_', expand=True).rename({0:'B', 1:'C'}, axis=1))
Though I fail to see how the method you are using can be so slow, and I doubt this one is much faster. It is basically the same split. That is not Python's split, by the way. It is a pandas method, so it is "vectorized".
Unless you have thousands of those columns? In that case it is indeed better to avoid 1000 calls.
Edit: timing
With 1,000,000 rows, on my computer, this method takes 2.71 seconds. (As you can see, you got the exact same answer twice within the same 10 seconds, differing only in axis=1 vs columns= for rename, so it might be the right one :D)
Your method takes 2.77 seconds.
The standard deviation is about 0.03, so sometimes yours is even faster, though I ran it enough times to show with a p-value < 5% that yours is slightly slower, but really, really slightly.
I guess this is about as fast as it gets.
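If you want to repeat the comparison on your own machine, something like the sketch below should do. It is not the exact script behind the numbers above: the random data, the row count (100,000 here to keep it quick, raise it to 1,000,000 to match) and the function names two_splits and one_split are all assumptions.

```python
import timeit

import numpy as np
import pandas as pd

N_ROWS = 100_000  # assumption: smaller than the 1,000,000 rows timed above

# Hypothetical data: strings such as "123_456".
rng = np.random.default_rng(0)
pairs = rng.integers(0, 1000, size=(N_ROWS, 2))
df = pd.DataFrame({"A": [f"{a}_{b}" for a, b in pairs]})


def two_splits(frame):
    """The question's approach: split twice, take one piece each time."""
    out = frame.copy()
    out["B"] = out["A"].str.split("_").str[0]
    out["C"] = out["A"].str.split("_").str[1]
    return out


def one_split(frame):
    """The answers' approach: split once with expand=True, then join."""
    parts = frame["A"].str.split("_", expand=True).rename(columns={0: "B", 1: "C"})
    return frame.join(parts)


for fn in (two_splits, one_split):
    # Best of three single runs, in seconds.
    seconds = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.3f} s")
```

Absolute times depend on the machine and the pandas version, so only the ratio between the two is worth reading.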