(Python) Insert list-type data into multiple DataFrame columns at once

Time:02-01


I want to add columns df['B'] and df['C'].

I wrote the code below:

df['B'] = df['A'].str.split("_").str[0]
df['C'] = df['A'].str.split("_").str[1]

But the split is slower than I expected, and it takes too long as the DataFrame gets bigger, so I want to find a more efficient way.

Is it possible to call split only once? For example, something like df[['B','C']] = df['A'].str.split("_") (this code is just an example).

Or is there a smarter way?

Thanks.

CodePudding user response:

split in pandas has an expand option, which you could use as follows. I've not tested the speed.

df = df.join(df['A'].str.split('_', expand=True).rename(columns={0: 'B', 1: 'C'}))
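
For instance, on a small sample frame (the data here is made up purely for illustration), the single split produces both columns in one pass:

import pandas as pd

# toy data: column 'A' holds "<B>_<C>" strings, as in the question
df = pd.DataFrame({'A': ['foo_1', 'bar_2', 'baz_3']})

# one split with expand=True yields a two-column frame; rename maps 0/1 to 'B'/'C'
df = df.join(df['A'].str.split('_', expand=True).rename(columns={0: 'B', 1: 'C'}))
print(df)
#        A    B  C
# 0  foo_1  foo  1
# 1  bar_2  bar  2
# 2  baz_3  baz  3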

CodePudding user response:

You could use expand to create several columns at a time, and then join to add all those columns:

df.join(df.A.str.split('_', expand=True).rename({0:'B', 1:'C'}, axis=1))

That said, I fail to see how the method you are using can be so slow, and I doubt this one is much faster: it is basically the same split. Note that what you are calling is not Python's built-in split; .str.split is a pandas method, so it is already "vectorized".

Unless you have thousands of those columns? In that case it is indeed better to avoid a thousand separate calls.
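
If you really do have that many parts per value, a single expand=True call still produces them all in one go; the sketch below builds the rename mapping programmatically (the col0, col1, ... names are just placeholders I made up):

import pandas as pd

# hypothetical case: each value splits into several parts, not just two
df = pd.DataFrame({'A': ['a_b_c_d', 'e_f_g_h']})

# one vectorized split creates all part-columns at once,
# instead of one .str[i] call per column
parts = df['A'].str.split('_', expand=True)
parts = parts.rename(columns={i: f'col{i}' for i in parts.columns})
df = df.join(parts)
print(df)
#          A col0 col1 col2 col3
# 0  a_b_c_d    a    b    c    d
# 1  e_f_g_h    e    f    g    h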

Edit: timing

With 1,000,000 rows, on my computer, this method takes 2.71 seconds (and since you got the exact same answer twice within 10 seconds, differing only in axis=1 vs columns= for rename, it is probably the right one :D). Your method takes 2.77 seconds, with a standard deviation of about 0.03. So sometimes yours is even faster, though I ran it enough times to show with a p-value < 5% that yours is slightly slower, but only very slightly.

I guess this is just as fast as it gets.
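
If you want to reproduce the comparison on your own machine, a minimal benchmark along these lines should do (the synthetic data is made up; absolute numbers will differ):

import timeit
import pandas as pd

# synthetic frame with 1,000,000 "<left>_<right>" strings
n = 1_000_000
df = pd.DataFrame({'A': [f'item{i}_{i % 10}' for i in range(n)]})

def two_calls():
    # the question's approach: one .str.split per target column
    out = df.copy()
    out['B'] = out['A'].str.split('_').str[0]
    out['C'] = out['A'].str.split('_').str[1]
    return out

def one_expand():
    # the answers' approach: a single split with expand=True
    return df.join(df['A'].str.split('_', expand=True).rename(columns={0: 'B', 1: 'C'}))

print('two calls :', timeit.timeit(two_calls, number=5))
print('one expand:', timeit.timeit(one_expand, number=5))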
