Home > Net >  pandas split column based on groupby
pandas split column based on groupby

Time:10-04

I want to split a column into multiple column based on a grouped value. For example

# input
df = pd.DataFrame([[1,2,1], [1,4,4], [1,5,7], [2,1,1], [2,3,5], [2,3,1]], columns=['cat', 'v1', 'v2'])

#output
df_out = pd.DataFrame([[1,2,0,1,0], 
                       [1,4,0,4,0], 
                       [1,5,0,7,0], 
                       [2,0,1,0,1], 
                       [2,0,3,0,5], 
                       [2,0,3,0,1]], columns=['cat', 'v1_1', 'v1_2', 'v2_1', 'v2_2'])

I came up with this code, want to get an efficient one since my input file is huge.

grouped = df.groupby(['cat'])

cols = ['v1', 'v2']
for column in cols:
    new_name = [column   '_'   str(k) for k in grouped.groups]
    df[new_name] = np.zeros((df[column].size, len(grouped.groups)), dtype=df[column].dtype)
        
        
for k,index in grouped.groups.items():
    df.loc[index, [c   '_'   str(k) for c in cols]] = df.loc[index, cols].to_numpy()

In:

    cat v1  v2
0   1   2   1
1   1   4   4
2   1   5   7
3   2   1   1
4   2   3   5
5   2   3   1

out:

    cat  v1_1   v1_2    v2_1    v2_2
0   1    2         0    1   0
1   1    4         0    4   0
2   1    5         0    7   0
3   2    0         1    0   1
4   2    0         3    0   5
5   2    0         3    0   1

CodePudding user response:

Use DataFrame.set_index with DataFrame.unstack, flatten MultiIndex and last add cat column by DataFrame.join:

df1 = df.set_index('cat', append=True).unstack(fill_value=0)
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')

df1 = df[['cat']].join(df1)
print (df1)
   cat  v1_1  v1_2  v2_1  v2_2
0    1     2     0     1     0
1    1     4     0     4     0
2    1     5     0     7     0
3    2     0     1     0     1
4    2     0     3     0     5
5    2     0     3     0     1
  • Related