Replace list comprehension with vectorized method to build new features

Time:11-23

I have this dataframe, data.

import pandas as pd

data = pd.DataFrame({'group':['A', 'A', 'B', 'C', 'C', 'B'],
             'value':[0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})

I want to build three new features. Below is my target output.

pd.DataFrame({'group':['A', 'A', 'B', 'C', 'C', 'B'],
              'value':[0.2, 0.21, 0.54, 0.02, 0.001, 0.19],
              'group_A':[0.2, 0.21, 0,0,0,0],
              'group_B':[0,0,0.54, 0, 0, 0.19],
              'group_C':[0,0,0,0.02, 0.001,0]})

What is the most efficient way to do this? The code below solves the problem, but is there a vectorized way to do it on my very large real-world data set?

for g in data.group.unique():
    # keep the value where the row belongs to group g, otherwise 0
    tmp = [i if j == g else 0 for i, j in zip(data.value, data.group)]
    data['group_{}'.format(g)] = tmp

CodePudding user response:

Use DataFrame.join with DataFrame.pivot, DataFrame.add_prefix and DataFrame.fillna:

# keyword arguments are required by DataFrame.pivot in pandas 2.0+
df = (data.join(data.reset_index()
          .pivot(index='index', columns='group', values='value')
          .add_prefix('group_')
          .fillna(0)))
print (df)
  group  value  group_A  group_B  group_C
0     A  0.200     0.20     0.00    0.000
1     A  0.210     0.21     0.00    0.000
2     B  0.540     0.00     0.54    0.000
3     C  0.020     0.00     0.00    0.020
4     C  0.001     0.00     0.00    0.001
5     B  0.190     0.00     0.19    0.000
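
To check that the pivot approach really matches the target output from the question, something like the sketch below may help. It assumes the sample data from the question and uses pandas.testing.assert_frame_equal; the names expected and df are only illustrative, and check_names=False is an assumption made here so that differences in index or column names are ignored.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Sample data copied from the question.
data = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                     'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})

# Target output copied from the question.
expected = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                         'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19],
                         'group_A': [0.2, 0.21, 0, 0, 0, 0],
                         'group_B': [0, 0, 0.54, 0, 0, 0.19],
                         'group_C': [0, 0, 0, 0.02, 0.001, 0]})

# reset_index exposes the row labels as a column named 'index', so every
# (index, group) pair is unique and pivot can spread 'value' across groups.
df = data.join(data.reset_index()
                   .pivot(index='index', columns='group', values='value')
                   .add_prefix('group_')
                   .fillna(0))

# Raises AssertionError if the two frames differ in values, dtypes or labels.
assert_frame_equal(df, expected, check_names=False)
```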

Alternative solution:

df = (data.join(data.set_index('group', append=True)['value']
          .unstack(fill_value=0)
          .add_prefix('group_')))
print (df)
  group  value  group_A  group_B  group_C
0     A  0.200     0.20     0.00    0.000
1     A  0.210     0.21     0.00    0.000
2     B  0.540     0.00     0.54    0.000
3     C  0.020     0.00     0.00    0.020
4     C  0.001     0.00     0.00    0.001
5     B  0.190     0.00     0.19    0.000
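
Another option, not from the answer above but possibly worth trying, is to one-hot encode the group column with pd.get_dummies and multiply each indicator row by its value. This is only a sketch on the sample data from the question; whether it is faster than pivot or unstack on a very large data set would need to be measured.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                     'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})

# One 0/1 indicator column per group, named group_A, group_B, group_C.
dummies = pd.get_dummies(data['group'], prefix='group', dtype=float)

# Multiply every indicator row by that row's value (aligned on the row index),
# then attach the new columns to the original frame.
df = data.join(dummies.mul(data['value'], axis=0))
print(df)
```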