I have this DataFrame, data.
import pandas as pd
data = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                     'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})
I want to build three new features, one per group: each holds value for the rows in that group and 0 everywhere else. Below is my target output.
pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
              'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19],
              'group_A': [0.2, 0.21, 0, 0, 0, 0],
              'group_B': [0, 0, 0.54, 0, 0, 0.19],
              'group_C': [0, 0, 0, 0.02, 0.001, 0]})
What is the most efficient way to perform such a task? The loop below solves the problem, but is there a vectorized way to do it on my very large real-world data set?
for g in data.group.unique():
    # keep the value where the row belongs to group g, otherwise 0
    tmp = [i if j == g else 0 for i, j in zip(data.value, data.group)]
    data['group_{}'.format(g)] = tmp
Answer:
Use DataFrame.join with DataFrame.pivot, DataFrame.add_prefix and DataFrame.fillna:
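# pivot takes keyword arguments; positional ones are rejected from pandas 2.0 on
df = (data.join(data.reset_index()
                    .pivot(index='index', columns='group', values='value')
                    .add_prefix('group_')
                    .fillna(0)))
print(df)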
  group  value  group_A  group_B  group_C
0     A  0.200     0.20     0.00    0.000
1     A  0.210     0.21     0.00    0.000
2     B  0.540     0.00     0.54    0.000
3     C  0.020     0.00     0.00    0.020
4     C  0.001     0.00     0.00    0.001
5     B  0.190     0.00     0.19    0.000
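To see why this works, it can help to look at the pivot step on its own. The sketch below rebuilds the sample data from the question and prints the intermediate result; the names wide and features are only illustrative. It assumes a pandas version where pivot accepts keyword arguments.

```python
import pandas as pd

data = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                     'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})

# reset_index() exposes the row labels as an 'index' column, so every
# (index, group) pair is unique, which pivot requires.
wide = data.reset_index().pivot(index='index', columns='group', values='value')

# Each row has exactly one non-missing cell: the one under its own group.
print(wide)

# Prefix the column names and replace the missing cells with 0.
features = wide.add_prefix('group_').fillna(0)
print(features)
```

Because the pivoted frame keeps the original row labels as its index, data.join can align it back to data row by row.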
Alternative solution:
df = (data.join(data.set_index('group', append=True)['value']
                    .unstack(fill_value=0)
                    .add_prefix('group_')))
print(df)
  group  value  group_A  group_B  group_C
0     A  0.200     0.20     0.00    0.000
1     A  0.210     0.21     0.00    0.000
2     B  0.540     0.00     0.54    0.000
3     C  0.020     0.00     0.00    0.020
4     C  0.001     0.00     0.00    0.001
5     B  0.190     0.00     0.19    0.000
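Another possibility, which is not one of the two solutions above and is only a sketch, is to build indicator columns with pd.get_dummies and multiply them by value. It assumes the new column names should follow the same group_ prefix; the name indicators is only illustrative. Whether it is faster on a large data set would need to be timed on the real data.

```python
import pandas as pd

data = pd.DataFrame({'group': ['A', 'A', 'B', 'C', 'C', 'B'],
                     'value': [0.2, 0.21, 0.54, 0.02, 0.001, 0.19]})

# One indicator column per group: 1.0 where the row belongs to it, else 0.0.
indicators = pd.get_dummies(data['group'], prefix='group', dtype=float)

# Multiply every indicator row by that row's value (aligned on the index),
# then attach the result to the original frame.
df = data.join(indicators.mul(data['value'], axis=0))
print(df)
```

Passing dtype=float keeps the indicators numeric, so the multiplication does not depend on how a given pandas version treats boolean columns.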