I have the following DataFrame:
df = pd.DataFrame([['A',1,10],
['A',1,20],
['C',2,15],
['B',3,20]], columns=['Val1','Val2','Val3'])
colList = [f'{col}_{i}' for col in ['A', 'B', 'C'] for i in range(5)]
data = np.zeros(shape=(len(df), len(colList)))
df = pd.concat([df, pd.DataFrame(data, columns=colList)], axis=1)
df['REF'] = df['Val1'] '_' df['Val2'].astype(str)
Val1 Val2 Val3 A_0 A_1 A_2 A_3 ... B_4 C_0 C_1 C_2 C_3 C_4 REF
0 A 1 10 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 A_1
1 A 1 20 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 A_1
2 C 2 15 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 C_2
3 B 3 20 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 B_3
Column Val3 contains values which should be assigned to the corresponding columns under the REF columns. This can be done by using the following lambda function:
def func(x):
x[x['REF']] = x['Val3']
return x
df = df.apply(lambda x: func(x), axis=1)
print(df)
Val1 Val2 Val3 A_0 A_1 A_2 A_3 ... B_4 C_0 C_1 C_2 C_3 C_4 REF
0 A 1 10 0.0 10.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 A_1
1 A 1 20 0.0 20.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 A_1
2 C 2 15 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 15.0 0.0 0.0 C_2
3 B 3 20 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 B_3
Is there any way I can vectorize this process without using a Lambda function in order to speed up the process?
CodePudding user response:
Pivot
and update
df.update(df.pivot(columns='REF', values='Val3'))
>>> df
Val1 Val2 Val3 A_0 A_1 A_2 A_3 A_4 B_0 B_1 B_2 B_3 B_4 C_0 C_1 C_2 C_3 C_4 REF
0 A 1 10 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 A_1
1 A 1 20 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 A_1
2 C 2 15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 15.0 0.0 0.0 C_2
3 B 3 20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 B_3