Here's my code with 2 dataframes:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.array([[1, 2, 3, 5, 2], [2, 2, 3, 5, 2], [3, 2, 3, 5, 2], [10, 2, 3, 5, 2]]),
columns=['ID', 'itemX_2', 'itemK_3', 'itemC_5', 'itemH_2'])
df2 = pd.DataFrame(np.array([[1,1,1, 2,2,2, 3,3,3, 10,10,10], [2,3,5, 2,3,5, 2,3,5, 2,3,5], [20,40,60, 80,100,200, 220,240,260, 500,505,520]]).T,
columns=['ID', 'Item_id', 'value_to_assign'])
Based on df2 I want to modify df1
Expected output:
df_expected_output = pd.DataFrame(np.array([[1, 20, 40, 60, 20], [2, 80, 100, 200, 80], [3, 220, 240, 260, 220], [10, 500, 505, 520, 500]]),
columns=['ID', 'itemX_2', 'itemK_3', 'itemC_5', 'itemH_2'])
I have done it with iterating over columns and some operations. Hovever in my example i got more columns and rows in dataframes, so its pretty slow. Someone know how to do it in fast efficient way? Thanks
CodePudding user response:
Here is one solution. pivot
df2 to have a format similar to df1 and then replace column by column by matching on the number after the last '_'.
df2_pivot = df2.pivot(index='ID', columns='Item_id', values='value_to_assign').rename_axis(None, axis=1)
df3 = df1.set_index('ID')
for c in df3:
df3[c] = df2_pivot[int(c.rsplit('_', 1)[-1])]
Or, using a dictionary comprehension for the second part:
df3 = pd.DataFrame({c: df2_pivot[int(c.rsplit('_', 1)[-1])]
for c in df1.columns[1:]},
index=df1['ID']).reset_index()
output:
>>> df3.reset_index()
ID itemX_2 itemK_3 itemC_5 itemH_2
0 1 20 40 60 20
1 2 80 100 200 80
2 3 220 240 260 220
3 10 500 505 520 500
CodePudding user response:
Another way would be:
- Stack the original df which is to be replaced.
- grab the index and split the second index to get values after
_
- using
pd.Index.map
, map the values of these index from df2 - Create a dataframe keeping this mapped value as value and the stacked multiindex as index and then unstack them.
s = df1.set_index("ID").stack()
i = s.index.map(lambda x: (x[0],x[1].split("_")[1]))
v = i.map(df2.set_index(["ID",df2['Item_id'].map(str)])['value_to_assign'])
out = pd.DataFrame({"value":v},index=s.index)['value'].unstack().reset_index()
print(out)
ID itemX_2 itemK_3 itemC_5 itemH_2
0 1 20 40 60 20
1 2 80 100 200 80
2 3 220 240 260 220
3 10 500 505 520 500
CodePudding user response:
DataFrame.replace
We can use pivot
to reshape the dataframe df2
so that we can easily use replace
method to substitute the values in df1
df1.set_index('ID').T.replace(df2.pivot('Item_id', 'ID', 'value_to_assign')).T
itemX_2 itemK_3 itemC_5 itemH_2
ID
1 20 40 60 20
2 80 100 200 80
3 220 240 260 220
10 500 505 520 500
CodePudding user response:
You can iterate over the columns of df1 and perform a pd.merge
:
for col in df1.columns:
if col == 'ID': continue
df_temp = pd.merge(df1.loc[:, ['ID', col]], df2, on = 'ID')
df1[col] = df_temp[df_temp[col] == df_temp['Item_id']]['value_to_assign'].reset_index(drop=True)
output:
ID itemX_2 itemK_3 itemC_5 itemH_2
0 1 20 40 60 20
1 2 80 100 200 80
2 3 220 240 260 220
3 10 500 505 520 500