A first dataframe has a column containing categories which are the same as the headers of the second. There are multiple row entries for one name in df1. df2 will have 1 row entry per name. df1 has 1 row entry per category per name. All rows for one name occur in sequence in df1. As shown:
The headers of df2 are as follows:
And the desired output is below:
How can I map data from the df1 to df2?
More specifically, how can I map multiple rows from df1 to 1 row and the respective columns of df2 in a more efficient way than looping twice to check for each category under each name?
Any help is appreciated, Have a great day
Code:
import pandas as pds
df1 = pds.DataFrame({'Client': ['Rick', 'Rick', 'John'], 'Category': ['Service1', 'Service2', 'Service1'], 'Amount': [250, 6, 79]})
df2 = pds.DataFrame(columns = ['Client', 'Due_Date', 'Service1', 'Service2'])
output = pds.DataFrame({'Client': ['Rick', 'John'], 'Due_Date': [None,None] , 'Service1': [250, 79], 'Service2': [6, 0]})
CodePudding user response:
This is an alternative approach using .pivot()
and .assign()
df1_pivot = (df1.pivot(index='Client', columns='Category', values='Amount')
.reset_index().assign(Due_Date=None))
df_out = df2.assign(**df1_pivot)
print(df_out)
Client Due_Date Service1 Service2
0 John None 79.0 NaN
1 Rick None 250.0 6.0
CodePudding user response:
You're looking for pandas.DataFrame.pivot
:
out = (df1.pivot(index="Client", columns="Category")
.reset_index()
.set_axis(["Client", "Service1", "Service2"], axis=1)
.assign(Due_Date= None)
)
NB : I suggest you to use import pandas as pd
as per the import convention
.
Output :
print(out)
Client Service1 Service2 Due_Date
0 John 79.0 NaN None
1 Rick 250.0 6.0 None