I would like to transform a data frame using pandas.
Old-Dataframe:
Person-ID | Reference-ID | Name |
---|---|---|
1 | 1 | Max |
2 | 1 | Kevin |
3 | 1 | Sara |
4 | 4 | Chessi |
5 | 9 | Fernando |
into a new-dataframe in the following format.
New-Dataframe:
Person-ID | Reference-ID | Member1 | Member2 | Member3 |
---|---|---|---|---|
1 | 1 | Max | Kevin | Sara |
4 | 4 | Chessi | ||
5 | 9 | Fernando |
My solution would be:
- Write all the Reference-IDs from the old-dataframe into the new-dataframe
- Write all the Person-Ids from the old-dataframe into the new-dataframe, which their reference_id is not in the old-dataframe (see example Fernando)
- Loop trough the "old"-dataframe and add the name to the corresponding line in the new dataframe
Do you have any suggestions, on how to make this faster/simpler?
PS: The old-dataframe can be made like this
person_id = [1,2,3,4,5]
reference_id = [1,1,1,4,9]
name = ['Max','Kevin','Sara',"Chessi","Fernando"]
list_tuples=list(zip(person_id,reference_id,name))
old_dataframe = pd.DataFrame(list_tuples,columns=['Person_ID','Reference_id','Name'])
CodePudding user response:
One way is to groupby.agg
first and then concat
the result:
df = (df.groupby("Reference_id", as_index=False).agg(Person_ID=("Person_ID", "first"),
Name=("Name", list)))
print (pd.concat([df, pd.DataFrame(df.pop("Name").tolist(), columns=["Ref", "Member1", "Member2"])], axis=1))
Reference_id Person_ID Ref Member1 Member2
0 1 1 Max Kevin Sara
1 4 4 Chessi None None
2 9 5 Fernando None None
CodePudding user response:
You can use pivot_table()
like this:
df1= pd.pivot_table(df, index=['Reference-ID'], values=['Person-ID', 'Name'], aggfunc=({'Person-ID':'min', 'Name':lambda x:list(x), 'Person-ID':'min'}))
df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
Output:
Person-ID | Reference-ID | 0 | 1 | 2 |
---|---|---|---|---|
1 | 1 | Max | Kevin | Sara |
4 | 4 | Chessi | None | None |
5 | 9 | Fernando | None | None |
You can reassign column names like this:
df2=df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
df2.columns=list(df2.columns[0:2]) [f"Member{x 1}" for x in df2.columns[2:]]
Output:
Person-ID | Reference-ID | Member1 | Member2 | Member3 |
---|---|---|---|---|
1 | 1 | Max | Kevin | Sara |
4 | 4 | Chessi | None | None |
5 | 9 | Fernando | None | None |