Python Pandas: Rerange data from vertical to horizontal-CodePudding

I would like to transform a data frame using pandas.

Old-Dataframe:

Person-ID	Reference-ID	Name
1	1	Max
2	1	Kevin
3	1	Sara
4	4	Chessi
5	9	Fernando

into a new-dataframe in the following format.

New-Dataframe:

Person-ID	Reference-ID	Member1	Member2	Member3
1	1	Max	Kevin	Sara
4	4	Chessi
5	9	Fernando

My solution would be:

Write all the Reference-IDs from the old-dataframe into the new-dataframe
Write all the Person-Ids from the old-dataframe into the new-dataframe, which their reference_id is not in the old-dataframe (see example Fernando)
Loop trough the "old"-dataframe and add the name to the corresponding line in the new dataframe

Do you have any suggestions, on how to make this faster/simpler?

PS: The old-dataframe can be made like this

person_id = [1,2,3,4,5]
reference_id = [1,1,1,4,9]
name = ['Max','Kevin','Sara',"Chessi","Fernando"]
list_tuples=list(zip(person_id,reference_id,name))
old_dataframe = pd.DataFrame(list_tuples,columns=['Person_ID','Reference_id','Name'])

CodePudding user response：

One way is to groupby.agg first and then concat the result:

df = (df.groupby("Reference_id", as_index=False).agg(Person_ID=("Person_ID", "first"),
                                                     Name=("Name", list)))

print (pd.concat([df, pd.DataFrame(df.pop("Name").tolist(), columns=["Ref", "Member1", "Member2"])], axis=1))

   Reference_id  Person_ID       Ref Member1 Member2
0             1          1       Max   Kevin    Sara
1             4          4    Chessi    None    None
2             9          5  Fernando    None    None

CodePudding user response：

You can use pivot_table() like this:

df1= pd.pivot_table(df, index=['Reference-ID'], values=['Person-ID', 'Name'], aggfunc=({'Person-ID':'min', 'Name':lambda x:list(x), 'Person-ID':'min'}))

df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))

Output:

Person-ID	Reference-ID	0	1	2
1	1	Max	Kevin	Sara
4	4	Chessi	None	None
5	9	Fernando	None	None

You can reassign column names like this:

df2=df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))

df2.columns=list(df2.columns[0:2]) [f"Member{x 1}" for x in df2.columns[2:]]

Output:

Person-ID	Reference-ID	Member1	Member2	Member3
1	1	Max	Kevin	Sara
4	4	Chessi	None	None
5	9	Fernando	None	None