Home > OS >  Python Pandas: Rerange data from vertical to horizontal
Python Pandas: Rerange data from vertical to horizontal

Time:10-19

I would like to transform a data frame using pandas.

Old-Dataframe:

Person-ID Reference-ID Name
1 1 Max
2 1 Kevin
3 1 Sara
4 4 Chessi
5 9 Fernando

into a new-dataframe in the following format.

New-Dataframe:

Person-ID Reference-ID Member1 Member2 Member3
1 1 Max Kevin Sara
4 4 Chessi
5 9 Fernando

My solution would be:

  1. Write all the Reference-IDs from the old-dataframe into the new-dataframe
  2. Write all the Person-Ids from the old-dataframe into the new-dataframe, which their reference_id is not in the old-dataframe (see example Fernando)
  3. Loop trough the "old"-dataframe and add the name to the corresponding line in the new dataframe

Do you have any suggestions, on how to make this faster/simpler?

PS: The old-dataframe can be made like this

person_id = [1,2,3,4,5]
reference_id = [1,1,1,4,9]
name = ['Max','Kevin','Sara',"Chessi","Fernando"]
list_tuples=list(zip(person_id,reference_id,name))
old_dataframe = pd.DataFrame(list_tuples,columns=['Person_ID','Reference_id','Name'])

CodePudding user response:

One way is to groupby.agg first and then concat the result:

df = (df.groupby("Reference_id", as_index=False).agg(Person_ID=("Person_ID", "first"),
                                                     Name=("Name", list)))

print (pd.concat([df, pd.DataFrame(df.pop("Name").tolist(), columns=["Ref", "Member1", "Member2"])], axis=1))

   Reference_id  Person_ID       Ref Member1 Member2
0             1          1       Max   Kevin    Sara
1             4          4    Chessi    None    None
2             9          5  Fernando    None    None

CodePudding user response:

You can use pivot_table() like this:

df1= pd.pivot_table(df, index=['Reference-ID'], values=['Person-ID', 'Name'], aggfunc=({'Person-ID':'min', 'Name':lambda x:list(x), 'Person-ID':'min'}))

df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))

Output:

Person-ID Reference-ID 0 1 2
1 1 Max Kevin Sara
4 4 Chessi None None
5 9 Fernando None None

You can reassign column names like this:

df2=df1.reset_index()[['Person-ID','Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))

df2.columns=list(df2.columns[0:2]) [f"Member{x 1}" for x in df2.columns[2:]]

Output:

Person-ID Reference-ID Member1 Member2 Member3
1 1 Max Kevin Sara
4 4 Chessi None None
5 9 Fernando None None
  • Related