Home > Enterprise >  How to turn rows in each group in dataframe into columns?
How to turn rows in each group in dataframe into columns?

Time:07-29

I have a dataframe:

col1  col2  col3   val  col4
a1    b1     c1    10    dd
a1    b1     c1    15    kk
a2    b2     c2    20    ff
a2    b2     c2    35    mm
a3    b3     c3    9     sd

I want to put each value in column "val" from each group col1, col2, col3 into each column. So desired result is:

col1  col2  col3   val_1  col4_1  val_2  col4_2
a1    b1     c1     10     dd      15      kk
a2    b2     c2     20     ff      35      mm
a3    b3     c3     9      sd      NA      NA

How to do that? Is there any function to turn those rows into columns within group?

CodePudding user response:

If there exist at most two groups (see comments), then you can make use of the first and last functions in combination with a groupby statement. You just should define your own last function, that returns you the last element of a group (for your example it would be the second value) if it exists. If it does not exist, meaning the group length equals 1, it would return you nan.

Code:

df = pd.DataFrame(
    {"col1": ["a1", "a1", "a2", "a2", "a3"],
     "col2": ["b1", "b1", "b2", "b2", "b3"],  
     "col3": ["c1", "c1", "c2", "c2", "c3"],
     "val": [10, 15, 20, 35, 9],
     "col4": ["dd", "kk", "ff", "mm", "sd"]}
)

last_check = lambda x: pd.NA if len(x) == 1 else x.iloc[-1]

df.groupby(["col1", "col2", "col3"], as_index=False)\
    .agg(val_1=("val", "first"),
         col4_1=("col4", "first"),
         val_2=("val", last_check),
         col4_2=("col4", last_check))

Output:

col1  col2  col3  val_1  col4_1  val_2  col4_2          
a1    b1    c1    10     dd      15      kk
a2    b2    c2    20     ff      35      mm
a3    b3    c3    9      sd      <NA>   <NA>
  • Related