Home > front end >  Pandas: if column names are the same, I want to stack them on top of each other
Pandas: if column names are the same, I want to stack them on top of each other

Time:11-09

I have a table as following:

id a b a b c color
123 1 6 7 3 4 blue
456 2 8 9 7 5 yellow

As you can see, some of the columns have the same. What I want to do is to stack the columns with the same names on top of each other (make the table longer than wider). I have looked into documentations of stack, melt and pivot but I can't find a similar problem as I have here. Can anyone help me how this can be achieved?

FYI, here is how I need the table to be:

id a b c color
123 1 6 4 blue
123 7 3 4 blue
456 2 8 5 yellow
456 9 7 5 yellow

CodePudding user response:

You can deduplicate with groupby.cumcount, then stack and groupby.ffill the missing values:

(df.set_axis(pd.MultiIndex.from_arrays([df.columns,
                                        df.groupby(level=0, axis=1).cumcount()
                                       ]), axis=1)
   .stack().groupby(level=0).ffill()
   .reset_index(drop=True).convert_dtypes() # optional
   [list(dict.fromkeys(df.columns))] # also optional, keep original order
)

output:

    id  a  b  c   color
0  123  1  6  4    blue
1  123  7  3  4    blue
2  456  2  8  5  yellow
3  456  9  7  5  yellow

CodePudding user response:

# melt to turn wide to long format
df2=df.melt(id_vars=['id']) 

(df2.assign(seq=df2.groupby(['variable']).cumcount()) # assign a seq to create multiple rows for an id
 .pivot(index=['id','seq'], columns='variable', values='value' ) # pivot
 .reset_index()
 .drop(columns='seq')
 .rename_axis(columns=None)
).ffill()  # fill nan with previous value

id  a   b   c   color
0   123     1   6   4   blue
1   123     7   3   4   blue
2   456     2   8   5   yellow
3   456     9   7   5   yellow
  • Related