I have a table as following:
id | a | b | a | b | c | color |
---|---|---|---|---|---|---|
123 | 1 | 6 | 7 | 3 | 4 | blue |
456 | 2 | 8 | 9 | 7 | 5 | yellow |
As you can see, some of the columns have the same. What I want to do is to stack the columns with the same names on top of each other (make the table longer than wider). I have looked into documentations of stack, melt and pivot but I can't find a similar problem as I have here. Can anyone help me how this can be achieved?
FYI, here is how I need the table to be:
id | a | b | c | color |
---|---|---|---|---|
123 | 1 | 6 | 4 | blue |
123 | 7 | 3 | 4 | blue |
456 | 2 | 8 | 5 | yellow |
456 | 9 | 7 | 5 | yellow |
CodePudding user response:
You can deduplicate with groupby.cumcount
, then stack
and groupby.ffill
the missing values:
(df.set_axis(pd.MultiIndex.from_arrays([df.columns,
df.groupby(level=0, axis=1).cumcount()
]), axis=1)
.stack().groupby(level=0).ffill()
.reset_index(drop=True).convert_dtypes() # optional
[list(dict.fromkeys(df.columns))] # also optional, keep original order
)
output:
id a b c color
0 123 1 6 4 blue
1 123 7 3 4 blue
2 456 2 8 5 yellow
3 456 9 7 5 yellow
CodePudding user response:
# melt to turn wide to long format
df2=df.melt(id_vars=['id'])
(df2.assign(seq=df2.groupby(['variable']).cumcount()) # assign a seq to create multiple rows for an id
.pivot(index=['id','seq'], columns='variable', values='value' ) # pivot
.reset_index()
.drop(columns='seq')
.rename_axis(columns=None)
).ffill() # fill nan with previous value
id a b c color
0 123 1 6 4 blue
1 123 7 3 4 blue
2 456 2 8 5 yellow
3 456 9 7 5 yellow