Currently, I have a dataframe that looks as such:
abc | def | ghi | abc | def | ghi |
---|---|---|---|---|---|
2 | 4 | 78 | 56 | 7 | 45 |
Is there a way to combine the columns that have the same name and create a new row for each set of values? Example:
abc | def | ghi |
---|---|---|
2 | 4 | 78 |
56 | 7 | 45 |
CodePudding user response:
You can use .groupby(level=0, axis='columns')
to assign a cumulative count and then perform a transformation based on that.
import pandas as pd
new_cols = pd.MultiIndex.from_arrays([df.columns, df.groupby(level=0, axis=1).cumcount()])
out = df.set_axis(new_cols, axis=1).stack().reset_index(level=0, drop=True)
print(out)
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
You can set up a MultiIndex with help of groupby.cumcount
and stack
:
(df
.set_axis(pd.MultiIndex
.from_arrays([df.columns,
df.groupby(level=0, axis=1)
.cumcount()]), axis=1)
.stack()
.droplevel(0)
)
Output:
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
Just to give an alternative to other answers with melt
:
out = (df.melt(var_name='col', value_name='val')
.assign(idx=lambda x: x.groupby('col').cumcount())
.pivot('idx', 'col', 'val').rename_axis(index=None, columns=None))
print(out)
# Output
abc def ghi
0 2 4 78
1 56 7 45
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(names_to = '.value', names_pattern = '(. )')
abc def ghi
0 2 4 78
1 56 7 45
In the above solution, the .value
determines which parts of the column labels remain as headers - the labels are determined by the groups in the regular expression in names_pattern
.
Another option would be to pass the names of the new columns to names_to
, while passing a list of matching regular expressions to names_pattern
:
df.pivot_longer(names_to = ['abc', 'def', 'ghi'],
names_pattern = ['abc', 'def', 'ghi'])
abc def ghi
0 2 4 78
1 56 7 45