I really want to know how this could be done in an elegant way with pandas. I have this DataFrame where the index is a MultiIndex
with the two values representing a year and a catagory.
| | VAL |
|:--------------|------:|
| ('2018', 'A') | 0 |
| ('2018', 'B') | 1 |
| ('2018', 'C') | 2 |
| ('2019', 'A') | 0 |
| ('2019', 'B') | 1 |
| ('2019', 'C') | 2 |
| ('2020', 'A') | 0 |
| ('2020', 'B') | 1 |
| ('2020', 'C') | 2 |
I want it to look like this:
| CAT | YEAR | VAL |
|:----|-------|-----:|
| 'A' | 2018 | 0 |
| | 2019 | 0 |
| | 2020 | 0 |
| 'B' | 2018 | 1 |
| | 2019 | 1 |
| | 2020 | 1 |
| 'C' | 2018 | 2 |
| | 2019 | 2 |
| | 2020 | 2 |
- The first column is based on the second level of the MultiIndex.
- The second column is based on the first level of the MultiIndex.
I have no idea how pandas can help me here.
This is the MWE do produce the sample data.
#!/usr/bin/env python3
import pandas as pd
# sub DataFrame
df = pd.DataFrame(
{
'CAT': list('ABC'),
'VAL': range(3)
}
)
df = df.set_index('CAT')
print(df)
# dict of 3 DataFrame's
d = {
'2018': df.copy(),
'2019': df.copy(),
'2020': df.copy(),
}
# glue them together
df = pd.concat(d)
print(df.to_markdown())
CodePudding user response:
In [25]: df.swaplevel().sort_index(level=0)
Out[25]:
VAL
CAT
A 2018 0
2019 0
2020 0
B 2018 1
2019 1
2020 1
C 2018 2
2019 2
2020 2
Use df.swaplevel
then sort the index
If you want those actual columns (with empty strings where there's duplicates) instead of just swapping the MultiIndex, then see @Quang's answer
alternatively, building on the above
In [26]: df2 = df.swaplevel().sort_index(level=0).reset_index().rename(columns={'level_1': 'Year'})
...: df2['CAT'] = np.where(df2['CAT'].duplicated(), '', df2['CAT'])
In [27]: df2
Out[130]:
CAT Year VAL
0 A 2018 0
1 2019 0
2 2020 0
3 B 2018 1
4 2019 1
5 2020 1
6 C 2018 2
7 2019 2
8 2020 2
CodePudding user response:
You can try:
# insert the YEAR column
df.insert(0, 'YEAR', df.index.get_level_values(0))
# insert the CAT column
df.insert(0, 'CAT', df.index.get_level_values(1))
# remove the index and sort values
df = df.reset_index(drop=True).sort_values(['CAT','YEAR'])
# mask the duplicates with ''
df.loc[df['CAT'].duplicated(),'CAT'] = ''
Output:
CAT YEAR VAL
0 A 2018 0
3 2019 0
6 2020 0
1 B 2018 1
4 2019 1
7 2020 1
2 C 2018 2
5 2019 2
8 2020 2
CodePudding user response:
Is this what you're after?
df.reset_index().rename(columns={'level_0': 'YEAR'}).sort_values(['CAT', 'YEAR'])[['CAT', 'YEAR', 'VAL']].reset_index(drop=True)
CAT YEAR VAL
0 A 2018 0
1 A 2019 0
2 A 2020 0
3 B 2018 1
4 B 2019 1
5 B 2020 1
6 C 2018 2
7 C 2019 2
8 C 2020 2
CodePudding user response:
iNPUT
VAL | |
---|---|
('2018', 'A') | 0 |
('2018', 'B') | 1 |
('2018', 'C') | 2 |
('2019', 'A') | 0 |
('2019', 'B') | 1 |
('2019', 'C') | 2 |
('2020', 'A') | 0 |
('2020', 'B') | 1 |
('2020', 'C') | 2 |
df.insert(0, '{YEAR}', df.index.get_level_values(0))
df.insert(0, '{CAT}', df.index.get_level_values(1))
df = df.reset_index(drop=True).sort_values(['{CAT}','{YEAR}'])
df.loc[df['CAT'].duplicated(),'CAT'] = ''
OUTPUT
CAT | YEAR | VAL |
---|---|---|
'A' | 2018 | 0 |
2019 | 0 | |
2020 | 0 | |
'B' | 2018 | 1 |
2019 | 1 | |
2020 | 1 | |
'C' | 2018 | 2 |
2019 | 2 | |
2020 | 2 |