Rearange rows in a pandas DataFrame based on MultiIndex values the pandas-way-CodePudding

I really want to know how this could be done in an elegant way with pandas. I have this DataFrame where the index is a MultiIndex with the two values representing a year and a catagory.

|               |   VAL |
|:--------------|------:|
| ('2018', 'A') |     0 |
| ('2018', 'B') |     1 |
| ('2018', 'C') |     2 |
| ('2019', 'A') |     0 |
| ('2019', 'B') |     1 |
| ('2019', 'C') |     2 |
| ('2020', 'A') |     0 |
| ('2020', 'B') |     1 |
| ('2020', 'C') |     2 |

I want it to look like this:

| CAT |  YEAR |  VAL |
|:----|-------|-----:|
| 'A' |  2018 |    0 |
|     |  2019 |    0 |
|     |  2020 |    0 |
| 'B' |  2018 |    1 |
|     |  2019 |    1 |
|     |  2020 |    1 |
| 'C' |  2018 |    2 |
|     |  2019 |    2 |
|     |  2020 |    2 |

The first column is based on the second level of the MultiIndex.
The second column is based on the first level of the MultiIndex.

I have no idea how pandas can help me here.

This is the MWE do produce the sample data.

#!/usr/bin/env python3
import pandas as pd

# sub DataFrame
df = pd.DataFrame(
    {
        'CAT': list('ABC'),
        'VAL': range(3)
    }
)
df = df.set_index('CAT')
print(df)

# dict of 3 DataFrame's
d = {
    '2018': df.copy(),
    '2019': df.copy(),
    '2020': df.copy(),
}

# glue them together
df = pd.concat(d)
print(df.to_markdown())

CodePudding user response：

In [25]: df.swaplevel().sort_index(level=0)
Out[25]:
          VAL
CAT
A   2018    0
    2019    0
    2020    0
B   2018    1
    2019    1
    2020    1
C   2018    2
    2019    2
    2020    2

Use df.swaplevel then sort the index

If you want those actual columns (with empty strings where there's duplicates) instead of just swapping the MultiIndex, then see @Quang's answer

alternatively, building on the above


In [26]: df2 = df.swaplevel().sort_index(level=0).reset_index().rename(columns={'level_1': 'Year'})
     ...: df2['CAT'] = np.where(df2['CAT'].duplicated(), '', df2['CAT'])

In [27]: df2
Out[130]:
  CAT  Year  VAL
0   A  2018    0
1      2019    0
2      2020    0
3   B  2018    1
4      2019    1
5      2020    1
6   C  2018    2
7      2019    2
8      2020    2

CodePudding user response：

You can try:

# insert the YEAR column
df.insert(0, 'YEAR', df.index.get_level_values(0))

# insert the CAT column
df.insert(0, 'CAT', df.index.get_level_values(1))

# remove the index and sort values
df = df.reset_index(drop=True).sort_values(['CAT','YEAR'])

# mask the duplicates with ''
df.loc[df['CAT'].duplicated(),'CAT'] = ''

Output:

  CAT  YEAR  VAL
0   A  2018    0
3      2019    0
6      2020    0
1   B  2018    1
4      2019    1
7      2020    1
2   C  2018    2
5      2019    2
8      2020    2

CodePudding user response：

Is this what you're after?

df.reset_index().rename(columns={'level_0': 'YEAR'}).sort_values(['CAT', 'YEAR'])[['CAT', 'YEAR', 'VAL']].reset_index(drop=True)

  CAT  YEAR  VAL
0   A  2018    0
1   A  2019    0
2   A  2020    0
3   B  2018    1
4   B  2019    1
5   B  2020    1
6   C  2018    2
7   C  2019    2
8   C  2020    2

CodePudding user response：

iNPUT

	VAL
('2018', 'A')	0
('2018', 'B')	1
('2018', 'C')	2
('2019', 'A')	0
('2019', 'B')	1
('2019', 'C')	2
('2020', 'A')	0
('2020', 'B')	1
('2020', 'C')	2

df.insert(0, '{YEAR}', df.index.get_level_values(0))
df.insert(0, '{CAT}', df.index.get_level_values(1))
df = df.reset_index(drop=True).sort_values(['{CAT}','{YEAR}'])
df.loc[df['CAT'].duplicated(),'CAT'] = ''

OUTPUT

CAT	YEAR	VAL
'A'	2018	0
	2019	0
	2020	0
'B'	2018	1
	2019	1
	2020	1
'C'	2018	2
	2019	2
	2020	2