Home > Enterprise >  Reorder A Dataframe Rows So All Columns Values Match Otherwise Add Nan
Reorder A Dataframe Rows So All Columns Values Match Otherwise Add Nan

Time:04-22

I have a data frame that looks like this:

Col1 Cl2 C3
12/31/2018 9/30/2018 11/30/2018
1/31/2019 10/31/2018 4/30/2019
2/28/2019 11/30/2018 11/30/2020

And I am hoping to have this rearranged based on the row values so it turns to:

Col1 Cl2 C3
Nan 9/30/2018 Nan
Nan 10/31/2018 Nan
Nan 11/30/2018 11/30/2018
12/31/2018 Nan Nan
1/31/2019 Nan Nan
2/28/2019 Nan Nan
Nan Nan 4/30/2019
Nan Nan 11/30/2020

From the above, we can note that all the rows must have the same date, otherwise, we fill them with some sort of Nan. I was also hoping that this idea could work for any number of columns, any number of rows, and any column names (the idea of being generic).

If helps:

import numpy as np
import pandas as pd

# Create the pandas DataFrame
df1 = pd.DataFrame(['2018-12-31','2019-01-31','2019-02-28'], columns = ['Col1'])
df2 = pd.DataFrame(['2018-09-30','2018-10-31','2018-11-30'], columns = ['Cl2'])
df3 = pd.DataFrame(['2018-11-30','2019-04-30','2020-11-30'], columns = ['C3'])

data = {'Col1': [np.nan,np.nan,np.nan,'2018-12-31','2019-01-31','2019-02-28',np.nan,np.nan],
        'Cl2': ['2018-09-30','2018-10-31','2018-11-30',np.nan,np.nan,np.nan,np.nan,np.nan],
        'C3': [np.nan,np.nan,'2018-11-30',np.nan,np.nan,np.nan,'2019-04-30','2020-11-30']}

desired_df = pd.DataFrame(data)
desired_df

Note: This is somewhat similar to a question that I previously posted here

CodePudding user response:

You can set the column to index then add a dummy column

for df in [df1, df2, df3]:
    df.set_index(df.columns[0], inplace=True)
    df[df.index.name] = 1
print(df1)

            Col1
Col1
2018-12-31     1
2019-01-31     1
2019-02-28     1

Then concat all the transformed dataframes and sort the index

df = pd.concat([df1, df2, df3], axis=1).sort_index()
print(df)

            Col1  Cl2   C3
2018-09-30   NaN  1.0  NaN
2018-10-31   NaN  1.0  NaN
2018-11-30   NaN  1.0  1.0
2018-12-31   1.0  NaN  NaN
2019-01-31   1.0  NaN  NaN
2019-02-28   1.0  NaN  NaN
2019-04-30   NaN  NaN  1.0
2020-11-30   NaN  NaN  1.0

At last, replace all the 1 with corresponding index

df = df.apply(lambda col: col.mask(col.eq(1), df.index), axis=0).reset_index(drop=True)
print(df)

         Col1         Cl2          C3
0         NaN  2018-09-30         NaN
1         NaN  2018-10-31         NaN
2         NaN  2018-11-30  2018-11-30
3  2018-12-31         NaN         NaN
4  2019-01-31         NaN         NaN
5  2019-02-28         NaN         NaN
6         NaN         NaN  2019-04-30
7         NaN         NaN  2020-11-30

With less lines

df = pd.concat([df.set_index(df.columns[0]).assign(**{f'{df.columns[0]}': 1}) for df in [df1, df2, df3]], axis=1).sort_index()

df = df.apply(lambda col: col.mask(col.eq(1), df.index), axis=0).reset_index(drop=True)
  • Related