I have a data frame that looks like this:
Col1 | Cl2 | C3 |
---|---|---|
12/31/2018 | 9/30/2018 | 11/30/2018 |
1/31/2019 | 10/31/2018 | 4/30/2019 |
2/28/2019 | 11/30/2018 | 11/30/2020 |
And I am hoping to have this rearranged based on the row values so it turns to:
Col1 | Cl2 | C3 |
---|---|---|
Nan | 9/30/2018 | Nan |
Nan | 10/31/2018 | Nan |
Nan | 11/30/2018 | 11/30/2018 |
12/31/2018 | Nan | Nan |
1/31/2019 | Nan | Nan |
2/28/2019 | Nan | Nan |
Nan | Nan | 4/30/2019 |
Nan | Nan | 11/30/2020 |
From the above, we can note that all the rows must have the same date, otherwise, we fill them with some sort of Nan. I was also hoping that this idea could work for any number of columns, any number of rows, and any column names (the idea of being generic).
If helps:
import numpy as np
import pandas as pd
# Create the pandas DataFrame
df1 = pd.DataFrame(['2018-12-31','2019-01-31','2019-02-28'], columns = ['Col1'])
df2 = pd.DataFrame(['2018-09-30','2018-10-31','2018-11-30'], columns = ['Cl2'])
df3 = pd.DataFrame(['2018-11-30','2019-04-30','2020-11-30'], columns = ['C3'])
data = {'Col1': [np.nan,np.nan,np.nan,'2018-12-31','2019-01-31','2019-02-28',np.nan,np.nan],
'Cl2': ['2018-09-30','2018-10-31','2018-11-30',np.nan,np.nan,np.nan,np.nan,np.nan],
'C3': [np.nan,np.nan,'2018-11-30',np.nan,np.nan,np.nan,'2019-04-30','2020-11-30']}
desired_df = pd.DataFrame(data)
desired_df
Note: This is somewhat similar to a question that I previously posted here
CodePudding user response:
You can set the column to index then add a dummy column
for df in [df1, df2, df3]:
df.set_index(df.columns[0], inplace=True)
df[df.index.name] = 1
print(df1)
Col1
Col1
2018-12-31 1
2019-01-31 1
2019-02-28 1
Then concat all the transformed dataframes and sort the index
df = pd.concat([df1, df2, df3], axis=1).sort_index()
print(df)
Col1 Cl2 C3
2018-09-30 NaN 1.0 NaN
2018-10-31 NaN 1.0 NaN
2018-11-30 NaN 1.0 1.0
2018-12-31 1.0 NaN NaN
2019-01-31 1.0 NaN NaN
2019-02-28 1.0 NaN NaN
2019-04-30 NaN NaN 1.0
2020-11-30 NaN NaN 1.0
At last, replace all the 1 with corresponding index
df = df.apply(lambda col: col.mask(col.eq(1), df.index), axis=0).reset_index(drop=True)
print(df)
Col1 Cl2 C3
0 NaN 2018-09-30 NaN
1 NaN 2018-10-31 NaN
2 NaN 2018-11-30 2018-11-30
3 2018-12-31 NaN NaN
4 2019-01-31 NaN NaN
5 2019-02-28 NaN NaN
6 NaN NaN 2019-04-30
7 NaN NaN 2020-11-30
With less lines
df = pd.concat([df.set_index(df.columns[0]).assign(**{f'{df.columns[0]}': 1}) for df in [df1, df2, df3]], axis=1).sort_index()
df = df.apply(lambda col: col.mask(col.eq(1), df.index), axis=0).reset_index(drop=True)