I have two DataFrames
; df1
and df2
and they both contain mostly NaN
values. They have non-NaN values at the same locations in the data set. Based on these two DataFrames
, I would like to create df
with a column for all values from df1
and one column for df2
, so that I have value pairs from both DataFrames
.
Here I created a simple example:
df1:
01K 02K 03K 04K
Dates
2021-01-01 NaN NaN NaN NaN
2021-01-02 NaN 2.5 NaN NaN
2021-01-03 NaN NaN 4.1 NaN
2021-01-04 8.2 NaN 9.0 NaN
2021-01-05 NaN 1.2 NaN NaN
2021-01-06 NaN NaN NaN NaN
df2:
01K 02K 03K 04K
Dates
2021-01-01 NaN NaN NaN NaN
2021-01-02 NaN 0.6 NaN NaN
2021-01-03 NaN NaN 0.4 NaN
2021-01-04 0.1 NaN 0.2 NaN
2021-01-05 NaN 0.2 NaN NaN
2021-01-06 NaN NaN NaN NaN
df:
df1 df2
0 8.2 0.1
1 2.5 0.6
2 1.2 0.2
3 4.1 0.4
4 9.0 0.2
For reproducibility:
df1 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'01K':[np.nan, np.nan, np.nan, 8.2, np.nan, np.nan],
'02K':[np.nan, 2.5, np.nan, np.nan, 1.2, np.nan],
'03K':[np.nan, np.nan, 4.1, 9.0, np.nan, np.nan],
'04K':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})
df1 = df1.set_index('Dates')
df2 = pd.DataFrame({
'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
'01K':[np.nan, np.nan, np.nan, 0.1, np.nan, np.nan],
'02K':[np.nan, 0.6, np.nan, np.nan, 0.2, np.nan],
'03K':[np.nan, np.nan, 0.4, 0.2, np.nan, np.nan],
'04K':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})
df2 = df2.set_index('Dates')
Thanks a lot for your suggestions
CodePudding user response:
You can use stack
to get rid of the nan's:
pd.DataFrame({'df1': df1.stack(), 'df2': df2.stack()})
output:
df1 df2
Dates
2021-01-02 02K 2.5 0.6
2021-01-03 03K 4.1 0.4
2021-01-04 01K 8.2 0.1
03K 9.0 0.2
2021-01-05 02K 1.2 0.2