This is how I replaced the NaNs in the two dataframes,
df_max.fillna(0, inplace=True)
df_min.fillna(0, inplace=True)
This is how I concat and create a calculated column
df_max.reset_index(drop=True, inplace=True)
df_min.reset_index(drop=True, inplace=True)
df_combine = pd.concat([df_max, df_min], ignore_index = True)
df_combine['Range'] = df_combine['Maximum temperature (Degree C)'] - df_combine['Minimum temperature (Degree C)']
df_combine.head()
But I stil get NaN value for all rows in some columns.
CodePudding user response:
I think you want:
df_combine = pd.concat([df_max, df_min], ignore_index = True, axis=1)
CodePudding user response:
Add axis=1
concatenate the dataframes by column and remove ignore_index=True
to keep the column name.
df_combine = pd.concat([df_max, df_min], axis=1)
As there are repeated columns, pd.merge
might work better, it works like sql join.
df_combine = pd.merge(df_max, df_min, on=['Year', 'Month', 'Product Code'], how='outer')
By default, axis=0 which means the dataframes are concatenated by row.
Example
df1 = pd.DataFrame([[1,2,3],[11,12,13]], columns=['a','b','c'])
df1
a b c
0 1 2 3
1 11 12 13
df2 = pd.DataFrame([[1,2,6],[14,15,16]], columns=['a','b','f'])
df2
a b f
0 1 2 6
1 14 15 16
pd.concat([df1, df2], ignore_index = True)
a b c f
0 1 2 3.0 NaN
1 11 12 13.0 NaN
2 1 2 NaN 6.0
3 14 15 NaN 16.0
pd.concat([df1, df2], axis=1, ignore_index = True)
0 1 2 3 4 5
0 1 2 3 1 2 6
1 11 12 13 14 15 16
pd.concat([df1, df2], axis=1)
a b c a b f
0 1 2 3 1 2 6
1 11 12 13 14 15 16
pd.merge(df1, df2, on=['a','b'], how='outer')
a b c f
0 1 2 3.0 6.0
1 11 12 13.0 NaN
2 14 15 NaN 16.0