I have a problem. I want to merge two dataframes, but instead of NaN
it should be filled with 0
. But only the "new" columns. How could I do that?
What I tried
df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left', fill = 0)
[OUT]
TypeError: merge() got an unexpected keyword argument 'fill'
d = {'host_id': [1, 1, 2],
'id': [10, 11, 20],
'value': ["Hot Water,Cold Water,Kitchen,Coffee",
"Hot Water,Coffee,Something",
"Hot Water,Coffee"]}
df = pd.DataFrame(data=d)
print(df)
d2 = {'host_id': [1, 1, 2, 3],
'id': [10, 11, 20, 30],
'some': ['test1', "test2", "test3", np.nan]}
df2 = pd.DataFrame(data=d2)
print(df2)
df_path = df.copy()
df_path.index = pd.MultiIndex.from_arrays(df_path[['host_id', 'id']].values.T, names=['host_id', 'id'])
df_path = df_path['value'].str.split(',', expand=True)
df_path = df_path.melt(ignore_index=False).dropna()
df_path.reset_index(inplace=True)
one_hot = pd.get_dummies(df_path['value'])
df_one = df_path.drop('value',axis = 1)
df_one = df_path.join(one_hot)
grouped_df_one = df_one.groupby(['id']).max()
grouped_df_one = grouped_df_one.drop(columns=['value', 'variable']).reset_index()
df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left')
df3
host_id id value
0 1 10 Hot Water,Cold Water,Kitchen,Coffee
1 1 11 Hot Water,Coffee,Something
2 2 20 Hot Water,Coffee
host_id id some
0 1 10 test1
1 1 11 test2
2 2 20 test3
3 3 30 NaN
What I got
host_id id some Coffee Cold Water Hot Water Kitchen Something
0 1 10 test1 1.0 1.0 1.0 1.0 0.0
1 1 11 test2 1.0 0.0 1.0 0.0 1.0
2 2 20 test3 1.0 0.0 1.0 0.0 0.0
3 3 30 NaN NaN NaN NaN NaN NaN
What I want
host_id id some Coffee Cold Water Hot Water Kitchen Something
0 1 10 test1 1.0 1.0 1.0 1.0 0.0
1 1 11 test2 1.0 0.0 1.0 0.0 1.0
2 2 20 test3 1.0 0.0 1.0 0.0 0.0
3 3 30 NaN 0 0 0 0 0
CodePudding user response:
You can fill specific columns using
df[list_cols] = df[list_cols].fillna(0)
where list_cols is e.g.
list_cols = ["Coffee", "Cold Water", "Hot Water", "Kitchen", "Something"]
See: Pandas fill multiple columns with 0 when null
CodePudding user response:
Isolate column some and fillna. The code below selects all other columns except some
df.update(df.filter(regex='[^some]', axis=1).fillna(0))
print(df)
CodePudding user response:
I think you can do something like this for every column where you want to replace NaN values:
columns = ["Hot Water", "Cold Water", "Kitchen", "Coffee", "Something"]
for column in columns:
df3[column] = df3[column].replace(np.nan, 0)
CodePudding user response:
Filling NaN or Null values can be solved with pandas.DataFrame.fllna()
If you want only new cells filled and are making new dataframes and merging them as a final step, use fillna()
somewhere between or even when merging, but not after.
df = df.fillna(0)