Fill merged columns with 0 instead of NaN-CodePudding

I have a problem. I want to merge two dataframes, but instead of NaN it should be filled with 0. But only the "new" columns. How could I do that?

What I tried

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left', fill = 0)
[OUT]
TypeError: merge() got an unexpected keyword argument 'fill'

d = {'host_id': [1, 1, 2],
     'id': [10, 11, 20],
     'value': ["Hot Water,Cold Water,Kitchen,Coffee", 
               "Hot Water,Coffee,Something",
               "Hot Water,Coffee"]}
df = pd.DataFrame(data=d)
print(df)


d2 = {'host_id': [1, 1, 2, 3],
     'id': [10, 11, 20, 30],
     'some': ['test1', "test2", "test3", np.nan]}
df2 = pd.DataFrame(data=d2)
print(df2)

df_path = df.copy()
df_path.index = pd.MultiIndex.from_arrays(df_path[['host_id', 'id']].values.T, names=['host_id', 'id'])
df_path = df_path['value'].str.split(',', expand=True)
df_path = df_path.melt(ignore_index=False).dropna()
df_path.reset_index(inplace=True)

one_hot = pd.get_dummies(df_path['value'])
df_one = df_path.drop('value',axis = 1)
df_one = df_path.join(one_hot)

grouped_df_one = df_one.groupby(['id']).max()
grouped_df_one = grouped_df_one.drop(columns=['value', 'variable']).reset_index()

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left')
df3

   host_id  id                                value
0        1  10  Hot Water,Cold Water,Kitchen,Coffee
1        1  11           Hot Water,Coffee,Something
2        2  20                     Hot Water,Coffee

   host_id  id   some
0        1  10  test1
1        1  11  test2
2        2  20  test3
3        3  30    NaN

What I got

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN     NaN         NaN        NaN      NaN        NaN

What I want

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN       0           0          0        0          0

CodePudding user response：

You can fill specific columns using

df[list_cols] = df[list_cols].fillna(0)

where list_cols is e.g.

list_cols = ["Coffee", "Cold Water", "Hot Water", "Kitchen", "Something"]

See: Pandas fill multiple columns with 0 when null

CodePudding user response：

Isolate column some and fillna. The code below selects all other columns except some

df.update(df.filter(regex='[^some]', axis=1).fillna(0))



 print(df)

CodePudding user response：

I think you can do something like this for every column where you want to replace NaN values:

columns = ["Hot Water", "Cold Water", "Kitchen", "Coffee", "Something"]
for column in columns:
    df3[column] = df3[column].replace(np.nan, 0)

CodePudding user response：

Filling NaN or Null values can be solved with pandas.DataFrame.fllna()

If you want only new cells filled and are making new dataframes and merging them as a final step, use fillna() somewhere between or even when merging, but not after.

df = df.fillna(0)