Home > other >  Replace dataframe null values with dictionary in Python
Replace dataframe null values with dictionary in Python

Time:11-22

I have a dataframe (really big) what have some null values and I can replaces them because there are two columns: Name and Weight, Name appears many times, sometimes with weight, sometimes not. This is a little example and what I tried for solve it.

First I created the dataframe:

import pandas as pd
import numpy as np

lst1 = ["AA","BB","CC","AA","BB","CC","AA","BB","CC"]
lst2 = [12,np.nan,14,12,15,14,np.nan,np.nan,14]
df = pd.DataFrame(list(zip(lst1,lst2)), columns = ['Name','Weight'])

Next I found the weight values for every name with a dictionary:

df_2 = df.groupby('Name')['Weight'].max()
dic = df_2.to_dict()

Finally I created a loop for replaces it all:

for k,v in dic:
  for i in range(len(df)):
    if df['Weight'][i] == None and k == df['Name'][i]:
      df['Weight'][i] = v
  else:
    print(df)

But it returns the same dataframe and I don't know what more doing. I'll really thanks you if you could help me or give an advice. :)

CodePudding user response:

Maybe you can fill the NaNs in .groupby:

df["Weight"] = df.groupby("Name", group_keys=False)["Weight"].apply(
    lambda x: x.fillna(x.max())
)
print(df)

Prints:

  Name  Weight
0   AA    12.0
1   BB    15.0
2   CC    14.0
3   AA    12.0
4   BB    15.0
5   CC    14.0
6   AA    12.0
7   BB    15.0
8   CC    14.0

OR:

x = df.groupby("Name")["Weight"].max()

df = df.set_index("Name")
df["Weight"] = df["Weight"].fillna(x)
df = df.reset_index()

print(df)

Prints:

  Name  Weight
0   AA    12.0
1   BB    15.0
2   CC    14.0
3   AA    12.0
4   BB    15.0
5   CC    14.0
6   AA    12.0
7   BB    15.0
8   CC    14.0
  • Related