Home > Enterprise >  Replacing missing values in a Pandas DataFrame based on values in another Pandas DataFrame
Replacing missing values in a Pandas DataFrame based on values in another Pandas DataFrame

Time:10-30

I need to replace missing values in a Pandas DataFrame using values from another DataFrame.

df1 = pd.DataFrame({'ID':['1111','2222','3333','4444','5555'],'Test':['T1','T1','T1','T2','T2'], 'Day1': ['P','P','P','P','P'], 'Day2': ['P','P','P','P', 'NaN'], 'Day3':['P','P','NaN','P','NaN'], 'Day4': ['P','P','NaN','P','NaN']})
    ID   Test Day1 Day2 Day3 Day4
0   1111 T1   P    P    P    P
1   2222 T1   P    P    P    P
2   3333 T1   P    P    NaN  NaN
3   4444 T2   P    P    P    P
4   5555 T2   P    NaN  NaN  NaN

df2 = pd.DataFrame({'ID':['3333','5555'], 'Test':['T1','T2'], 'Label': ['OOT-P', 'OOT-T']})
    ID   Test Label
0   3333 T1   OOT-P
1   5555 T2   OOT-T

NaN for Day3 and Day4 for ID=3333 and test=T1 in df1 needs to be replaced with OOT-P from df2.

NaN for Day2, Day3, Day4 for ID=5555 and test=T2 in df1 needs to be replace with OOT-T from df2.

Both dataframes will always have an ID and Test column but the names and number of additional columns in df1 will change. For example, df1 may have Day1, Day2, Day3, Day4, Day5 columns or Week1, Week2, Week3 columns.

CodePudding user response:

You can reshape df2 as Series to fillna in df1.

A few things are needed: replacing the string 'NaN' with actual float NaN, temporarily setting ID/Test as index, and temporarily transposing the dataframe (fillna with Series/dictionary only works on columns)

(df1.replace('NaN', float('nan'))
    .set_index(['ID','Test']).T
    .fillna(df2.set_index(['ID','Test'])['Label'])
    .T.reset_index()
 )

Output:

     ID Test Day1   Day2   Day3   Day4
0  1111   T1    P      P      P      P
1  2222   T1    P      P      P      P
2  3333   T1    P      P  OOT-P  OOT-P
3  4444   T2    P      P      P      P
4  5555   T2    P  OOT-T  OOT-T  OOT-T

CodePudding user response:

You could also make a list of tuples with the IDs that you want to change, and the values that you want to change them to. Then you can simply iterate through the IDs and replace the NaN values with your new values.

IDS_to_Change = df2.ID.values.tolist()

New_Vals = []
for i in range(len(df2.index)):
    New_Vals.append(df2[df2['ID'] == IDS_to_Change[i]].Label[i])

Data_to_Merge = list(zip(IDS_to_Change, New_Vals))

for ids in Data_to_Merge:
    for i in range (len(df1.index)):
        if df1['ID'][i] == ids[0]:
            id_name = ids[0]          
            index = df1[df1['ID'] == id_name].index[0]
            for j in range(len(df1[df1['ID'] == id_name].columns)):
                 if df1[df1['ID'] == id_name].loc[index][j] == 'NaN':
                    df1.iloc[index][j] = ids[1]

Output:

    ID    Test  Day1    Day2    Day3    Day4
0   1111    T1     P       P       P       P
1   2222    T1     P       P       P       P
2   3333    T1     P       P   OOT-P   OOT-P
3   4444    T2     P       P       P       P
4   5555    T2     P   OOT-T   OOT-T   OOT-T

  • Related