Home > Mobile >  Pandas how to ignore multiindex to use groupby as index when using filna from a lookup dataframe
Pandas how to ignore multiindex to use groupby as index when using filna from a lookup dataframe

Time:01-21

I have a large multi-indexed dataframe and I want to fill values for a columns based on values in another group of columns. It looks like passing a dataframe or dictionary to groupby.fillna() forces you to match on index. How do I fill values using just the groupby values as an index and ignoring the dataframes index?

import pandas as pd
import numpy as np
nan = np.nan 

def print_df(df):
    with pd.option_context('display.max_rows', None,
                       'display.max_columns', None,
                       ):
        print(df)


d1 = {'I1': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2},
 'I2': {0: 1, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2},
 'I3': {0: 1, 1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 1, 9: 2},
 'A': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1},
 'B': {0: 3, 1: 2, 2: 1, 3: 2, 4: 3, 5: 3, 6: 2, 7: 2, 8: 1, 9: 3},
 'C': {0: 2, 1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1, 7: 1, 8: 2, 9: 1},
 'D': {0: nan,  1: nan,  2: 7.0,  3: nan,  4: nan,  5: 8.0,  6: 8.0,  7: 4.0,  8: 1.0,  9: nan},
 'E': {0: nan,  1: nan,  2: 1.0,  3: nan,  4: nan,  5: 1.0,  6: 1.0,  7: 1.0,  8: 1.0,  9: nan}}
df1 = pd.DataFrame(d1)
df1.set_index(["I1","I2","I3"], inplace=True)

d2 = {'A': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 2, 10: 2, 11: 2}, 
'B': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 1, 7: 1, 8: 2, 9: 2, 10: 3, 11: 3}, 
'C': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1, 9: 2, 10: 1, 11: 2}, 
'D': {0: 5, 1: 1, 2: 4, 3: 3, 4: 2, 5: 2, 6: 3, 7: 5, 8: 5, 9: 4, 10: 4, 11: 3}, 
'E': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 5, 10: 5, 11: 5}}

df2 =pd.DataFrame(d2)
df2.set_index(["A","B","C"], inplace=True)


print("dataframe with values to impute")
print_df(df1)

print("lookup values to fill dataframe")
print_df(df2)
# what I expected to work that is instead using the I1 I2 I3 index
df1.groupby(["A","B","C"]).fillna(df2.to_dict())

This is desired output (** shows the imputed rows):

data frame with values to impute
          A  B  C    D    E
I1 I2 I3                   
1  1  1   1  3  2  2.0  5.0  **
   2  1   1  2  1  4.0  5.0  **
2  1  1   2  1  2  7.0  1.0

Relevant lookup values I showed above for reference

       D  E
A B C      
1 1 1  5  5
    2  1  5
  2 1  4  5**
    2  3  5
  3 1  2  5
    2  2  5**

CodePudding user response:

I would first align the two DataFrames with a merge as they are not sharing the same index:

cols = ['A', 'B', 'C']

df1.fillna(df1[cols].merge(df2, left_on=cols, right_index=True, how='left'))

output:

          A  B  C    D    E
I1 I2 I3                   
1  1  1   1  3  2  2.0  5.0
   2  1   1  2  1  4.0  5.0
2  1  1   2  1  2  7.0  1.0
      2   2  2  1  5.0  5.0
      3   2  3  2  3.0  5.0
      4   2  3  1  8.0  1.0
      5   1  2  1  8.0  1.0
      6   1  2  1  4.0  1.0
   2  1   1  1  2  1.0  1.0
      2   1  3  1  2.0  5.0
  • Related