Home > Blockchain >  Replace Values With Groupby
Replace Values With Groupby

Time:11-16

I have a data frame with columns of numbers one of which contains some 'tbd' values. I want to replace those 'tbd' values with mean values from the groupby of the other columns. I understand that you can fillna values using the following

df['col_d'] = df.groupby(['col_a', 'col_b'])['col_c'].transform(lambda x: x.fillna(x.round().mean()))

I cant use fillna though as I already have na values and I don't want to drop them and append them as I will need them at a later stage. I understand I could use mapping but Im wondering is there a more elegant replace method?

Input col_b contains NaNs and 'tbd' I want to use a group by of col_a and col_c to fill the 'tbd'

col_a   col_b   col_c
  1       1       1
  2       2       2
  3     'tbd'     3
  3     'tbd'     3
  5       5       5
  6      NaN      6

output used the mean of col_a and col_c

col_a   col_b   col_c
  1       1       1
  2       2       2
  6       7       8
  5       6       7
  5       5       5
  6      NaN      6 

simple example but I believe it explains the issue it's essentially replacing the fillna method of the example lambda function I listed with replace method instead

CodePudding user response:

You can do with np.where, and use to_numeric to change to nan the 'tbd' to be able to perform the mean operation.

# dummy data
df = pd.DataFrame({
    'col_a':['a']*8, 'col_b':['b','c']*4,
    'col_c':[1,1,np.nan,np.nan,'tbd',3,4,'tbd']
})

df['col_d'] = np.where(
    df['col_c'].eq('tbd'), 
    pd.to_numeric(df['col_c'], errors='coerce')
      .groupby([df['col_a'],df['col_b']]).transform('mean'), 
    df['col_c']
)

print(df)
#   col_a col_b col_c col_d
# 0     a     b     1     1
# 1     a     c     1     1
# 2     a     b   NaN   NaN
# 3     a     c   NaN   NaN
# 4     a     b   tbd   2.5
# 5     a     c     3     3
# 6     a     b     4     4
# 7     a     c   tbd   2.0
  • Related