Home > Software design >  How to forward fill null values of one column from the values of another column?
How to forward fill null values of one column from the values of another column?

Time:05-12

I am trying to fill the null values within column 'beginning_daily_count' with the previous index value from the 'end_daily_count'. The starting dataset would be:

d = {
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'beginning_daily_count': [30, 33, 37, 46, None, 7, 1, None, 2],
    'end_daily_count': [33, 37, 46, 52, 33, 7, 1, 2, 3],
    'foils': [0, 0, 0, 0, 0, 0, 0, 1, 1]
}

and the desired dataset would be:

d = {
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'beginning_daily_count': [30, 33, 37, 46, 52, 33, 1, 1, 2],
    'end_daily_count': [33, 37, 46, 52, 33, 7, 1, 2, 3],
    'foils': [0, 0, 0, 0, 0, 0, 0, 1, 1]
}

I have attempted the following ffill() and iloc() methods, but to no avail. I admittedly have little experience with ffill and iloc.

d.iloc[beginning_daily_count.isna()].values = d.iloc[d.end_daily_count- 1].values
d['beginning_daily_count'].transform(lambda x: x.ffill(d['end_daily_count']))

CodePudding user response:

The DataFrame.fillna method can accept a series as its first argument, so you can pass it a shifted version of your end_daily_count column. Assuming you are OK with potentially sharing data across different ids:

df['beginning_daily_count'] = df['beginning_daily_count'].fillna(df['end_daily_count'].shift())

print(df)
   id  beginning_daily_count  end_daily_count  foils
0   1                   30.0               33      0
1   1                   33.0               37      0
2   1                   37.0               46      0
3   1                   46.0               52      0
4   1                   52.0               33      0
5   2                    7.0                7      0
6   2                    1.0                1      0
7   2                    1.0                2      1
8   2                    2.0                3      1

CodePudding user response:

You can fillna the column with the shifted other column per group (using GroupBy.shift to avoid leaking values from one group to the next one):

df['beginning_daily_count'] = (df['beginning_daily_count']
                               .fillna(df.groupby('id')['end_daily_count'].shift(),
                                       downcast='infer')
                              )

output:

   id  beginning_daily_count  end_daily_count  foils
0   1                     30               33      0
1   1                     33               37      0
2   1                     37               46      0
3   1                     46               52      0
4   1                     52               33      0
5   2                      7                7      0
6   2                      1                1      0
7   2                      1                2      1
8   2                      2                3      1

CodePudding user response:

This will look at the previous index and find the 'end_daily_count' previous when the beginning_daily_count is set to replace

df.replace(np.nan, 'Replace', inplace=True)
df['beginning_daily_count'] = np.where(df['beginning_daily_count'] == 'Replace', df.iloc[df.index - 1]['end_daily_count'], df['beginning_daily_count'])
df['beginning_daily_count'] = df['beginning_daily_count'].astype(int)
df
  • Related