Filling Missing Date Column using groupby method

Time:03-24

I have a dataframe that looks something like:


 --- ---- --------------- ------------ ------------ 
|   | id |     date1     |   date2    | days_ahead |
 --- ---- --------------- ------------ ------------ 
| 0 |  1 |    2021-10-21 | 2021-10-24 | 3          |
| 1 |  1 |    2021-10-22 | NaN        | NaN        |
| 2 |  1 |    2021-11-16 | 2021-11-24 | 8          |
| 3 |  2 |    2021-10-22 | 2021-10-24 | 2          |
| 4 |  2 |    2021-10-22 | 2021-10-24 | 2          |
| 5 |  3 |    2021-10-26 | 2021-10-31 | 5          |
| 6 |  3 |    2021-10-30 | 2021-11-04 | 5          |
| 7 |  3 |    2021-11-02 | NaN        | NaN        |
| 8 |  3 |    2021-11-04 | 2021-11-04 | 0          |
| 9 |  4 |    2021-10-28 | NaN        | NaN        |
 --- ---- --------------- ------------ ------------ 

I am trying to fill the missing date2 values with date1 plus the median days_ahead of each id group (rounded to whole days).

For example:
The median for id 1 is 5.5, which rounds to 6,
so the filled value of date2 at index 1 should be 2021-10-28 (date1 plus 6 days).

Similarly, for id 3 the median is 5,
so the filled value of date2 at index 7 should be 2021-11-07.

And for id 4 the median is NaN,
so the filled value of date2 at index 9 should fall back to date1, 2021-10-28.


I tried:

df['date2'].fillna(df.groupby('id')['days_ahead'].transform('median'), inplace = True)

But this fills date2 with the median number of days (a numeric value) rather than a date.

I could use apply with a lambda to detect the numeric values and convert them to dates, but how can I combine groupby and fillna directly?

CodePudding user response:

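Round the per-group medians, convert them to timedeltas with to_timedelta, and add them to date1 using the fill_value parameter (so a missing median counts as zero days). Then use the result to fill the missing values in date2: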

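import pandas as pd

# Make sure both columns hold datetimes, not strings
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Per-id median of days_ahead, rounded and converted to a timedelta (NaT where a group has no median)
td = pd.to_timedelta(df.groupby('id')['days_ahead'].transform('median').round(), unit='D')
# Fill only the missing date2 values with date1 + td; fill_value treats a missing td as 0 days
df['date2'] = df['date2'].fillna(df['date1'].add(td, fill_value=pd.Timedelta(0)))

print(df)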
   id      date1      date2  days_ahead
0   1 2021-10-21 2021-10-24         3.0
1   1 2021-10-22 2021-10-28         NaN
2   1 2021-11-16 2021-11-24         8.0
3   2 2021-10-22 2021-10-24         2.0
4   2 2021-10-22 2021-10-24         2.0
5   3 2021-10-26 2021-10-31         5.0
6   3 2021-10-30 2021-11-04         5.0
7   3 2021-11-02 2021-11-07         NaN
8   3 2021-11-04 2021-11-04         0.0
9   4 2021-10-28 2021-10-28         NaN
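
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question. The helper name fill_date2 is hypothetical, and it swaps in a small variation on the answer: it replaces missing offsets with a zero timedelta via fillna before adding, instead of passing fill_value to add. Under that assumption it should give the same filled dates.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "date1": pd.to_datetime([
        "2021-10-21", "2021-10-22", "2021-11-16", "2021-10-22", "2021-10-22",
        "2021-10-26", "2021-10-30", "2021-11-02", "2021-11-04", "2021-10-28",
    ]),
    "date2": pd.to_datetime([
        "2021-10-24", None, "2021-11-24", "2021-10-24", "2021-10-24",
        "2021-10-31", "2021-11-04", None, "2021-11-04", None,
    ]),
    "days_ahead": [3, np.nan, 8, 2, 2, 5, 5, np.nan, 0, np.nan],
})


def fill_date2(frame):
    """Hypothetical helper: fill missing date2 with date1 + per-id median days_ahead."""
    # Median of days_ahead within each id, broadcast back to every row and rounded.
    median_days = frame.groupby("id")["days_ahead"].transform("median").round()
    # Convert to timedeltas; groups with no median (all NaN) get a zero offset.
    offset = pd.to_timedelta(median_days, unit="D").fillna(pd.Timedelta(0))
    out = frame.copy()
    # fillna only touches rows where date2 is missing; existing dates are kept.
    out["date2"] = out["date2"].fillna(out["date1"] + offset)
    return out


filled = fill_date2(df)
print(filled.loc[[1, 7, 9], ["id", "date1", "date2"]])
```

The helper works on a copy, so the original df keeps its missing values and the two frames can be compared side by side.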