Home > Blockchain >  Fill in values in other columns based on missing dates in another columns - Pandas
Fill in values in other columns based on missing dates in another columns - Pandas

Time:12-02

I am currently having a similar need to the question in this thread, but it looks like it cannot fill in the dates if the min and max dates of the given date column does not fall into the first and last day of a given month and year. In particular, assume this dataframe

df = pd.DataFrame({'user': ['a','a','b','b','c','c','c'], 'dt': ['2016-01-05','2016-01-08', '2016-01-10','2016-01-15','2016-01-16', '2016-01-22', '2016-01-19'], 'val': [1,33,2,1,5,5,6], 'price': [1,2,1,1,2,5.5,4.2]})

  user          dt  val  price
0    a  2016-01-05    1    1.0
1    a  2016-01-08   33    2.0
2    b  2016-01-10    2    1.0
3    b  2016-01-15    1    1.0
4    c  2016-01-16    5    2.0
5    c  2016-01-22    5    5.5
6    c  2016-01-19    6    4.2

Using the code in the first answer of the above thread, the resulting dataframe can only fill in 0 values for all dates between 2016-01-05 and 2016-01-22. It could not do the same thing on dates between 2016-01-01 and 2016-01-04, OR from 2016-01-23 to 2016-01-31. I wonder if anyone could help address this point, as I currently have a need to accomplish the fill-in for every missing dates within a given month and year?

Expected Output

  user          dt  val  price
0    a  2016-01-01    0    0.0
1    a  2016-01-02    0    0.0
2    a  2016-01-03    0    0.0
3    a  2016-01-04    0    0.0
4    a  2016-01-05    1    1.0
5    a  2016-01-06    0    0.0
6    a  2016-01-07    0    0.0
7    a  2016-01-08   33    2.0
8    a  2016-01-09    0    0.0
9    a  2016-01-10    0    0.0
10   a  2016-01-11    0    0.0
11   a  2016-01-12    0    0.0
12   a  2016-01-13    0    0.0
13   a  2016-01-14    0    0.0
14   a  2016-01-15    0    0.0
15   a  2016-01-16    0    0.0
16   a  2016-01-17    0    0.0
17   a  2016-01-18    0    0.0
18   a  2016-01-19    0    0.0
19   a  2016-01-20    0    0.0
20   a  2016-01-21    0    0.0
21   a  2016-01-22    0    0.0
22   a  2016-01-23    0    0.0
23   a  2016-01-24    0    0.0
24   a  2016-01-25    0    0.0
25   a  2016-01-26    0    0.0
26   a  2016-01-27    0    0.0
27   a  2016-01-28    0    0.0
28   a  2016-01-29    0    0.0
29   a  2016-01-30    0    0.0
30   a  2016-01-31    0    0.0
31   b  2016-01-01    0    0.0
32   b  2016-01-02    0    0.0
33   b  2016-01-03    0    0.0
34   b  2016-01-04    0    0.0
35   b  2016-01-05    0    0.0
36   b  2016-01-06    0    0.0
37   b  2016-01-07    0    0.0
38   b  2016-01-08    0    0.0
39   b  2016-01-09    0    0.0
40   b  2016-01-10    2    1.0
41   b  2016-01-11    0    0.0
42   b  2016-01-12    0    0.0
43   b  2016-01-13    0    0.0
44   b  2016-01-14    0    0.0
45   b  2016-01-15    1    1.0
46   b  2016-01-16    0    0.0
47   b  2016-01-17    0    0.0
48   b  2016-01-18    0    0.0
49   b  2016-01-19    0    0.0
50   b  2016-01-20    0    0.0
51   b  2016-01-21    0    0.0
52   b  2016-01-22    0    0.0
53   b  2016-01-23    0    0.0
54   b  2016-01-24    0    0.0
55   b  2016-01-25    0    0.0
56   b  2016-01-26    0    0.0
57   b  2016-01-27    0    0.0
58   b  2016-01-28    0    0.0
59   b  2016-01-29    0    0.0
60   b  2016-01-30    0    0.0
61   b  2016-01-31    0    0.0
62   c  2016-01-01    0    0.0
63   c  2016-01-02    0    0.0
64   c  2016-01-03    0    0.0
65   c  2016-01-04    0    0.0
66   c  2016-01-05    0    0.0
67   c  2016-01-06    0    0.0
68   c  2016-01-07    0    0.0
69   c  2016-01-08    0    0.0
70   c  2016-01-09    0    0.0
71   c  2016-01-10    2    1.0
72   c  2016-01-11    0    0.0
73   c  2016-01-12    0    0.0
74   c  2016-01-13    0    0.0
75   c  2016-01-14    0    0.0
76   c  2016-01-15    1    1.0
77   c  2016-01-16    5    2.0
78   c  2016-01-17    0    0.0
79   c  2016-01-18    0    0.0
80   c  2016-01-19    6    4.2
81   c  2016-01-20    0    0.0
82   c  2016-01-21    0    0.0
83   c  2016-01-22    5    5.5
84   c  2016-01-23    0    0.0
85   c  2016-01-24    0    0.0
86   c  2016-01-25    0    0.0
87   c  2016-01-26    0    0.0
88   c  2016-01-27    0    0.0
89   c  2016-01-28    0    0.0
90   c  2016-01-29    0    0.0
91   c  2016-01-30    0    0.0
92   c  2016-01-31    0    0.0

CodePudding user response:

You can use:

df['dt'] = pd.to_datetime(df['dt'])

(df.set_index('dt')
   .groupby('user', as_index=False)
   .apply(lambda d: d.reindex(pd.date_range(d.index.min(), d.index.max()),
                              fill_value=0
                             ))
   .reset_index(-1)
)

If you want to round to month start/end:

(df.set_index('dt')
   .groupby('user', as_index=False)
   .apply(lambda d: d.reindex(pd.date_range(d.index.min()-pd.offsets.MonthBegin(1),
                                            d.index.max() pd.offsets.MonthEnd(1)
                                           ).rename('id'),
                              fill_value=0)
                             )
   .reset_index('id')
)

Output:

           id user  val  price
0  2016-01-01    0    0    0.0
0  2016-01-02    0    0    0.0
0  2016-01-03    0    0    0.0
0  2016-01-04    0    0    0.0
0  2016-01-05    a    1    1.0
..        ...  ...  ...    ...
2  2016-01-27    0    0    0.0
2  2016-01-28    0    0    0.0
2  2016-01-29    0    0    0.0
2  2016-01-30    0    0    0.0
2  2016-01-31    0    0    0.0

[93 rows x 4 columns]

CodePudding user response:

Ok, so you just need to define your own pd.date_range, then build a new MultiIndex to get the daily data for each user and use pd.DataFrame.reindex.

df["dt"] = pd.to_datetime(df["dt"])
df = df.set_index(["user", "dt"])

daily_idx = pd.date_range(start="2016-01-01", end="2016-01-31", freq="D")

new_idx = pd.MultiIndex.from_product(
    [df.index.get_level_values("user").unique(), daily_idx], names=["user", "daily"]
)
out = df.reindex(new_idx, fill_value=0).reset_index()
print(out)
   user      daily  val  price
0     a 2016-01-01    0    0.0
1     a 2016-01-02    0    0.0
2     a 2016-01-03    0    0.0
3     a 2016-01-04    0    0.0
4     a 2016-01-05    1    1.0
5     a 2016-01-06    0    0.0
6     a 2016-01-07    0    0.0
7     a 2016-01-08   33    2.0
8     a 2016-01-09    0    0.0
9     a 2016-01-10    0    0.0
10    a 2016-01-11    0    0.0
11    a 2016-01-12    0    0.0
12    a 2016-01-13    0    0.0
13    a 2016-01-14    0    0.0
14    a 2016-01-15    0    0.0
15    a 2016-01-16    0    0.0
16    a 2016-01-17    0    0.0
17    a 2016-01-18    0    0.0
18    a 2016-01-19    0    0.0
19    a 2016-01-20    0    0.0
20    a 2016-01-21    0    0.0
21    a 2016-01-22    0    0.0
22    a 2016-01-23    0    0.0
23    a 2016-01-24    0    0.0
24    a 2016-01-25    0    0.0
25    a 2016-01-26    0    0.0
26    a 2016-01-27    0    0.0
27    a 2016-01-28    0    0.0
28    a 2016-01-29    0    0.0
29    a 2016-01-30    0    0.0
30    a 2016-01-31    0    0.0
31    b 2016-01-01    0    0.0
32    b 2016-01-02    0    0.0
33    b 2016-01-03    0    0.0
34    b 2016-01-04    0    0.0
35    b 2016-01-05    0    0.0
36    b 2016-01-06    0    0.0
37    b 2016-01-07    0    0.0
38    b 2016-01-08    0    0.0
39    b 2016-01-09    0    0.0
40    b 2016-01-10    2    1.0
41    b 2016-01-11    0    0.0
42    b 2016-01-12    0    0.0
43    b 2016-01-13    0    0.0
44    b 2016-01-14    0    0.0
45    b 2016-01-15    1    1.0
46    b 2016-01-16    0    0.0
47    b 2016-01-17    0    0.0
48    b 2016-01-18    0    0.0
49    b 2016-01-19    0    0.0
50    b 2016-01-20    0    0.0
51    b 2016-01-21    0    0.0
52    b 2016-01-22    0    0.0
53    b 2016-01-23    0    0.0
54    b 2016-01-24    0    0.0
55    b 2016-01-25    0    0.0
56    b 2016-01-26    0    0.0
57    b 2016-01-27    0    0.0
58    b 2016-01-28    0    0.0
59    b 2016-01-29    0    0.0
60    b 2016-01-30    0    0.0
61    b 2016-01-31    0    0.0
62    c 2016-01-01    0    0.0
63    c 2016-01-02    0    0.0
64    c 2016-01-03    0    0.0
65    c 2016-01-04    0    0.0
66    c 2016-01-05    0    0.0
67    c 2016-01-06    0    0.0
68    c 2016-01-07    0    0.0
69    c 2016-01-08    0    0.0
70    c 2016-01-09    0    0.0
71    c 2016-01-10    0    0.0
72    c 2016-01-11    0    0.0
73    c 2016-01-12    0    0.0
74    c 2016-01-13    0    0.0
75    c 2016-01-14    0    0.0
76    c 2016-01-15    0    0.0
77    c 2016-01-16    5    2.0
78    c 2016-01-17    0    0.0
79    c 2016-01-18    0    0.0
80    c 2016-01-19    6    4.2
81    c 2016-01-20    0    0.0
82    c 2016-01-21    0    0.0
83    c 2016-01-22    5    5.5
84    c 2016-01-23    0    0.0
85    c 2016-01-24    0    0.0
86    c 2016-01-25    0    0.0
87    c 2016-01-26    0    0.0
88    c 2016-01-27    0    0.0
89    c 2016-01-28    0    0.0
90    c 2016-01-29    0    0.0
91    c 2016-01-30    0    0.0
92    c 2016-01-31    0    0.0
  • Related