I am currently having a similar need to the question in this thread, but it looks like it cannot fill in the dates
if the min
and max
dates of the given date
column does not fall into the first and last day of a given month and year. In particular, assume this dataframe
df = pd.DataFrame({'user': ['a','a','b','b','c','c','c'], 'dt': ['2016-01-05','2016-01-08', '2016-01-10','2016-01-15','2016-01-16', '2016-01-22', '2016-01-19'], 'val': [1,33,2,1,5,5,6], 'price': [1,2,1,1,2,5.5,4.2]})
user dt val price
0 a 2016-01-05 1 1.0
1 a 2016-01-08 33 2.0
2 b 2016-01-10 2 1.0
3 b 2016-01-15 1 1.0
4 c 2016-01-16 5 2.0
5 c 2016-01-22 5 5.5
6 c 2016-01-19 6 4.2
Using the code in the first answer of the above thread, the resulting dataframe can only fill in 0
values for all dates
between 2016-01-05
and 2016-01-22
. It could not do the same thing on dates between 2016-01-01
and 2016-01-04
, OR from 2016-01-23
to 2016-01-31.
I wonder if anyone could help address this point, as I currently have a need to accomplish the fill-in for every missing dates within a given month and year?
Expected Output
user dt val price
0 a 2016-01-01 0 0.0
1 a 2016-01-02 0 0.0
2 a 2016-01-03 0 0.0
3 a 2016-01-04 0 0.0
4 a 2016-01-05 1 1.0
5 a 2016-01-06 0 0.0
6 a 2016-01-07 0 0.0
7 a 2016-01-08 33 2.0
8 a 2016-01-09 0 0.0
9 a 2016-01-10 0 0.0
10 a 2016-01-11 0 0.0
11 a 2016-01-12 0 0.0
12 a 2016-01-13 0 0.0
13 a 2016-01-14 0 0.0
14 a 2016-01-15 0 0.0
15 a 2016-01-16 0 0.0
16 a 2016-01-17 0 0.0
17 a 2016-01-18 0 0.0
18 a 2016-01-19 0 0.0
19 a 2016-01-20 0 0.0
20 a 2016-01-21 0 0.0
21 a 2016-01-22 0 0.0
22 a 2016-01-23 0 0.0
23 a 2016-01-24 0 0.0
24 a 2016-01-25 0 0.0
25 a 2016-01-26 0 0.0
26 a 2016-01-27 0 0.0
27 a 2016-01-28 0 0.0
28 a 2016-01-29 0 0.0
29 a 2016-01-30 0 0.0
30 a 2016-01-31 0 0.0
31 b 2016-01-01 0 0.0
32 b 2016-01-02 0 0.0
33 b 2016-01-03 0 0.0
34 b 2016-01-04 0 0.0
35 b 2016-01-05 0 0.0
36 b 2016-01-06 0 0.0
37 b 2016-01-07 0 0.0
38 b 2016-01-08 0 0.0
39 b 2016-01-09 0 0.0
40 b 2016-01-10 2 1.0
41 b 2016-01-11 0 0.0
42 b 2016-01-12 0 0.0
43 b 2016-01-13 0 0.0
44 b 2016-01-14 0 0.0
45 b 2016-01-15 1 1.0
46 b 2016-01-16 0 0.0
47 b 2016-01-17 0 0.0
48 b 2016-01-18 0 0.0
49 b 2016-01-19 0 0.0
50 b 2016-01-20 0 0.0
51 b 2016-01-21 0 0.0
52 b 2016-01-22 0 0.0
53 b 2016-01-23 0 0.0
54 b 2016-01-24 0 0.0
55 b 2016-01-25 0 0.0
56 b 2016-01-26 0 0.0
57 b 2016-01-27 0 0.0
58 b 2016-01-28 0 0.0
59 b 2016-01-29 0 0.0
60 b 2016-01-30 0 0.0
61 b 2016-01-31 0 0.0
62 c 2016-01-01 0 0.0
63 c 2016-01-02 0 0.0
64 c 2016-01-03 0 0.0
65 c 2016-01-04 0 0.0
66 c 2016-01-05 0 0.0
67 c 2016-01-06 0 0.0
68 c 2016-01-07 0 0.0
69 c 2016-01-08 0 0.0
70 c 2016-01-09 0 0.0
71 c 2016-01-10 2 1.0
72 c 2016-01-11 0 0.0
73 c 2016-01-12 0 0.0
74 c 2016-01-13 0 0.0
75 c 2016-01-14 0 0.0
76 c 2016-01-15 1 1.0
77 c 2016-01-16 5 2.0
78 c 2016-01-17 0 0.0
79 c 2016-01-18 0 0.0
80 c 2016-01-19 6 4.2
81 c 2016-01-20 0 0.0
82 c 2016-01-21 0 0.0
83 c 2016-01-22 5 5.5
84 c 2016-01-23 0 0.0
85 c 2016-01-24 0 0.0
86 c 2016-01-25 0 0.0
87 c 2016-01-26 0 0.0
88 c 2016-01-27 0 0.0
89 c 2016-01-28 0 0.0
90 c 2016-01-29 0 0.0
91 c 2016-01-30 0 0.0
92 c 2016-01-31 0 0.0
CodePudding user response:
You can use:
df['dt'] = pd.to_datetime(df['dt'])
(df.set_index('dt')
.groupby('user', as_index=False)
.apply(lambda d: d.reindex(pd.date_range(d.index.min(), d.index.max()),
fill_value=0
))
.reset_index(-1)
)
If you want to round to month start/end:
(df.set_index('dt')
.groupby('user', as_index=False)
.apply(lambda d: d.reindex(pd.date_range(d.index.min()-pd.offsets.MonthBegin(1),
d.index.max() pd.offsets.MonthEnd(1)
).rename('id'),
fill_value=0)
)
.reset_index('id')
)
Output:
id user val price
0 2016-01-01 0 0 0.0
0 2016-01-02 0 0 0.0
0 2016-01-03 0 0 0.0
0 2016-01-04 0 0 0.0
0 2016-01-05 a 1 1.0
.. ... ... ... ...
2 2016-01-27 0 0 0.0
2 2016-01-28 0 0 0.0
2 2016-01-29 0 0 0.0
2 2016-01-30 0 0 0.0
2 2016-01-31 0 0 0.0
[93 rows x 4 columns]
CodePudding user response:
Ok, so you just need to define your own pd.date_range
, then build a new MultiIndex to get the daily data for each user and use pd.DataFrame.reindex
.
df["dt"] = pd.to_datetime(df["dt"])
df = df.set_index(["user", "dt"])
daily_idx = pd.date_range(start="2016-01-01", end="2016-01-31", freq="D")
new_idx = pd.MultiIndex.from_product(
[df.index.get_level_values("user").unique(), daily_idx], names=["user", "daily"]
)
out = df.reindex(new_idx, fill_value=0).reset_index()
print(out)
user daily val price
0 a 2016-01-01 0 0.0
1 a 2016-01-02 0 0.0
2 a 2016-01-03 0 0.0
3 a 2016-01-04 0 0.0
4 a 2016-01-05 1 1.0
5 a 2016-01-06 0 0.0
6 a 2016-01-07 0 0.0
7 a 2016-01-08 33 2.0
8 a 2016-01-09 0 0.0
9 a 2016-01-10 0 0.0
10 a 2016-01-11 0 0.0
11 a 2016-01-12 0 0.0
12 a 2016-01-13 0 0.0
13 a 2016-01-14 0 0.0
14 a 2016-01-15 0 0.0
15 a 2016-01-16 0 0.0
16 a 2016-01-17 0 0.0
17 a 2016-01-18 0 0.0
18 a 2016-01-19 0 0.0
19 a 2016-01-20 0 0.0
20 a 2016-01-21 0 0.0
21 a 2016-01-22 0 0.0
22 a 2016-01-23 0 0.0
23 a 2016-01-24 0 0.0
24 a 2016-01-25 0 0.0
25 a 2016-01-26 0 0.0
26 a 2016-01-27 0 0.0
27 a 2016-01-28 0 0.0
28 a 2016-01-29 0 0.0
29 a 2016-01-30 0 0.0
30 a 2016-01-31 0 0.0
31 b 2016-01-01 0 0.0
32 b 2016-01-02 0 0.0
33 b 2016-01-03 0 0.0
34 b 2016-01-04 0 0.0
35 b 2016-01-05 0 0.0
36 b 2016-01-06 0 0.0
37 b 2016-01-07 0 0.0
38 b 2016-01-08 0 0.0
39 b 2016-01-09 0 0.0
40 b 2016-01-10 2 1.0
41 b 2016-01-11 0 0.0
42 b 2016-01-12 0 0.0
43 b 2016-01-13 0 0.0
44 b 2016-01-14 0 0.0
45 b 2016-01-15 1 1.0
46 b 2016-01-16 0 0.0
47 b 2016-01-17 0 0.0
48 b 2016-01-18 0 0.0
49 b 2016-01-19 0 0.0
50 b 2016-01-20 0 0.0
51 b 2016-01-21 0 0.0
52 b 2016-01-22 0 0.0
53 b 2016-01-23 0 0.0
54 b 2016-01-24 0 0.0
55 b 2016-01-25 0 0.0
56 b 2016-01-26 0 0.0
57 b 2016-01-27 0 0.0
58 b 2016-01-28 0 0.0
59 b 2016-01-29 0 0.0
60 b 2016-01-30 0 0.0
61 b 2016-01-31 0 0.0
62 c 2016-01-01 0 0.0
63 c 2016-01-02 0 0.0
64 c 2016-01-03 0 0.0
65 c 2016-01-04 0 0.0
66 c 2016-01-05 0 0.0
67 c 2016-01-06 0 0.0
68 c 2016-01-07 0 0.0
69 c 2016-01-08 0 0.0
70 c 2016-01-09 0 0.0
71 c 2016-01-10 0 0.0
72 c 2016-01-11 0 0.0
73 c 2016-01-12 0 0.0
74 c 2016-01-13 0 0.0
75 c 2016-01-14 0 0.0
76 c 2016-01-15 0 0.0
77 c 2016-01-16 5 2.0
78 c 2016-01-17 0 0.0
79 c 2016-01-18 0 0.0
80 c 2016-01-19 6 4.2
81 c 2016-01-20 0 0.0
82 c 2016-01-21 0 0.0
83 c 2016-01-22 5 5.5
84 c 2016-01-23 0 0.0
85 c 2016-01-24 0 0.0
86 c 2016-01-25 0 0.0
87 c 2016-01-26 0 0.0
88 c 2016-01-27 0 0.0
89 c 2016-01-28 0 0.0
90 c 2016-01-29 0 0.0
91 c 2016-01-30 0 0.0
92 c 2016-01-31 0 0.0