Assume we have a table looks like the following:
id | week_num | people | date | level | a | b |
---|---|---|---|---|---|---|
1 | 1 | 20 | 1990101 | 1 | 2 | 3 |
1 | 2 | 30 | 1990108 | 1 | 2 | 3 |
1 | 3 | 40 | 1990115 | 1 | 2 | 3 |
1 | 5 | 100 | 1990129 | 1 | 2 | 3 |
1 | 7 | 100 | 1990212 | 1 | 2 | 3 |
week_num skip the "4" and "6" because the corresponding "people" is 0. However, we want the all the rows included like the following table.
id | week_num | people | date | level | a | b |
---|---|---|---|---|---|---|
1 | 1 | 20 | 1990101 | 1 | 2 | 3 |
1 | 2 | 30 | 1990108 | 1 | 2 | 3 |
1 | 3 | 40 | 1990115 | 1 | 2 | 3 |
1 | 4 | 0 | 1990122 | 1 | 2 | 3 |
1 | 5 | 100 | 1990129 | 1 | 2 | 3 |
1 | 6 | 0 | 1990205 | 1 | 2 | 3 |
1 | 7 | 100 | 1990212 | 1 | 2 | 3 |
The date starts with 1990101, the next row must 7 days if it is a continuous week_num(Ex: 1,2 is continuous; 1,3 is not).
How can we use python(pandas) to achieve this goal?
Note: Each id has 10 week_num(1,2,3,...,10), the output must include all "week_num" with corresponding "people" and "date".
Update: Other columns like "level","a","b" should stay the same even we add the skipped week_num.
CodePudding user response:
This assumes that the date restarts at 1990-01-01 for each id:
import itertools
# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
.reindex(list(itertools.product([1,2], range(1, 11))))
.reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill some other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from week starting from 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01") (df_full.week_num - 1) * pd.Timedelta("1w")
df_full
# out:
id week_num people date level a b
0 1 1 20.0 1990-01-01 1.0 2.0 3.0
1 1 2 30.0 1990-01-08 1.0 2.0 3.0
2 1 3 40.0 1990-01-15 1.0 2.0 3.0
3 1 4 0.0 1990-01-22 1.0 2.0 3.0
4 1 5 100.0 1990-01-29 1.0 2.0 3.0
5 1 6 0.0 1990-02-05 1.0 2.0 3.0
6 1 7 100.0 1990-02-12 1.0 2.0 3.0
7 1 8 0.0 1990-02-19 1.0 2.0 3.0
8 1 9 0.0 1990-02-26 1.0 2.0 3.0
9 1 10 0.0 1990-03-05 1.0 2.0 3.0
10 2 1 0.0 1990-01-01 1.0 2.0 3.0
11 2 2 0.0 1990-01-08 1.0 2.0 3.0
12 2 3 0.0 1990-01-15 1.0 2.0 3.0
13 2 4 0.0 1990-01-22 1.0 2.0 3.0
14 2 5 0.0 1990-01-29 1.0 2.0 3.0
15 2 6 0.0 1990-02-05 1.0 2.0 3.0
16 2 7 0.0 1990-02-12 1.0 2.0 3.0
17 2 8 0.0 1990-02-19 1.0 2.0 3.0
18 2 9 0.0 1990-02-26 1.0 2.0 3.0
19 2 10 0.0 1990-03-05 1.0 2.0 3.0