Home > front end >  Adding new rows with new values at some specific columns in pandas
Adding new rows with new values at some specific columns in pandas

Time:04-19

Assume we have a table looks like the following:

id week_num people date level a b
1 1 20 1990101 1 2 3
1 2 30 1990108 1 2 3
1 3 40 1990115 1 2 3
1 5 100 1990129 1 2 3
1 7 100 1990212 1 2 3

week_num skip the "4" and "6" because the corresponding "people" is 0. However, we want the all the rows included like the following table.

id week_num people date level a b
1 1 20 1990101 1 2 3
1 2 30 1990108 1 2 3
1 3 40 1990115 1 2 3
1 4 0 1990122 1 2 3
1 5 100 1990129 1 2 3
1 6 0 1990205 1 2 3
1 7 100 1990212 1 2 3

The date starts with 1990101, the next row must 7 days if it is a continuous week_num(Ex: 1,2 is continuous; 1,3 is not).

How can we use python(pandas) to achieve this goal?

Note: Each id has 10 week_num(1,2,3,...,10), the output must include all "week_num" with corresponding "people" and "date".

Update: Other columns like "level","a","b" should stay the same even we add the skipped week_num.

CodePudding user response:

This assumes that the date restarts at 1990-01-01 for each id:

import itertools
# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(itertools.product([1,2], range(1, 11))))
             .reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill some other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from week starting from 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01")   (df_full.week_num - 1) * pd.Timedelta("1w")
df_full

# out:
    id  week_num  people       date  level    a    b
0    1         1    20.0 1990-01-01    1.0  2.0  3.0
1    1         2    30.0 1990-01-08    1.0  2.0  3.0
2    1         3    40.0 1990-01-15    1.0  2.0  3.0
3    1         4     0.0 1990-01-22    1.0  2.0  3.0
4    1         5   100.0 1990-01-29    1.0  2.0  3.0
5    1         6     0.0 1990-02-05    1.0  2.0  3.0
6    1         7   100.0 1990-02-12    1.0  2.0  3.0
7    1         8     0.0 1990-02-19    1.0  2.0  3.0
8    1         9     0.0 1990-02-26    1.0  2.0  3.0
9    1        10     0.0 1990-03-05    1.0  2.0  3.0
10   2         1     0.0 1990-01-01    1.0  2.0  3.0
11   2         2     0.0 1990-01-08    1.0  2.0  3.0
12   2         3     0.0 1990-01-15    1.0  2.0  3.0
13   2         4     0.0 1990-01-22    1.0  2.0  3.0
14   2         5     0.0 1990-01-29    1.0  2.0  3.0
15   2         6     0.0 1990-02-05    1.0  2.0  3.0
16   2         7     0.0 1990-02-12    1.0  2.0  3.0
17   2         8     0.0 1990-02-19    1.0  2.0  3.0
18   2         9     0.0 1990-02-26    1.0  2.0  3.0
19   2        10     0.0 1990-03-05    1.0  2.0  3.0
  • Related