Home > database >  Resample each ID of a dataframe with a given date range
Resample each ID of a dataframe with a given date range

Time:01-11

I have a dataframe like the one below. Each week, different IDs receive different tests.

date    id  test    received
2023-01-02  a1  a   1
2023-01-02  c3  a   1
2023-01-02  e5  a   1
2023-01-02  b2  b   1
2023-01-02  d4  b   1
2023-01-09  a1  c   1
2023-01-09  b2  c   1
2023-01-09  c3  c   1
d = {
    "date": [
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-02",
        "2023-01-09",
        "2023-01-09",
        "2023-01-09",
    ],
    "id": ["a1", "c3", "e5", "b2", "d4", "a1", "b2", "c3"],
    "test": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "received": [1, 1, 1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(data=d)

I want to resample it so that every ID is listed beside all the tests administered that week, and received = 1 or 0 depending on if they received it.

week_starting   id  test    received
02/01/2023  a1  a   1
02/01/2023  b2  a   0
02/01/2023  c3  a   1
02/01/2023  d4  a   0
02/01/2023  e5  a   1
02/01/2023  a1  b   0
02/01/2023  b2  b   1
02/01/2023  c3  b   0
02/01/2023  d4  b   1
02/01/2023  e5  b   0
09/01/2023  a1  c   1
09/01/2023  b2  c   1
09/01/2023  c3  c   1
09/01/2023  d4  c   0
09/01/2023  e5  c   0

Resampling by date is covered on StackOverflow, but resampling / padding by ID is not. Help?

CodePudding user response:

Is this what you're looking for?

import itertools
all_combs = itertools.product(df['date'].unique(),\
df['id'].unique(),df['test'].unique())
dff = pd.DataFrame(all_combs, columns=['date', 'id', \'test']).sort_values('test')
dff = pd.merge(dff, df, how='outer').fillna(0)

Output:

          date  id test  received
 0   2023-01-02  a1    a       1.0
 1   2023-01-09  c3    a       0.0
 2   2023-01-09  a1    a       0.0
 3   2023-01-02  d4    a       0.0
 4   2023-01-09  b2    a       0.0
 5   2023-01-02  b2    a       0.0
 6   2023-01-09  e5    a       0.0
 7   2023-01-02  c3    a       1.0
 8   2023-01-09  d4    a       0.0
 9   2023-01-02  e5    a       1.0
 10  2023-01-02  e5    b       0.0
 11  2023-01-09  b2    b       0.0
 12  2023-01-09  e5    b       0.0
 13  2023-01-02  c3    b       0.0
 14  2023-01-02  d4    b       1.0
 15  2023-01-09  d4    b       0.0
 16  2023-01-09  a1    b       0.0
 .......
  • Related