I have a dataframe like the one below. Each week, different IDs receive different tests.
date id test received
2023-01-02 a1 a 1
2023-01-02 c3 a 1
2023-01-02 e5 a 1
2023-01-02 b2 b 1
2023-01-02 d4 b 1
2023-01-09 a1 c 1
2023-01-09 b2 c 1
2023-01-09 c3 c 1
d = {
"date": [
"2023-01-02",
"2023-01-02",
"2023-01-02",
"2023-01-02",
"2023-01-02",
"2023-01-09",
"2023-01-09",
"2023-01-09",
],
"id": ["a1", "c3", "e5", "b2", "d4", "a1", "b2", "c3"],
"test": ["a", "a", "a", "b", "b", "c", "c", "c"],
"received": [1, 1, 1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(data=d)
I want to resample it so that every ID is listed beside all the tests administered that week, and received = 1 or 0 depending on if they received it.
week_starting id test received
02/01/2023 a1 a 1
02/01/2023 b2 a 0
02/01/2023 c3 a 1
02/01/2023 d4 a 0
02/01/2023 e5 a 1
02/01/2023 a1 b 0
02/01/2023 b2 b 1
02/01/2023 c3 b 0
02/01/2023 d4 b 1
02/01/2023 e5 b 0
09/01/2023 a1 c 1
09/01/2023 b2 c 1
09/01/2023 c3 c 1
09/01/2023 d4 c 0
09/01/2023 e5 c 0
Resampling by date is covered on StackOverflow, but resampling / padding by ID is not. Help?
CodePudding user response:
Is this what you're looking for?
import itertools
all_combs = itertools.product(df['date'].unique(),\
df['id'].unique(),df['test'].unique())
dff = pd.DataFrame(all_combs, columns=['date', 'id', \'test']).sort_values('test')
dff = pd.merge(dff, df, how='outer').fillna(0)
Output:
date id test received
0 2023-01-02 a1 a 1.0
1 2023-01-09 c3 a 0.0
2 2023-01-09 a1 a 0.0
3 2023-01-02 d4 a 0.0
4 2023-01-09 b2 a 0.0
5 2023-01-02 b2 a 0.0
6 2023-01-09 e5 a 0.0
7 2023-01-02 c3 a 1.0
8 2023-01-09 d4 a 0.0
9 2023-01-02 e5 a 1.0
10 2023-01-02 e5 b 0.0
11 2023-01-09 b2 b 0.0
12 2023-01-09 e5 b 0.0
13 2023-01-02 c3 b 0.0
14 2023-01-02 d4 b 1.0
15 2023-01-09 d4 b 0.0
16 2023-01-09 a1 b 0.0
.......