I have the following dataframe:
data = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
'stat': ['ordered', 'unconfirmed', 'ordered', 'unknwon', 'ordered', 'unconfirmed', 'ordered', 'back'],
'date': ['2021', '2022', '2023', '2024', '2025','2026','2027', '1990']
}
df = pd.DataFrame(data)
df
I am trying to get the following data frame:
Unfortunate I am not successful so far and I used the following commands (for loops) for only stat==ordered
:
y0 = np.zeros((len(df), 8), dtype=int)
y1 = [2021, 2022, 2023, 2024, 2025, 2026, 2027, 1990]
y2 = [2022, 2023, 2024, 2025, 2026, 2027, 1990]
y3 = [2023, 2024, 2025, 2026, 2027, 1990]
y4 = [2024, 2025, 2026, 2027, 1990]
y5 = [2025, 2026, 2027, 1990]
y6 = [2026, 2027, 1990]
y7 = [2027, 1990]
y8 = [1990]
if stat=='ordered':
for i in df['id']:
for j in y1:
if df.loc[i].at['date'] in y1:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y2:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y3:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y4:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y5:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y6:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] in y7:
y0[i][y1.index(j)] = 1
else:
y0[i][y1.index(j)] = 0
if df.loc[i].at['date'] not in y8:
y0[i][y1.index(j)] = 0
else:
y0[i][y1.index(j)] = 0
But unfortunately it did not returned the expected solution and beside that it takes a very long time to do the calculation. I tried to use gruopby, but it could not fgure out either how to use it perporly since it is faster than using for loops. Any idea would be very appreiciated.
CodePudding user response:
IIUC:
df.join(
pd.get_dummies(df.date).cumsum(axis=1).mul(
[1, 2, 1, 3, 1, 2, 1, 0], axis=0
).astype(int)
)
id stat date 1990 2021 2022 2023 2024 2025 2026 2027
0 1 ordered 2021 0 1 1 1 1 1 1 1
1 2 unconfirmed 2022 0 0 2 2 2 2 2 2
2 3 ordered 2023 0 0 0 1 1 1 1 1
3 4 unknwon 2024 0 0 0 0 3 3 3 3
4 5 ordered 2025 0 0 0 0 0 1 1 1
5 6 unconfirmed 2026 0 0 0 0 0 0 2 2
6 7 ordered 2027 0 0 0 0 0 0 0 1
7 8 back 1990 0 0 0 0 0 0 0 0