Home > Enterprise >  Using Pandas to assign specific values
Using Pandas to assign specific values

Time:04-15

I have the following dataframe:

data = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
        'stat': ['ordered', 'unconfirmed', 'ordered', 'unknwon', 'ordered', 'unconfirmed', 'ordered', 'back'],
        'date': ['2021', '2022', '2023', '2024', '2025','2026','2027', '1990']    
}
df = pd.DataFrame(data)
df

enter image description here

I am trying to get the following data frame:

enter image description here

Unfortunate I am not successful so far and I used the following commands (for loops) for only stat==ordered:

y0 = np.zeros((len(df), 8), dtype=int)
y1 = [2021, 2022, 2023, 2024, 2025, 2026, 2027, 1990]
y2 = [2022, 2023, 2024, 2025, 2026, 2027, 1990]
y3 = [2023, 2024, 2025, 2026, 2027, 1990]
y4 = [2024, 2025, 2026, 2027, 1990]
y5 = [2025, 2026, 2027, 1990]
y6 = [2026, 2027, 1990]
y7 = [2027, 1990]
y8 = [1990]
if stat=='ordered':
    for i in df['id']:
        for j in y1:
            if df.loc[i].at['date'] in y1:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y2:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y3:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y4:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y5:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y6:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] in y7:
                  y0[i][y1.index(j)] = 1
            else:
                  y0[i][y1.index(j)] = 0

            if df.loc[i].at['date'] not in y8:
                  y0[i][y1.index(j)] = 0
            else:
                  y0[i][y1.index(j)] = 0

But unfortunately it did not returned the expected solution and beside that it takes a very long time to do the calculation. I tried to use gruopby, but it could not fgure out either how to use it perporly since it is faster than using for loops. Any idea would be very appreiciated.

CodePudding user response:

IIUC:

df.join(
    pd.get_dummies(df.date).cumsum(axis=1).mul(
        [1, 2, 1, 3, 1, 2, 1, 0], axis=0
    ).astype(int)

)

   id         stat  date  1990  2021  2022  2023  2024  2025  2026  2027
0   1      ordered  2021     0     1     1     1     1     1     1     1
1   2  unconfirmed  2022     0     0     2     2     2     2     2     2
2   3      ordered  2023     0     0     0     1     1     1     1     1
3   4      unknwon  2024     0     0     0     0     3     3     3     3
4   5      ordered  2025     0     0     0     0     0     1     1     1
5   6  unconfirmed  2026     0     0     0     0     0     0     2     2
6   7      ordered  2027     0     0     0     0     0     0     0     1
7   8         back  1990     0     0     0     0     0     0     0     0
  • Related