The goal is to fill a new column with the highest digit seen so far, updated row by row within each group of letters (a running maximum per group). I entered the expected values manually in the column "col_ok". So far I have only managed to assign the highest value of the whole group to every row; that result is in the fourth column, "cumulatively". This is wrong for the first row of group "A": according to the rule described above, the correct value there is "1". The same applies to the second and third rows. Only the values in the fourth and fifth rows match.

Please forgive any inconsistencies in my post; I'm not an IT specialist and my English is limited. Thanks in advance for your help.
import pandas as pd

df = pd.read_csv('C:/Users/.../a.csv', names=['group_letter', 'digit', 'col_ok'], index_col=0)
# Attempt so far: this puts the group-wide maximum on every row of the group
df = df.assign(cumulatively=df.groupby('group_letter')['col_ok'].transform('max'))
print(df)
group_letter  digit  col_ok  cumulatively
A             1      1       5
A             3      3       5
A             2      2       5
A             5      5       5
A             1      5       5
B             1      1       3
B             2      2       3
B             1      2       3
B             1      2       3
B             3      3       3
C             5      5       6
C             6      6       6
C             1      6       6
C             2      6       6
C             3      6       6
D             4      4       7
D             3      4       7
D             2      4       7
D             5      5       7
D             7      7       7
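The rule described in the question is a running maximum that restarts for each group. As a minimal plain-Python sketch (independent of pandas; the helper name running_max_per_group is hypothetical), the (group_letter, digit) pairs from the table can be processed with itertools.groupby and itertools.accumulate, assuming rows of the same group are contiguous:

```python
from itertools import accumulate, groupby

# (group_letter, digit) pairs copied from the table; each group's rows are
# contiguous, which itertools.groupby requires to see one run per group.
rows = [
    ("A", 1), ("A", 3), ("A", 2), ("A", 5), ("A", 1),
    ("B", 1), ("B", 2), ("B", 1), ("B", 1), ("B", 3),
    ("C", 5), ("C", 6), ("C", 1), ("C", 2), ("C", 3),
    ("D", 4), ("D", 3), ("D", 2), ("D", 5), ("D", 7),
]


def running_max_per_group(pairs):
    """Return the running maximum of the digit, restarted at each new group."""
    result = []
    for _, group in groupby(pairs, key=lambda pair: pair[0]):
        digits = [digit for _, digit in group]
        # accumulate(..., max) yields max(d0), max(d0, d1), max(d0, d1, d2), ...
        result.extend(accumulate(digits, max))
    return result


print(running_max_per_group(rows))
```

Under this rule the third row of group A gets 3, because the 3 from the second row is still the highest digit so far, whereas col_ok shows 2 for that row. Every other row agrees with col_ok.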
Answer:
If I understand correctly (IIUC), use a grouped cumulative maximum instead of transform('max'):
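# Running maximum of 'digit' within each group (col_ok only holds the manually entered expected values)
df = df.assign(cumulatively=df.groupby('group_letter')['digit'].cummax())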
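A self-contained sketch for checking this, assuming the question's data is typed in directly instead of being read from a.csv, and taking the running maximum of 'digit' (col_ok holds the manually entered expected results). The extra column group_max is hypothetical, added only to contrast transform('max') with cummax():

```python
import pandas as pd

# Hypothetical in-memory copy of the question's data (instead of reading a.csv).
df = pd.DataFrame({
    "group_letter": list("AAAAABBBBBCCCCCDDDDD"),
    "digit": [1, 3, 2, 5, 1, 1, 2, 1, 1, 3, 5, 6, 1, 2, 3, 4, 3, 2, 5, 7],
})

grouped = df.groupby("group_letter")["digit"]
df = df.assign(
    group_max=grouped.transform("max"),  # one value per group, repeated on every row
    cumulatively=grouped.cummax(),       # highest digit seen so far within the group
)
print(df)
```

transform('max') broadcasts a single per-group value to every row, while cummax() keeps a running value that can only rise as the rows of a group are read. cummax() follows the current row order, so the rows must already be in the intended order.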