Starting from a DataFrame imported from Excel, like this:
Code | Time | Rev |
---|---|---|
AAA | 5 | 3 |
AAA | 3 | 2 |
AAA | 6 | 1 |
BBB | 10 | 2 |
BBB | 5 | 1 |
I want to add a new column that flags the latest revision of each code:
Code | Time | Rev | Last |
---|---|---|---|
AAA | 5 | 3 | OK |
AAA | 3 | 2 | NOK |
AAA | 6 | 1 | NOK |
BBB | 10 | 2 | OK |
BBB | 5 | 1 | NOK |
The df is already sorted by 'Code' (ascending) and 'Rev' (descending):
df = df.sort_values(['Code', 'Rev'], ascending=[True, False])
My idea was to check the 'Code' column: if the value is equal to the value in the row above, the new column should contain NOK; otherwise it should contain OK.
Unfortunately, I am not able to write this in Python.
CodePudding user response:
You can do:
# Create a column called 'Last' filled with 'NOK'
df['Last'] = 'NOK'
# Skipping the sort because you say df is already sorted.
# Then locate the first row of each group and change its value to 'OK'
df.loc[df.groupby('Code', as_index=False).nth(0).index, 'Last'] = 'OK'
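If the frame were not already sorted, the sort has to come first, because "first row of each group" only means "latest revision" after ordering by 'Rev' descending. A minimal self-contained sketch, assuming sample data typed in by hand (in deliberately shuffled order) rather than read from Excel:

```python
import pandas as pd

# Hypothetical sample data, intentionally NOT sorted by revision
df = pd.DataFrame({
    "Code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Time": [6, 5, 3, 5, 10],
    "Rev": [1, 3, 2, 1, 2],
})

# Highest revision first within each code
df = df.sort_values(["Code", "Rev"], ascending=[True, False])

# Default every row to 'NOK', then flag the first row of each group
df["Last"] = "NOK"
first_rows = df.groupby("Code", as_index=False).nth(0).index
df.loc[first_rows, "Last"] = "OK"

print(df)
```

Note that after sorting, the index labels are no longer 0, 1, 2, ...; df.loc still works because it selects by label, not by position.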
CodePudding user response:
You can use pandas' GroupBy.cumcount, which numbers the rows within each group starting at 0, and set the first row of every group to 'OK'.
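import pandas as pd

dict_ = {
    'Code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Time': [5, 3, 6, 10, 5],
    'Rev': [3, 2, 1, 2, 1],
}
df = pd.DataFrame(dict_)
# Default every row to 'NOK'
df['Last'] = 'NOK'
# cumcount() == 0 is True only for the first row of each 'Code' group
df.loc[df.groupby('Code').cumcount() == 0, 'Last'] = 'OK'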
This gives us the expected output:
df
Code Time Rev Last
0 AAA 5 3 OK
1 AAA 3 2 NOK
2 AAA 6 1 NOK
3 BBB 10 2 OK
4 BBB 5 1 NOK
Alternatively, you can fetch the first row of each group with head(1) and set its value to 'OK':
df.loc[df.groupby('Code').head(1).index, 'Last'] = 'OK'
This gives the same result:
df
Code Time Rev Last
0 AAA 5 3 OK
1 AAA 3 2 NOK
2 AAA 6 1 NOK
3 BBB 10 2 OK
4 BBB 5 1 NOK
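The idea described in the question (compare each 'Code' with the value in the row above) can also be written directly, without groupby. This is only a sketch, and it assumes the frame is already sorted as stated in the question; the variable name is_first is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Time": [5, 3, 6, 10, 5],
    "Rev": [3, 2, 1, 2, 1],
})

# shift() moves the column down by one row, so each row is compared
# with the row above it; the very first row is compared with a missing
# value and therefore counts as "different"
is_first = df["Code"].ne(df["Code"].shift())

# Translate the boolean flag into the requested labels
df["Last"] = is_first.map({True: "OK", False: "NOK"})

print(df)
```

Unlike the groupby versions, this one only looks at adjacent rows, so it depends on all rows with the same 'Code' being contiguous.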