I have a Df like this:
date_from | date_to | item_id | VALUE_NEW | VALUE_OLD | cost_var |
---|---|---|---|---|---|
1/1/1900 00:00:00 | 11/3/2022 15:31:18 | 452953 | 5366,46 | 4024,71 | 33.34% |
11/3/2022 15:31:18 | 1/1/2200 00:00:00 | 452953 | 9122,57 | 5366,46 | 69.99% |
1/1/1900 00:00:00 | 11/3/2022 15:31:18 | 452954 | 5366,46 | 4024,71 | 33.34% |
11/3/2022 15:31:18 | 1/1/2200 00:00:00 | 452954 | 9122,57 | 5366,46 | 69.99% |
1/1/1900 00:00:00 | 21/7/2021 16:30:46 | 452961 | 6170,98 | 4024,71 | 53.33% |
21/7/2021 16:30:46 | 11/3/2022 15:31:09 | 452961 | 5312 | 6170,98 | 13.92% |
11/3/2022 15:31:09 | 1/1/2200 00:00:00 | 452961 | 9122,57 | 5312 | 71.74% |
1/1/1900 00:00:00 | 13/10/2021 14:39:55 | 801286 | 4052,1 | 1332,8 | 204.03% |
13/10/2021 14:39:55 | 13/10/2021 14:43:09 | 801286 | 4,4732 | 4052,1 | 99.89% |
13/10/2021 14:43:09 | 3/2/2022 17:16:23 | 801286 | 4473,2 | 4,4732 | 99900.00% |
3/2/2022 17:16:23 | 1/1/2200 00:00:00 | 801286 | 4946,8 | 4473,2 | 10.59% |
I need to check each item_id, and get the last row where cost_var is >60%. If it's the last row, that's ok, but if there is a next one, and it is <60%, I have to drop the last row>60%. Output should look like this:
date_from | date_to | item_id | VALUE_NEW | VALUE_OLD | cost_var |
---|---|---|---|---|---|
11/3/2022 15:31:18 | 1/1/2200 00:00:00 | 452953 | 9122,57 | 5366,46 | 69.99% |
11/3/2022 15:31:18 | 1/1/2200 00:00:00 | 452954 | 9122,57 | 5366,46 | 69.99% |
11/3/2022 15:31:09 | 1/1/2200 00:00:00 | 452961 | 9122,57 | 5312 | 71.74% |
Item 802186 returned no value, because last row>60% (99900.00%) has a next row and cost_var<60% (10.59%)...Is it possible to do? I couldn't find a way to solve it.
CodePudding user response:
Try this
# read data
df = pd.read_clipboard()
# select the last row of each item_id and only select ones where cost_var > 60%
df.groupby(df.item_id, as_index=False).last().query("cost_var.str.rstrip('%').astype('float')>60", engine='python')