Home > Software design >  filtering data from data frame with complex criteria in pandas python
filtering data from data frame with complex criteria in pandas python

Time:02-21

This is my dataframe

import pandas as pd

data=pd.DataFrame({'vehicle':['car','car','car','car','car','car','bus','bus','bus','bus','bus','bus','car','car','car','car','car','car','bus','bus','bus','bus','bus','bus'],
'expecteddate':['2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022','2/24/2022','2/24/2022','3/15/2022','3/15/2022','4/20/2022','4/20/2022'],'range':[240,240,240,240,240,240,300,300,300,300,300,300,240,240,240,240,240,240,300,300,300,300,300,300],'color':['blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red','blue','red'],'discount':[70,80,90,60,40,50,120,110,130,140,80,90,60,40,50,30,70,45,130,100,140,120,90,30],'date':['2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/18/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022','2/17/2022']})
print(data)

data in dataframe:

   vehicle  expecteddate  range color  discount       date
0      car    2/24/2022    240  blue        70  2/18/2022
1      car    2/24/2022    240   red        80  2/18/2022
2      car    3/15/2022    240  blue        90  2/18/2022
3      car    3/15/2022    240   red        60  2/18/2022
4      car    4/20/2022    240  blue        40  2/18/2022
5      car    4/20/2022    240   red        50  2/18/2022
6      bus    2/24/2022    300  blue       120  2/18/2022
7      bus    2/24/2022    300   red       110  2/18/2022
8      bus    3/15/2022    300  blue       130  2/18/2022
9      bus    3/15/2022    300   red       140  2/18/2022
10     bus    4/20/2022    300  blue        80  2/18/2022
11     bus    4/20/2022    300   red        90  2/18/2022
12     car    2/24/2022    240  blue        60  2/17/2022
13     car    2/24/2022    240   red        40  2/17/2022
14     car    3/15/2022    240  blue        50  2/17/2022
15     car    3/15/2022    240   red        30  2/17/2022
16     car    4/20/2022    240  blue        70  2/17/2022
17     car    4/20/2022    240   red        45  2/17/2022
18     bus    2/24/2022    300  blue       130  2/17/2022
19     bus    2/24/2022    300   red       100  2/17/2022
20     bus    3/15/2022    300  blue       140  2/17/2022
21     bus    3/15/2022    300   red       120  2/17/2022
22     bus    4/20/2022    300  blue        90  2/17/2022
23     bus    4/20/2022    300   red        30  2/17/2022

from this dataframe we have two vehicles ,three expecteddates ,range,two colors ,discount and date. we have to find min value in discount and date at which we got min value in discount column ,into seperate two new columns that is mindisc column and mindate and that should be save in new column based on latest date ,this should filtered based on vehicle,expecteddate,range,color and date

we have to find min value in discount column in two dates(all dates as we have many dates not limited to two dates) 2/18/2022,2/17/2022 based on same color,range,expecteddate and vehicle

finally this min to added to mindisc column at latest date and corresponding date at which min date appeared to mindate column at latest date row

output should look like

 country expecteddate  range color  discount       date  mindisc    mindate
0      car    2/24/2022    240  blue        70  2/18/2022   60       2/17/2022
1      car    2/24/2022    240   red        80  2/18/2022   40       2/17/2022
2      car    3/15/2022    240  blue        90  2/18/2022   50       2/17/2022
3      car    3/15/2022    240   red        60  2/18/2022   30       2/17/2022
4      car    4/20/2022    240  blue        40  2/18/2022   40       2/18/2022
5      car    4/20/2022    240   red        50  2/18/2022   45       2/17/2022
6      bus    2/24/2022    300  blue       120  2/18/2022   120      2/18/2022
7      bus    2/24/2022    300   red       110  2/18/2022   100      2/17/2022
8      bus    3/15/2022    300  blue       130  2/18/2022   130      2/18/2022
9      bus    3/15/2022    300   red       140  2/18/2022   120      2/17/2022
10     bus    4/20/2022    300  blue        80  2/18/2022   80       2/18/2022
11     bus    4/20/2022    300   red        90  2/18/2022   30       2/17/2022
12     car    2/24/2022    240  blue        60  2/17/2022
13     car    2/24/2022    240   red        40  2/17/2022
14     car    3/15/2022    240  blue        50  2/17/2022
15     car    3/15/2022    240   red        30  2/17/2022
16     car    4/20/2022    240  blue        70  2/17/2022
17     car    4/20/2022    240   red        45  2/17/2022
18     bus    2/24/2022    300  blue       130  2/17/2022
19     bus    2/24/2022    300   red       100  2/17/2022
20     bus    3/15/2022    300  blue       140  2/17/2022
21     bus    3/15/2022    300   red       120  2/17/2022
22     bus    4/20/2022    300  blue        90  2/17/2022
23     bus    4/20/2022    300   red        30  2/17/2022

vehicles are not limited to two like car and bus ,it has many vehicles and data is not always have equal rows in vehicle and range columns and date is not limited to two dates

CodePudding user response:

I didn't get the problem very well, but you could try something like:

new_df = df[(df['date'] == '2/18/2022') & (df['color'] == 'blue') & (df['vehicle'] == 'car')]

And then:

new_df['discount'].min()

CodePudding user response:

This is my approach. First, let's group the data as you described to find the rows where the minimum discount is reached:

idx = df.groupby(["color", "range", "expecteddate", "vehicle"])["discount"].transform(min) == df["discount"]
# Use df.loc[idx] to see the rows where the minimum discount is reached

Now, we can add the values of the minimum discount and the respective date to our dataframe by using .loc:

df.loc[idx, "mindisc"] = df.loc[idx, "discount"]
df.loc[idx, "mindate"] = df.loc[idx, "date"]

As you can see, we're only changing the values in the rows where the minimum discount was reached (idx). This is the output of these operations:

   vehicle expecteddate  range color  discount       date mindisc    mindate
0      car    2/24/2022    240  blue        70  2/18/2022    None        NaN
1      car    2/24/2022    240   red        80  2/18/2022    None        NaN
2      car    3/15/2022    240  blue        90  2/18/2022    None        NaN
3      car    3/15/2022    240   red        60  2/18/2022    None        NaN
4      car    4/20/2022    240  blue        40  2/18/2022      40  2/18/2022
5      car    4/20/2022    240   red        50  2/18/2022    None        NaN
6      bus    2/24/2022    300  blue       120  2/18/2022     120  2/18/2022
7      bus    2/24/2022    300   red       110  2/18/2022    None        NaN
8      bus    3/15/2022    300  blue       130  2/18/2022     130  2/18/2022
9      bus    3/15/2022    300   red       140  2/18/2022    None        NaN
10     bus    4/20/2022    300  blue        80  2/18/2022      80  2/18/2022
11     bus    4/20/2022    300   red        90  2/18/2022    None        NaN
12     car    2/24/2022    240  blue        60  2/17/2022      60  2/17/2022
13     car    2/24/2022    240   red        40  2/17/2022      40  2/17/2022
14     car    3/15/2022    240  blue        50  2/17/2022      50  2/17/2022
15     car    3/15/2022    240   red        30  2/17/2022      30  2/17/2022
16     car    4/20/2022    240  blue        70  2/17/2022    None        NaN
17     car    4/20/2022    240   red        45  2/17/2022      45  2/17/2022
18     bus    2/24/2022    300  blue       130  2/17/2022    None        NaN
19     bus    2/24/2022    300   red       100  2/17/2022     100  2/17/2022
20     bus    3/15/2022    300  blue       140  2/17/2022    None        NaN
21     bus    3/15/2022    300   red       120  2/17/2022     120  2/17/2022
22     bus    4/20/2022    300  blue        90  2/17/2022    None        NaN
23     bus    4/20/2022    300   red        30  2/17/2022      30  2/17/2022
  • Related