I'm trying to get the highest price for each group using pandas.
There is one additional condition: the maximum should only consider rows dated within the 180 days before the date in the date column.
Below is the data frame I have.
import pandas as pd
df = pd.DataFrame({'num': ["50110-374","50110-374","50110-374","50110-374","50110-374","50110-374","50110-374","50110-374","50110-374","50110-374","50110-3421","50110-3421","50110-3421","50110-3421","50110-3421","50110-3421","50110-3421"],
'date': ["2017-11-24","2018-02-08","2018-03-08","2018-03-17","2018-04-11","2018-04-16","2018-05-05","2018-06-04","2018-06-16","2018-07-13","2019-01-28","2019-02-10","2019-03-16","2019-03-16","2019-06-07","2019-06-30", "2022-06-30"],
'type':["39","39","39","39","39","39","39","39","39","39","73","73","73","73","73","73","73"],
'price':[17000,12500,14000,14000,18000,13000,14250,15000,12900,15000,35500,34500,35000,37000,33300,34800, 32000]})
` num date type price
0 50110-374 2017-11-24 39 17000
1 50110-374 2018-02-08 39 12500
2 50110-374 2018-03-08 39 14000
3 50110-374 2018-03-17 39 14000
4 50110-374 2018-04-11 39 18000
5 50110-374 2018-04-16 39 13000
6 50110-374 2018-05-05 39 14250
7 50110-374 2018-06-04 39 15000
8 50110-374 2018-06-16 39 12900
9 50110-374 2018-07-13 39 15000
10 50110-3421 2019-01-28 73 35500
11 50110-3421 2019-02-10 73 34500
12 50110-3421 2019-03-16 73 35000
13 50110-3421 2019-03-16 73 37000
14 50110-3421 2019-06-07 73 33300
15 50110-3421 2019-06-30 73 34800
16 50110-3421 2022-06-30 73 32000`
Below is the desired result, with a new 'max' column.
(The max at index 16 is 32000 because no earlier row falls within its 180-day window.)
` num date type price max
0 50110-374 2017-11-24 39 17000 NaN
1 50110-374 2018-02-08 39 12500 17000.0
2 50110-374 2018-03-08 39 14000 17000.0
3 50110-374 2018-03-17 39 14000 17000.0
4 50110-374 2018-04-11 39 18000 18000.0
5 50110-374 2018-04-16 39 13000 18000.0
6 50110-374 2018-05-05 39 14250 18000.0
7 50110-374 2018-06-04 39 15000 18000.0
8 50110-374 2018-06-16 39 12900 18000.0
9 50110-374 2018-07-13 39 15000 18000.0
10 50110-3421 2019-01-28 73 35500 NaN
11 50110-3421 2019-02-10 73 34500 35500.0
12 50110-3421 2019-03-16 73 35000 35500.0
13 50110-3421 2019-03-16 73 37000 37000.0
14 50110-3421 2019-06-07 73 33300 37000.0
15 50110-3421 2019-06-30 73 34800 37000.0
16 50110-3421 2022-06-30 73 32000 32000.0`
Thank you for reading.
I tried this function. It works, but it takes too long to run.
# totalMonthList and priceApi are defined elsewhere (not shown in this post)
def maxDeal(date):
    testDate = date
    dateIndex = totalMonthList.index(testDate)
    # dates from 720 entries back up to and including testDate
    testRange = totalMonthList[dateIndex-720:dateIndex+1]
    tmpCdDf = priceApi[priceApi['date'] == testDate][['num','type','date']]
    tmpCdLst = list(tmpCdDf['num'].drop_duplicates())
    maxDf = df[(df['num'].isin(tmpCdLst)) & (df['date'].isin(testRange))].groupby(['date','type'])['price'].max().reset_index()
    tmpCdDf = pd.merge(tmpCdDf,maxDf, how='left', on=['num','type'] )
    maxValue = list(tmpCdDf['price'])
    df.loc[df['date'] == date, 'max'] = maxValue
CodePudding user response:
Step1
Convert the date column to datetime and build a series shifted forward by 180 days.
df['date'] = pd.to_datetime(df['date'])
s = df['date'] + pd.Timedelta('180d')
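To see what the shifted series holds, here is a minimal sketch using two dates from the question's second group. The variable name dates is only illustrative.

```python
import pandas as pd

# Two dates from the question's second group (index 15 and 16).
dates = pd.to_datetime(pd.Series(["2019-06-30", "2022-06-30"]))

# Each entry is the last day on which that row still counts as "within 180 days".
s = dates + pd.Timedelta("180d")
print(s)

# The first row expires long before the second row's date,
# so the second row can only see itself.
print(s[0] >= dates[1])
```

This is why index 16 ends up with its own price as the max.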
Step2
Apply a function to each row with apply and axis=1.
For each row, build a condition, mask price with where, and take the max.
Condition:
- index must be less than or equal to the index of the row
- date + 180d must be greater than or equal to the date of the row
- type is the same as the type of the row
df.apply(lambda x: df['price'].where((df.index <= x.name) & s.ge(x['date']) & df['type'].eq(x['type'])).max(), axis=1)
result
0 17000.0
1 17000.0
2 17000.0
3 17000.0
4 18000.0
5 18000.0
6 18000.0
7 18000.0
8 18000.0
9 18000.0
10 35500.0
11 35500.0
12 35500.0
13 37000.0
14 37000.0
15 37000.0
16 32000.0
dtype: float64
Step3
Assign the result to the max column.
df['max'] = df.apply(lambda x: df['price'].where((df.index <= x.name) & s.ge(x['date']) & df['type'].eq(x['type'])).max(), axis=1)
df
num date type price max
0 50110-374 2017-11-24 39 17000 17000.0
1 50110-374 2018-02-08 39 12500 17000.0
2 50110-374 2018-03-08 39 14000 17000.0
3 50110-374 2018-03-17 39 14000 17000.0
4 50110-374 2018-04-11 39 18000 18000.0
5 50110-374 2018-04-16 39 13000 18000.0
6 50110-374 2018-05-05 39 14250 18000.0
7 50110-374 2018-06-04 39 15000 18000.0
8 50110-374 2018-06-16 39 12900 18000.0
9 50110-374 2018-07-13 39 15000 18000.0
10 50110-3421 2019-01-28 73 35500 35500.0
11 50110-3421 2019-02-10 73 34500 35500.0
12 50110-3421 2019-03-16 73 35000 35500.0
13 50110-3421 2019-03-16 73 37000 37000.0
14 50110-3421 2019-06-07 73 33300 37000.0
15 50110-3421 2019-06-30 73 34800 37000.0
16 50110-3421 2022-06-30 73 32000 32000.0
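The question's desired output shows NaN for the first row of each num, while the result above gives that row its own price. If NaN is really wanted there (an assumption about the asker's intent), one possible follow-up is to mask the first row per group. The frame out below is a small stand-in, not the full data.

```python
import pandas as pd

# A small stand-in for the result above: two groups with a computed max column.
out = pd.DataFrame({
    "num": ["50110-374", "50110-374", "50110-3421", "50110-3421"],
    "max": [17000.0, 17000.0, 35500.0, 35500.0],
})

# cumcount() numbers rows within each group from 0,
# so eq(0) marks the first row of every group.
first_in_group = out.groupby("num").cumcount().eq(0)

# mask() replaces the marked rows with NaN and leaves the rest untouched.
out["max"] = out["max"].mask(first_in_group)
print(out)
```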
Full Code
df['date'] = pd.to_datetime(df['date'])
s = df['date'] + pd.Timedelta('180d')
df['max'] = df.apply(lambda x: df['price'].where((df.index <= x.name) & s.ge(x['date']) & df['type'].eq(x['type'])).max(), axis=1)
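Since the question is about run time: the row-wise apply scans the whole frame once per row. A possible alternative, sketched here and not part of the answer above, is a time-based rolling window per group. It assumes dates are already sorted within each num, groups by num rather than type, and passes closed='both' to mimic the "date + 180d >= row date" condition. Rows that share a date (index 12 and 13 here) are an edge case worth comparing against the row-wise version.

```python
import pandas as pd

df = pd.DataFrame({
    "num": ["50110-374"] * 10 + ["50110-3421"] * 7,
    "date": ["2017-11-24", "2018-02-08", "2018-03-08", "2018-03-17", "2018-04-11",
             "2018-04-16", "2018-05-05", "2018-06-04", "2018-06-16", "2018-07-13",
             "2019-01-28", "2019-02-10", "2019-03-16", "2019-03-16", "2019-06-07",
             "2019-06-30", "2022-06-30"],
    "price": [17000, 12500, 14000, 14000, 18000, 13000, 14250, 15000, 12900, 15000,
              35500, 34500, 35000, 37000, 33300, 34800, 32000],
})
df["date"] = pd.to_datetime(df["date"])

parts = []
for _, group in df.groupby("num"):
    # Rolling over the date column keeps the group's original row index,
    # so the pieces can be aligned back onto df afterwards.
    window = group.rolling("180D", on="date", closed="both")
    parts.append(window["price"].max())

# Assignment aligns on the index, so group order does not matter.
df["max"] = pd.concat(parts)
print(df)
```

Looping over groups and concatenating is a deliberately plain choice: it avoids relying on how grouped rolling orders its output.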
CodePudding user response:
You can try using groupby with transform (this gives the overall maximum per num, without the 180-day window):
df["max"] = df.groupby('num')["price"].transform('max')
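A minimal sketch of what transform('max') returns, using four rows from the question's second group. The name overall is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "num": ["50110-3421"] * 4,
    "date": pd.to_datetime(["2019-03-16", "2019-06-07", "2019-06-30", "2022-06-30"]),
    "price": [37000, 33300, 34800, 32000],
})

# transform('max') broadcasts the group's single maximum to every row,
# regardless of how far apart the dates are.
overall = df.groupby("num")["price"].transform("max")
print(overall.tolist())  # [37000, 37000, 37000, 37000]
```

The last row gets 37000 here, whereas the question expects 32000 for it, so a date condition is still needed on top of this.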