I have time series data in the following format:
| quote_datetime      | Moneyness   | underlying_bid | askC | askP | bidC | bidP | fwd_premium_abs |
|---------------------|-------------|----------------|------|------|------|------|-----------------|
| 2020-02-25 14:10:00 | 0.980861244 | 3134.6         | 73.8 | 10.8 | 66.5 | 10.4 | 63.4            |
| 2020-02-25 14:10:00 | 0.990861244 | 3134.6         | 73.4 | 11.8 | 63.5 | 11.4 | 65.4            |
| 2020-02-25 14:11:00 | 0.990861244 | 3135.6         | 70.4 | 12.8 | 61.5 | 14.4 | 64.4            |
| 2020-02-25 14:11:00 | 0.990861244 | 3135.6         | 72.4 | 10.1 | 60.1 | 12.4 | 67.4            |
For each unique value in quote_datetime, I need the minimum fwd_premium_abs. I tried doing it with a for loop:
mins = {}
for j in df['quote_datetime'].unique():
    # one full scan and filter of the frame per unique timestamp
    mins[j] = df[df['quote_datetime'] == j]['fwd_premium_abs'].min()
but this is very inefficient and computationally expensive over a huge dataset. What are the optimal ways of doing it?
Here is a small part of the data in dictionary format:
{'strike': {0: 3075.0, 1: 3075.0, 2: 3075.0, 3: 3075.0, 4: 3075.0},
'Date': {0: datetime.date(2020, 2, 25),
1: datetime.date(2020, 2, 25),
2: datetime.date(2020, 2, 25),
3: datetime.date(2020, 2, 25),
4: datetime.date(2020, 2, 25)},
'quote_datetime': {0: Timestamp('2020-02-25 14:10:00'),
1: Timestamp('2020-02-25 14:12:00'),
2: Timestamp('2020-02-25 14:19:00'),
3: Timestamp('2020-02-25 14:20:00'),
4: Timestamp('2020-02-25 14:22:00')},
'Moneyness': {0: 0.9808612440191388,
1: 0.9808612440191388,
2: 0.9808612440191388,
3: 0.9808612440191388,
4: 0.9808612440191388},
'underlying_bid': {0: 3134.6, 1: 3135.8, 2: 3137.29, 3: 3136.91, 4: 3136.99},
'askC': {0: 73.8, 1: 74.4, 2: 76.7, 3: 74.8, 4: 74.2},
'askP': {0: 10.8, 1: 10.9, 2: 10.5, 3: 10.7, 4: 10.7},
'bidC': {0: 66.5, 1: 69.1, 2: 70.1, 3: 71.7, 4: 71.2},
'bidP': {0: 10.4, 1: 10.3, 2: 9.4, 3: 10.2, 4: 10.2},
'fwd_premium_abs': {0: 63.4, 1: 64.10000000000001, 2: 67.3, 3: 64.6, 4: 64.0}}
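For anyone reproducing this: the dictionary loads straight into a DataFrame (d is a hypothetical name for the dictionary above):
import datetime                # needed for datetime.date in the literal
from pandas import Timestamp   # needed for Timestamp in the literal
import pandas as pd

df = pd.DataFrame(d)           # d: the dictionary shown above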
CodePudding user response:
Use groupby on the 'quote_datetime' column to find the minimum of 'fwd_premium_abs':
df.groupby('quote_datetime')['fwd_premium_abs'].min()
Output:
quote_datetime
2020-02-25 14:10:00 63.4
2020-02-25 14:12:00 64.1
2020-02-25 14:19:00 67.3
2020-02-25 14:20:00 64.6
2020-02-25 14:22:00 64.0
Is this what you were looking for?
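If you would rather get a DataFrame back, with quote_datetime as an ordinary column instead of the index, as_index=False does that (or call .reset_index() on the Series above):
df.groupby('quote_datetime', as_index=False)['fwd_premium_abs'].min()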
Since all of the datetimes you provided are unique, I changed one to demonstrate what this looks like with a repeated Timestamp (now at row indices 1 and 2):
>>> df.iat[2,2] = Timestamp('2020-02-25 14:12:00')
>>> df
strike Date quote_datetime Moneyness underlying_bid askC askP bidC bidP fwd_premium_abs
0 3075.0 2020-02-25 2020-02-25 14:10:00 0.980861 3134.60 73.8 10.8 66.5 10.4 63.4
1 3075.0 2020-02-25 2020-02-25 14:12:00 0.980861 3135.80 74.4 10.9 69.1 10.3 64.1
2 3075.0 2020-02-25 2020-02-25 14:12:00 0.980861 3137.29 76.7 10.5 70.1 9.4 67.3
3 3075.0 2020-02-25 2020-02-25 14:20:00 0.980861 3136.91 74.8 10.7 71.7 10.2 64.6
4 3075.0 2020-02-25 2020-02-25 14:22:00 0.980861 3136.99 74.2 10.7 71.2 10.2 64.0
df.groupby('quote_datetime')['fwd_premium_abs'].min()
Output:
quote_datetime
2020-02-25 14:10:00 63.4
2020-02-25 14:12:00 64.1
2020-02-25 14:20:00 64.6
2020-02-25 14:22:00 64.0
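To put rough numbers on the difference, here is a self-contained benchmark sketch on synthetic data (the row count, group count, and values are arbitrary assumptions, chosen only to make the gap visible):
import time
import numpy as np
import pandas as pd

# Synthetic data: 100k rows spread over ~1,000 unique timestamps.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'quote_datetime': pd.Timestamp('2020-02-25')
        + pd.to_timedelta(rng.integers(0, 1_000, n), unit='min'),
    'fwd_premium_abs': rng.random(n) * 100,
})

t0 = time.perf_counter()
# the question's approach: one full-column scan per unique timestamp
loop_mins = {j: df[df['quote_datetime'] == j]['fwd_premium_abs'].min()
             for j in df['quote_datetime'].unique()}
t1 = time.perf_counter()
# groupby: a single vectorised pass over the frame
gb_mins = df.groupby('quote_datetime')['fwd_premium_abs'].min()
t2 = time.perf_counter()
print(f'loop:    {t1 - t0:.3f}s')
print(f'groupby: {t2 - t1:.3f}s')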
CodePudding user response:
You can use .transform after grouping if you want your DataFrame to keep the same shape:
df['fwd_premium_abs'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')
All rows with the same 'quote_datetime' will then hold the same value in 'fwd_premium_abs': the minimum of the values that group previously had in this column.
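If you want to keep the original values, a small variant writes the group minimum to a new column instead (min_fwd is a hypothetical column name):
df['min_fwd'] = df.groupby('quote_datetime')['fwd_premium_abs'].transform('min')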
Edit:
If you want to select the rows that hold each group's minimum:
df.loc[df.groupby('quote_datetime')['fwd_premium_abs'].idxmin()]
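Note that idxmin returns the index label of the first occurrence of each group's minimum, so if a minimum is tied within a group, only the first matching row is returned.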