I have the following code :
import pandas as pd
from dateutil.relativedelta import relativedelta
import yfinance as yf
import datetime
ticker = 'f'
df = yf.download(tickers = ticker, start=datetime.datetime.now()-relativedelta(days=6), end= datetime.datetime.now(), interval="5m", progress = False)
df['day'] = df.index.day
df['pClose'] = df.groupby('day').nth(-2)['Close']
So with df['pClose'] = df.groupby('day').nth(-2)['Close']
I am trying to select the second last row. But it is producing this:
Open High Low Close Adj Close Volume pClose
day
7 13.740 13.77 13.74 13.755 13.755 1451751 NaN
8 13.540 13.56 13.52 13.540 13.540 1199373 NaN
9 13.340 13.35 13.32 13.335 13.335 1399753 NaN
10 12.765 12.78 12.74 12.745 12.745 2398719 NaN
Could you please advise why it is not selecting the second last row?
Edit
To get the previous days's closing price: df = yf.download(tickers = ticker, start=datetime.datetime.now()-relativedelta(days=6), end= datetime.datetime.now(), progress = False)
Where ticker = "f"
Open High Low Close Adj Close Volume
Date
2022-06-07 13.26 13.77 13.19 13.74 13.74 38940300
2022-06-08 13.63 13.85 13.44 13.53 13.53 39441900
2022-06-09 13.51 13.59 13.28 13.28 13.28 30468000
2022-06-10 13.00 13.21 12.63 12.75 12.75 55644400
So on 9th the closing price was 13.28. However this is on a daily interval, in the code above not specifying interval defaults to daily.
Since the dataframe which I am requesting is on 5 min interval.
Open High ... Adj Close Volume
Datetime ...
2022-06-07 09:30:00-04:00 13.2600 13.3800 ... 13.3700 2543171
2022-06-07 09:35:00-04:00 13.3679 13.4200 ... 13.3426 712259
2022-06-07 09:40:00-04:00 13.3500 13.3550 ... 13.2900 509699
2022-06-07 09:45:00-04:00 13.2900 13.3382 ... 13.3300 451413
2022-06-07 09:50:00-04:00 13.3300 13.4100 ... 13.3801 849915
... ... ... ... ...
2022-06-10 15:40:00-04:00 12.8050 12.8100 ... 12.7801 889435
2022-06-10 15:45:00-04:00 12.7850 12.7900 ... 12.7400 1469980
2022-06-10 15:50:00-04:00 12.7450 12.7800 ... 12.7650 1910267
2022-06-10 15:55:00-04:00 12.7650 12.7800 ... 12.7450 2398719
2022-06-10 16:00:00-04:00 12.7500 12.7500 ... 12.7500 0
What I am trying to do groupby all the data on day number with datatime.index.day
and then access the closing price on the second last day. This is where the problem was as when i did
.nth(-2)['Close']` It was not selecting just 1 row, but was returning all the days available
CodePudding user response:
When you assign in pandas, there's this thing called index-matching where the values are assigned to the matching index of the assigned df
.
You code returns:
>>> df.groupby('day').nth(-2)['Close']
day
7 13.755
8 13.540
9 13.335
10 12.745
where the indexes are days. However, the original df
has datetime indexes:
Open High Low Close Adj Close
Datetime
2022-06-07 09:30:00-04:00 13.2600 13.3800 13.1850 13.3700 13.3700
2022-06-07 09:35:00-04:00 13.3679 13.4200 13.2900 13.3426 13.3426
2022-06-07 09:40:00-04:00 13.3500 13.3550 13.2800 13.2900 13.2900
2022-06-07 09:45:00-04:00 13.2900 13.3382 13.2701 13.3300 13.3300
2022-06-07 09:50:00-04:00 13.3300 13.4100 13.3100 13.3801 13.3801
That way, none of them match, so all values end up being NaN
s.
Here, use transform
to keep the same original indexes and broadcast the values:
>>> df.groupby('day').Close.transform('nth', n=-2)
Datetime
2022-06-07 09:30:00-04:00 13.290
2022-06-07 09:35:00-04:00 13.290
2022-06-07 09:40:00-04:00 13.290
2022-06-07 09:45:00-04:00 13.290
2022-06-07 09:50:00-04:00 13.290
...
2022-06-10 15:40:00-04:00 13.205
2022-06-10 15:45:00-04:00 13.205
2022-06-10 15:50:00-04:00 13.205
2022-06-10 15:55:00-04:00 13.205
2022-06-10 16:00:00-04:00 13.205
To get the previous day close, you can just use loc
>>> df.loc[df['day'] == df['day'].max() - 1, 'Close'].tail(1)
CodePudding user response:
Are you okay with resetting the index? Then you can try something as such
ticker = 'f'
df = yf.download(tickers = ticker, start=datetime.datetime.now()-relativedelta(days=6), end= datetime.datetime.now(), interval="5m", progress = False)
df['day'] = df.index.day
df['pClose'] = df.groupby('day')
df = df.reset_index()
df2 = df.iloc[-2]