I have a dataset containing Open, High, Low, Close prices at 1-minute intervals over one year, like below:
Open High Low Close
Date_time
2020-01-01 17:00:00 1.12120 1.12121 1.12117 1.12120
2020-01-01 17:01:00 1.12106 1.12135 1.12106 1.12135
2020-01-01 17:02:00 1.12136 1.12139 1.12136 1.12139
2020-01-01 17:03:00 1.12135 1.12135 1.12120 1.12122
2020-01-01 17:04:00 1.12122 1.12125 1.12122 1.12125
There are no NaN or Null values in this dataset, which I checked with:
df.isnull().sum()
output:
Open 0
High 0
Low 0
Close 0
dtype: int64
I want to resample this to 15-minute intervals, using:
dff = df.resample('15min').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last'})
the output will be:
Open High Low Close
Date_time
2020-01-01 17:00:00 1.12120 1.12139 1.12106 1.12127
2020-01-01 17:15:00 1.12127 1.12154 1.12127 1.12146
2020-01-01 17:30:00 1.12147 1.12156 1.12138 1.12154
2020-01-01 17:45:00 1.12155 1.12166 1.12143 1.12143
2020-01-01 18:00:00 1.12143 1.12218 1.12142 1.12202
which looks OK, but when I run:
dff.isnull().sum()
the output is:
Open 10121
High 10121
Low 10121
Close 10121
dtype: int64
which means that almost half of the dataset is Null. I searched a lot but didn't find anything useful. Is there another way to resample the dataset into 15-minute intervals? And what is causing the Null values in the dataset?
More info: the size of dff is:
dff.count()
output:
Open 24919
High 24919
Low 24919
Close 24919
dtype: int64
Update: I just used
dff = df.groupby(pd.Grouper(freq='15Min')).agg({"Open": "first",
                                                "Low": "min",
                                                "High": "max",
                                                "Close": "last"})
but still half of the dataset is Null.
I attached the dataset CSV file here.
Update #2: thanks to @not_speshal, I found out that the size of df is (372335, 4) but the size of df.resample("1min").last() is (372275, 4), which I think means that I don't have data for some of the 1-minute intervals. How can I fix this?
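For reference, one way to see which 1-minute timestamps are absent (a sketch, assuming df is the 1-minute frame indexed by Date_time as above) is to compare the actual index with a complete 1-minute range:

import pandas as pd

# build the full 1-minute range spanning the data and diff it against the actual index
full_range = pd.date_range(df.index.min(), df.index.max(), freq="1min")
missing = full_range.difference(df.index)
print(len(missing))   # how many 1-minute bars are absent
print(missing[:10])   # the first few missing timestamps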
CodePudding user response:
Your CSV file doesn't have data for many days (probably when the markets are closed?). resample creates a row for every 15-minute interval between the first and last timestamp, so intervals with no underlying data come back as all-NaN rows (24,919 non-null + 10,121 null = 35,040 bins in total). You can instead resample and then keep only the rows whose timestamps exist in the original data, like so:
import pandas as pd

df = pd.read_csv("EURUSD_2020.csv")
# combine the separate Date and time columns into a single datetime index
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["time"])
df = df.drop(["Date", "time"], axis=1).set_index("datetime").astype("float")
# resample to 15-minute bars, then keep only the bins whose start time exists in the original index
dff = df.resample("15min").agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
dff = dff[dff.index.isin(df.index)]
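If the goal is simply to discard the empty 15-minute bins, dropping the all-NaN rows should work as well, since an empty bin aggregates to NaN in every column. Note this is not identical to the isin() filter above: it also keeps bins that contain data even when their exact start minute is missing from df.

# alternative sketch: drop only the 15-minute bins that have no underlying 1-minute data at all
dff = df.resample("15min").agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
dff = dff.dropna(how="all")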