My database looks like this after I pull from the API. Each row is 1 minute. But not every minute is recorded (There is no 09:51:00 in this database).
ticker date time vol vwap open high low close lbh lah trades
0 AACG 2022-01-06 09:30:00 33042 1.8807 1.8900 1.9200 1.8700 1.9017 0.0 0.0 68
1 AACG 2022-01-06 09:31:00 5306 1.9073 1.9100 1.9200 1.8801 1.9100 0.0 0.0 27
2 AACG 2022-01-06 09:32:00 3496 1.8964 1.9100 1.9193 1.8800 1.8900 0.0 0.0 17
3 AACG 2022-01-06 09:33:00 5897 1.9377 1.8900 1.9500 1.8900 1.9500 0.0 0.0 15
4 AACG 2022-01-06 09:34:00 1983 1.9362 1.9200 1.9499 1.9200 1.9200 0.0 0.0 9
5 AACG 2022-01-06 09:35:00 10725 1.9439 1.9400 1.9600 1.9201 1.9306 0.0 0.0 87
6 AACG 2022-01-06 09:36:00 5942 1.9380 1.9307 1.9400 1.9300 1.9400 0.0 0.0 48
7 AACG 2022-01-06 09:37:00 5759 1.9428 1.9659 1.9659 1.9400 1.9500 0.0 0.0 11
8 AACG 2022-01-06 09:38:00 4855 1.9424 1.9500 1.9500 1.9401 1.9495 0.0 0.0 10
9 AACG 2022-01-06 09:39:00 6275 1.9514 1.9500 1.9700 1.9450 1.9700 0.0 0.0 14
10 AACG 2022-01-06 09:40:00 13695 2.0150 1.9799 2.0500 1.9749 2.0200 0.0 0.0 59
11 AACG 2022-01-06 09:41:00 3252 2.0209 2.0275 2.0300 2.0200 2.0200 0.0 0.0 14
12 AACG 2022-01-06 09:42:00 12082 2.0117 2.0300 2.0400 1.9800 1.9900 0.0 0.0 41
13 AACG 2022-01-06 09:43:00 5148 1.9802 1.9800 1.9999 1.9750 1.9999 0.0 0.0 11
14 AACG 2022-01-06 09:44:00 2764 1.9927 1.9901 1.9943 1.9901 1.9943 0.0 0.0 5
15 AACG 2022-01-06 09:45:00 2379 1.9576 1.9601 1.9601 1.9201 1.9201 0.0 0.0 10
16 AACG 2022-01-06 09:46:00 8762 1.9852 1.9550 1.9900 1.9550 1.9900 0.0 0.0 35
17 AACG 2022-01-06 09:47:00 1343 1.9704 1.9700 1.9738 1.9700 1.9701 0.0 0.0 5
18 AACG 2022-01-06 09:48:00 17080 1.9696 1.9700 1.9800 1.9600 1.9600 0.0 0.0 9
19 AACG 2022-01-06 09:49:00 9004 1.9600 1.9600 1.9600 1.9600 1.9600 0.0 0.0 9
20 AACG 2022-01-06 09:50:00 9224 1.9603 1.9600 1.9613 1.9600 1.9613 0.0 0.0 4
21 AACG 2022-01-06 09:52:00 16914 1.9921 1.9800 2.0400 1.9750 2.0399 0.0 0.0 67
22 AACG 2022-01-06 09:53:00 4665 1.9866 1.9900 2.0395 1.9801 1.9900 0.0 0.0 37
23 AACG 2022-01-06 09:55:00 2107 2.0049 1.9900 2.0100 1.9900 2.0099 0.0 0.0 10
24 AACG 2022-01-06 09:56:00 3003 2.0028 2.0000 2.0099 2.0000 2.0099 0.0 0.0 23
25 AACG 2022-01-06 09:57:00 8489 2.0272 2.0100 2.0400 2.0100 2.0300 0.0 0.0 34
26 AACG 2022-01-06 09:58:00 6050 2.0155 2.0300 2.0300 2.0150 2.0150 0.0 0.0 6
27 AACG 2022-01-06 09:59:00 61623 2.0449 2.0300 2.0700 2.0300 2.0699 0.0 0.0 83
28 AACG 2022-01-06 10:00:00 19699 2.0856 2.0699 2.1199 2.0600 2.1100 0.0 0.0 54
The code I'm using to "find missing rows" is stuck in a loop:
h = 9
m = 30
row = 0
while df['time'][row] < datetime.time(10,00):
if df['time'][row] == datetime.time(h,m):
m = m 1
row = row 1
if m == 60:
m = 00
h = h 1
break
if row >= 40:
break
else:
missingrow = {df.columns[0]: df.iloc[1,0], df.columns[1]: df.iloc[1,1], df.columns[2]:datetime.time(h,m), df.columns[3]:0, df.columns[4]:0, df.columns[5]:0, df.columns[6]:0, df.columns[7]:0, df.columns[8]:0, df.columns[9]:0, df.columns[10]:0, df.columns[11]:0,}
df = df.append(missingrow, ignore_index = True)
The "missingrow" variable is suppose to be an empty row with an updated time value inserted into the database.
If the code was correct, then this would be inserted into the DataFrame:
ticker date time vol vwap open high low close lbh lah trades
21 AACG 2022-01-06 09:51:00 0 0.00 0.00 0.00 0.00 0.00 0.0 0.0 0
Can you tell what's wrong with this code?
CodePudding user response:
Instead of iterating over the DataFrame rows, use pandas built-in methods.
(i) Convert df['time']
to Timedelta objects with type 'timedelta64[m]'
(ii) set_index
with time_delta and reindex
with the entire range of available minutes.
time = pd.to_timedelta(df['time'].astype(str)).astype('timedelta64[m]')
out = df.set_index(time).reindex(np.arange(time[0], time.iloc[len(df)-1] 1)).reset_index(drop=True)
Output:
ticker date time vol vwap open high low close lbh lah trades
0 AACG 2022-01-06 09:30:00 33042.0 1.8807 1.8900 1.9200 1.8700 1.9017 0.0 0.0 68.0
1 AACG 2022-01-06 09:31:00 5306.0 1.9073 1.9100 1.9200 1.8801 1.9100 0.0 0.0 27.0
2 AACG 2022-01-06 09:32:00 3496.0 1.8964 1.9100 1.9193 1.8800 1.8900 0.0 0.0 17.0
3 AACG 2022-01-06 09:33:00 5897.0 1.9377 1.8900 1.9500 1.8900 1.9500 0.0 0.0 15.0
4 AACG 2022-01-06 09:34:00 1983.0 1.9362 1.9200 1.9499 1.9200 1.9200 0.0 0.0 9.0
...
20 AACG 2022-01-06 09:50:00 9224.0 1.9603 1.9600 1.9613 1.9600 1.9613 0.0 0.0 4.0
21 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22 AACG 2022-01-06 09:52:00 16914.0 1.9921 1.9800 2.0400 1.9750 2.0399 0.0 0.0 67.0
23 AACG 2022-01-06 09:53:00 4665.0 1.9866 1.9900 2.0395 1.9801 1.9900 0.0 0.0 37.0
24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25 AACG 2022-01-06 09:55:00 2107.0 2.0049 1.9900 2.0100 1.9900 2.0099 0.0 0.0 10.0
...
So the above data shows that the data for 09:51 and 09:54 are missing.
Now you can simply assign the missing data to NaN rows.
CodePudding user response:
Combine date and time into a single date column, generate the missing rows with complete from pyjanitor, and adjust the values for relevant columns:
# pip install pyjanitor
import janitor
import pandas as pd
temp = df.copy()
temp['date_time'] = pd.to_datetime(temp['date'] ' ' temp['time'])
# create mapping for all possible dates, minutes wise
mapp = {'date_time' : pd.date_range(temp.date_time.min(),
temp.date_time.max(),
freq='min')}
# create dataframe with missing rows
(temp.complete('ticker', mapp, sort = True)
.assign(date = lambda df: df.date_time.dt.date,
time = lambda df: df.date_time.astype(str).str.split().str[-1])
.drop(columns='date_time')
)
ticker date time vol vwap open high low close lbh lah trades
0 AACG 2022-01-06 09:30:00 33042.0 1.8807 1.8900 1.9200 1.8700 1.9017 0.0 0.0 68.0
1 AACG 2022-01-06 09:31:00 5306.0 1.9073 1.9100 1.9200 1.8801 1.9100 0.0 0.0 27.0
2 AACG 2022-01-06 09:32:00 3496.0 1.8964 1.9100 1.9193 1.8800 1.8900 0.0 0.0 17.0
3 AACG 2022-01-06 09:33:00 5897.0 1.9377 1.8900 1.9500 1.8900 1.9500 0.0 0.0 15.0
4 AACG 2022-01-06 09:34:00 1983.0 1.9362 1.9200 1.9499 1.9200 1.9200 0.0 0.0 9.0
5 AACG 2022-01-06 09:35:00 10725.0 1.9439 1.9400 1.9600 1.9201 1.9306 0.0 0.0 87.0
6 AACG 2022-01-06 09:36:00 5942.0 1.9380 1.9307 1.9400 1.9300 1.9400 0.0 0.0 48.0
7 AACG 2022-01-06 09:37:00 5759.0 1.9428 1.9659 1.9659 1.9400 1.9500 0.0 0.0 11.0
8 AACG 2022-01-06 09:38:00 4855.0 1.9424 1.9500 1.9500 1.9401 1.9495 0.0 0.0 10.0
9 AACG 2022-01-06 09:39:00 6275.0 1.9514 1.9500 1.9700 1.9450 1.9700 0.0 0.0 14.0
10 AACG 2022-01-06 09:40:00 13695.0 2.0150 1.9799 2.0500 1.9749 2.0200 0.0 0.0 59.0
11 AACG 2022-01-06 09:41:00 3252.0 2.0209 2.0275 2.0300 2.0200 2.0200 0.0 0.0 14.0
12 AACG 2022-01-06 09:42:00 12082.0 2.0117 2.0300 2.0400 1.9800 1.9900 0.0 0.0 41.0
13 AACG 2022-01-06 09:43:00 5148.0 1.9802 1.9800 1.9999 1.9750 1.9999 0.0 0.0 11.0
14 AACG 2022-01-06 09:44:00 2764.0 1.9927 1.9901 1.9943 1.9901 1.9943 0.0 0.0 5.0
15 AACG 2022-01-06 09:45:00 2379.0 1.9576 1.9601 1.9601 1.9201 1.9201 0.0 0.0 10.0
16 AACG 2022-01-06 09:46:00 8762.0 1.9852 1.9550 1.9900 1.9550 1.9900 0.0 0.0 35.0
17 AACG 2022-01-06 09:47:00 1343.0 1.9704 1.9700 1.9738 1.9700 1.9701 0.0 0.0 5.0
18 AACG 2022-01-06 09:48:00 17080.0 1.9696 1.9700 1.9800 1.9600 1.9600 0.0 0.0 9.0
19 AACG 2022-01-06 09:49:00 9004.0 1.9600 1.9600 1.9600 1.9600 1.9600 0.0 0.0 9.0
20 AACG 2022-01-06 09:50:00 9224.0 1.9603 1.9600 1.9613 1.9600 1.9613 0.0 0.0 4.0
21 AACG 2022-01-06 09:51:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN
22 AACG 2022-01-06 09:52:00 16914.0 1.9921 1.9800 2.0400 1.9750 2.0399 0.0 0.0 67.0
23 AACG 2022-01-06 09:53:00 4665.0 1.9866 1.9900 2.0395 1.9801 1.9900 0.0 0.0 37.0
24 AACG 2022-01-06 09:54:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN
25 AACG 2022-01-06 09:55:00 2107.0 2.0049 1.9900 2.0100 1.9900 2.0099 0.0 0.0 10.0
26 AACG 2022-01-06 09:56:00 3003.0 2.0028 2.0000 2.0099 2.0000 2.0099 0.0 0.0 23.0
27 AACG 2022-01-06 09:57:00 8489.0 2.0272 2.0100 2.0400 2.0100 2.0300 0.0 0.0 34.0
28 AACG 2022-01-06 09:58:00 6050.0 2.0155 2.0300 2.0300 2.0150 2.0150 0.0 0.0 6.0
29 AACG 2022-01-06 09:59:00 61623.0 2.0449 2.0300 2.0700 2.0300 2.0699 0.0 0.0 83.0
30 AACG 2022-01-06 10:00:00 19699.0 2.0856 2.0699 2.1199 2.0600 2.1100 0.0 0.0 54.0