Home > other >  How to insert row into pandas dataframe if a condition is not met?
How to insert row into pandas dataframe if a condition is not met?

Time:01-14

My database looks like this after I pull from the API. Each row is 1 minute. But not every minute is recorded (There is no 09:51:00 in this database).

  ticker        date    time         vol    vwap    open    high    low    close    lbh lah trades
0   AACG    2022-01-06  09:30:00    33042   1.8807  1.8900  1.9200  1.8700  1.9017  0.0 0.0 68
1   AACG    2022-01-06  09:31:00    5306    1.9073  1.9100  1.9200  1.8801  1.9100  0.0 0.0 27
2   AACG    2022-01-06  09:32:00    3496    1.8964  1.9100  1.9193  1.8800  1.8900  0.0 0.0 17
3   AACG    2022-01-06  09:33:00    5897    1.9377  1.8900  1.9500  1.8900  1.9500  0.0 0.0 15
4   AACG    2022-01-06  09:34:00    1983    1.9362  1.9200  1.9499  1.9200  1.9200  0.0 0.0 9
5   AACG    2022-01-06  09:35:00    10725   1.9439  1.9400  1.9600  1.9201  1.9306  0.0 0.0 87
6   AACG    2022-01-06  09:36:00    5942    1.9380  1.9307  1.9400  1.9300  1.9400  0.0 0.0 48
7   AACG    2022-01-06  09:37:00    5759    1.9428  1.9659  1.9659  1.9400  1.9500  0.0 0.0 11
8   AACG    2022-01-06  09:38:00    4855    1.9424  1.9500  1.9500  1.9401  1.9495  0.0 0.0 10
9   AACG    2022-01-06  09:39:00    6275    1.9514  1.9500  1.9700  1.9450  1.9700  0.0 0.0 14
10  AACG    2022-01-06  09:40:00    13695   2.0150  1.9799  2.0500  1.9749  2.0200  0.0 0.0 59
11  AACG    2022-01-06  09:41:00    3252    2.0209  2.0275  2.0300  2.0200  2.0200  0.0 0.0 14
12  AACG    2022-01-06  09:42:00    12082   2.0117  2.0300  2.0400  1.9800  1.9900  0.0 0.0 41
13  AACG    2022-01-06  09:43:00    5148    1.9802  1.9800  1.9999  1.9750  1.9999  0.0 0.0 11
14  AACG    2022-01-06  09:44:00    2764    1.9927  1.9901  1.9943  1.9901  1.9943  0.0 0.0 5
15  AACG    2022-01-06  09:45:00    2379    1.9576  1.9601  1.9601  1.9201  1.9201  0.0 0.0 10
16  AACG    2022-01-06  09:46:00    8762    1.9852  1.9550  1.9900  1.9550  1.9900  0.0 0.0 35
17  AACG    2022-01-06  09:47:00    1343    1.9704  1.9700  1.9738  1.9700  1.9701  0.0 0.0 5
18  AACG    2022-01-06  09:48:00    17080   1.9696  1.9700  1.9800  1.9600  1.9600  0.0 0.0 9
19  AACG    2022-01-06  09:49:00    9004    1.9600  1.9600  1.9600  1.9600  1.9600  0.0 0.0 9
20  AACG    2022-01-06  09:50:00    9224    1.9603  1.9600  1.9613  1.9600  1.9613  0.0 0.0 4
21  AACG    2022-01-06  09:52:00    16914   1.9921  1.9800  2.0400  1.9750  2.0399  0.0 0.0 67
22  AACG    2022-01-06  09:53:00    4665    1.9866  1.9900  2.0395  1.9801  1.9900  0.0 0.0 37
23  AACG    2022-01-06  09:55:00    2107    2.0049  1.9900  2.0100  1.9900  2.0099  0.0 0.0 10
24  AACG    2022-01-06  09:56:00    3003    2.0028  2.0000  2.0099  2.0000  2.0099  0.0 0.0 23
25  AACG    2022-01-06  09:57:00    8489    2.0272  2.0100  2.0400  2.0100  2.0300  0.0 0.0 34
26  AACG    2022-01-06  09:58:00    6050    2.0155  2.0300  2.0300  2.0150  2.0150  0.0 0.0 6
27  AACG    2022-01-06  09:59:00    61623   2.0449  2.0300  2.0700  2.0300  2.0699  0.0 0.0 83
28  AACG    2022-01-06  10:00:00    19699   2.0856  2.0699  2.1199  2.0600  2.1100  0.0 0.0 54

The code I'm using to "find missing rows" is stuck in a loop:

h = 9
m = 30
row = 0

while df['time'][row] < datetime.time(10,00):
    if df['time'][row] == datetime.time(h,m):
        m = m 1
        row = row 1
        if m == 60:
            m = 00
            h = h 1
            break
        if row >= 40:
            break
    else:
        missingrow = {df.columns[0]: df.iloc[1,0], df.columns[1]: df.iloc[1,1], df.columns[2]:datetime.time(h,m), df.columns[3]:0, df.columns[4]:0, df.columns[5]:0, df.columns[6]:0, df.columns[7]:0, df.columns[8]:0, df.columns[9]:0, df.columns[10]:0, df.columns[11]:0,}
        df = df.append(missingrow, ignore_index = True)

The "missingrow" variable is suppose to be an empty row with an updated time value inserted into the database.

If the code was correct, then this would be inserted into the DataFrame:

   ticker       date    time       vol  vwap    open high   low close lbh lah trades
21  AACG    2022-01-06  09:51:00    0   0.00  0.00  0.00  0.00  0.00  0.0 0.0 0

Can you tell what's wrong with this code?

CodePudding user response:

Instead of iterating over the DataFrame rows, use pandas built-in methods.

(i) Convert df['time'] to Timedelta objects with type 'timedelta64[m]'

(ii) set_index with time_delta and reindex with the entire range of available minutes.

time = pd.to_timedelta(df['time'].astype(str)).astype('timedelta64[m]')
out = df.set_index(time).reindex(np.arange(time[0], time.iloc[len(df)-1] 1)).reset_index(drop=True)

Output:

    ticker      date      time       vol     vwap    open    high    low    close   lbh     lah     trades
0   AACG    2022-01-06  09:30:00    33042.0 1.8807  1.8900  1.9200  1.8700  1.9017  0.0     0.0     68.0
1   AACG    2022-01-06  09:31:00    5306.0  1.9073  1.9100  1.9200  1.8801  1.9100  0.0     0.0     27.0
2   AACG    2022-01-06  09:32:00    3496.0  1.8964  1.9100  1.9193  1.8800  1.8900  0.0     0.0     17.0
3   AACG    2022-01-06  09:33:00    5897.0  1.9377  1.8900  1.9500  1.8900  1.9500  0.0     0.0     15.0
4   AACG    2022-01-06  09:34:00    1983.0  1.9362  1.9200  1.9499  1.9200  1.9200  0.0     0.0     9.0
...
20  AACG    2022-01-06  09:50:00    9224.0  1.9603  1.9600  1.9613  1.9600  1.9613  0.0     0.0     4.0
21  NaN     NaN         NaN         NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
22  AACG    2022-01-06  09:52:00    16914.0 1.9921  1.9800  2.0400  1.9750  2.0399  0.0     0.0     67.0
23  AACG    2022-01-06  09:53:00    4665.0  1.9866  1.9900  2.0395  1.9801  1.9900  0.0     0.0     37.0
24  NaN     NaN         NaN         NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
25  AACG    2022-01-06  09:55:00    2107.0  2.0049  1.9900  2.0100  1.9900  2.0099  0.0     0.0     10.0
...

So the above data shows that the data for 09:51 and 09:54 are missing.

Now you can simply assign the missing data to NaN rows.

CodePudding user response:

Combine date and time into a single date column, generate the missing rows with complete from pyjanitor, and adjust the values for relevant columns:

# pip install pyjanitor
import janitor
import pandas as pd

temp = df.copy()
temp['date_time'] = pd.to_datetime(temp['date']   ' '   temp['time'])
# create mapping for all possible dates, minutes wise
mapp = {'date_time' : pd.date_range(temp.date_time.min(),
                                    temp.date_time.max(), 
                                    freq='min')}

# create dataframe with missing rows
(temp.complete('ticker', mapp, sort = True)
     .assign(date = lambda df: df.date_time.dt.date, 
             time = lambda df: df.date_time.astype(str).str.split().str[-1])
     .drop(columns='date_time')
)
   ticker        date      time      vol    vwap    open    high     low   close  lbh  lah  trades
0    AACG  2022-01-06  09:30:00  33042.0  1.8807  1.8900  1.9200  1.8700  1.9017  0.0  0.0    68.0
1    AACG  2022-01-06  09:31:00   5306.0  1.9073  1.9100  1.9200  1.8801  1.9100  0.0  0.0    27.0
2    AACG  2022-01-06  09:32:00   3496.0  1.8964  1.9100  1.9193  1.8800  1.8900  0.0  0.0    17.0
3    AACG  2022-01-06  09:33:00   5897.0  1.9377  1.8900  1.9500  1.8900  1.9500  0.0  0.0    15.0
4    AACG  2022-01-06  09:34:00   1983.0  1.9362  1.9200  1.9499  1.9200  1.9200  0.0  0.0     9.0
5    AACG  2022-01-06  09:35:00  10725.0  1.9439  1.9400  1.9600  1.9201  1.9306  0.0  0.0    87.0
6    AACG  2022-01-06  09:36:00   5942.0  1.9380  1.9307  1.9400  1.9300  1.9400  0.0  0.0    48.0
7    AACG  2022-01-06  09:37:00   5759.0  1.9428  1.9659  1.9659  1.9400  1.9500  0.0  0.0    11.0
8    AACG  2022-01-06  09:38:00   4855.0  1.9424  1.9500  1.9500  1.9401  1.9495  0.0  0.0    10.0
9    AACG  2022-01-06  09:39:00   6275.0  1.9514  1.9500  1.9700  1.9450  1.9700  0.0  0.0    14.0
10   AACG  2022-01-06  09:40:00  13695.0  2.0150  1.9799  2.0500  1.9749  2.0200  0.0  0.0    59.0
11   AACG  2022-01-06  09:41:00   3252.0  2.0209  2.0275  2.0300  2.0200  2.0200  0.0  0.0    14.0
12   AACG  2022-01-06  09:42:00  12082.0  2.0117  2.0300  2.0400  1.9800  1.9900  0.0  0.0    41.0
13   AACG  2022-01-06  09:43:00   5148.0  1.9802  1.9800  1.9999  1.9750  1.9999  0.0  0.0    11.0
14   AACG  2022-01-06  09:44:00   2764.0  1.9927  1.9901  1.9943  1.9901  1.9943  0.0  0.0     5.0
15   AACG  2022-01-06  09:45:00   2379.0  1.9576  1.9601  1.9601  1.9201  1.9201  0.0  0.0    10.0
16   AACG  2022-01-06  09:46:00   8762.0  1.9852  1.9550  1.9900  1.9550  1.9900  0.0  0.0    35.0
17   AACG  2022-01-06  09:47:00   1343.0  1.9704  1.9700  1.9738  1.9700  1.9701  0.0  0.0     5.0
18   AACG  2022-01-06  09:48:00  17080.0  1.9696  1.9700  1.9800  1.9600  1.9600  0.0  0.0     9.0
19   AACG  2022-01-06  09:49:00   9004.0  1.9600  1.9600  1.9600  1.9600  1.9600  0.0  0.0     9.0
20   AACG  2022-01-06  09:50:00   9224.0  1.9603  1.9600  1.9613  1.9600  1.9613  0.0  0.0     4.0
21   AACG  2022-01-06  09:51:00      NaN     NaN     NaN     NaN     NaN     NaN  NaN  NaN     NaN
22   AACG  2022-01-06  09:52:00  16914.0  1.9921  1.9800  2.0400  1.9750  2.0399  0.0  0.0    67.0
23   AACG  2022-01-06  09:53:00   4665.0  1.9866  1.9900  2.0395  1.9801  1.9900  0.0  0.0    37.0
24   AACG  2022-01-06  09:54:00      NaN     NaN     NaN     NaN     NaN     NaN  NaN  NaN     NaN
25   AACG  2022-01-06  09:55:00   2107.0  2.0049  1.9900  2.0100  1.9900  2.0099  0.0  0.0    10.0
26   AACG  2022-01-06  09:56:00   3003.0  2.0028  2.0000  2.0099  2.0000  2.0099  0.0  0.0    23.0
27   AACG  2022-01-06  09:57:00   8489.0  2.0272  2.0100  2.0400  2.0100  2.0300  0.0  0.0    34.0
28   AACG  2022-01-06  09:58:00   6050.0  2.0155  2.0300  2.0300  2.0150  2.0150  0.0  0.0     6.0
29   AACG  2022-01-06  09:59:00  61623.0  2.0449  2.0300  2.0700  2.0300  2.0699  0.0  0.0    83.0
30   AACG  2022-01-06  10:00:00  19699.0  2.0856  2.0699  2.1199  2.0600  2.1100  0.0  0.0    54.0
  •  Tags:  
  • Related