Home > database >  NaN when converting df to a series
NaN when converting df to a series

Time:11-26

I have a dataframe with OHLC data. I need to get the close price into the pandas series, using the timestamp column as the index.

I am reading from a sqlite db into my df:

conn = sql.connect('allStockData.db') 
price = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
price['timestamp'] = pd.to_datetime(price['timestamp'])
print(price)

Which returns:

                      timestamp  open   high   low  close  volume  trade_count      vwap symbol  volume_10_day
0     2022-09-16 08:00:00 00:00  3.19  3.570  3.19  3.350   66475          458  3.404240   AAOI            NaN
1     2022-09-16 08:05:00 00:00  3.35  3.440  3.33  3.430   28925          298  3.381131   AAOI            NaN
2     2022-09-16 08:10:00 00:00  3.44  3.520  3.35  3.400   62901          643  3.445096   AAOI            NaN
3     2022-09-16 08:15:00 00:00  3.37  3.390  3.31  3.360   17943          184  3.339721   AAOI            NaN
4     2022-09-16 08:20:00 00:00  3.36  3.410  3.34  3.400   29123          204  3.383370   AAOI            NaN
...                         ...   ...    ...   ...    ...     ...          ...       ...    ...            ...
8759  2022-09-08 23:35:00 00:00  1.35  1.360  1.35  1.355    3835           10  1.350613   RUBY       515994.5
8760  2022-09-08 23:40:00 00:00  1.36  1.360  1.35  1.350    2780            7  1.353687   RUBY       515994.5
8761  2022-09-08 23:45:00 00:00  1.35  1.355  1.35  1.355    7080           11  1.350424   RUBY       515994.5
8762  2022-09-08 23:50:00 00:00  1.35  1.360  1.33  1.360   11664           30  1.351104   RUBY       515994.5
8763  2022-09-08 23:55:00 00:00  1.36  1.360  1.33  1.340   21394           32  1.348223   RUBY       515994.5

[8764 rows x 10 columns]

When I try to get the close into a series with the timestamp:

price = pd.Series(price['close'], index=price['timestamp'])

It returns a bunch of NaNs:

2022-09-16 08:00:00 00:00   NaN
2022-09-16 08:05:00 00:00   NaN
2022-09-16 08:10:00 00:00   NaN
2022-09-16 08:15:00 00:00   NaN
2022-09-16 08:20:00 00:00   NaN
                             ..
2022-09-08 23:35:00 00:00   NaN
2022-09-08 23:40:00 00:00   NaN
2022-09-08 23:45:00 00:00   NaN
2022-09-08 23:50:00 00:00   NaN
2022-09-08 23:55:00 00:00   NaN
Name: close, Length: 8764, dtype: float64

If I remove the index:

price = pd.Series(price['close'])

The close is returned normally:

0       3.350
1       3.430
2       3.400
3       3.360
4       3.400
        ...  
8759    1.355
8760    1.350
8761    1.355
8762    1.360
8763    1.340
Name: close, Length: 8764, dtype: float64

How can I return the close column as a pandas series, using my timestamp column as the index?

CodePudding user response:

It's because price['close'] has it's own index which is incompatible with timestamp. Try use .values instead:

price = pd.Series(price['close'].values, index=price['timestamp'])

CodePudding user response:

I needed to set the timestamp to the index before getting the the close as a series:

conn = sql.connect('allStockData.db') 
price = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
price['timestamp'] = pd.to_datetime(price['timestamp'])
price = price.set_index('timestamp')
print(price)

price = pd.Series(price['close'])
print(price)

Gives:

2022-09-16 08:00:00 00:00    3.350
2022-09-16 08:05:00 00:00    3.430
2022-09-16 08:10:00 00:00    3.400
2022-09-16 08:15:00 00:00    3.360
2022-09-16 08:20:00 00:00    3.400
                             ...  
2022-09-08 23:35:00 00:00    1.355
2022-09-08 23:40:00 00:00    1.350
2022-09-08 23:45:00 00:00    1.355
2022-09-08 23:50:00 00:00    1.360
2022-09-08 23:55:00 00:00    1.340
Name: close, Length: 8764, dtype: float64
  • Related