Convert a single Pandas column to DateTime

Time:11-30

In my DataFrame, I set 'Time' as the index and then ran frame = frame.astype(float) to convert all the other data to floats. However, I now need the default indices (0, 1, 2, etc.), but I still want the 'Time' column in a datetime format. I've tried a few different ways of doing this: they either run but corrupt the time (it says 1970 instead of 2021), or they result in TypeError: Cannot cast DatetimeArray to dtype float64

This is similar to the dataframe I want (but with the times messed up):

                          Time      Open      High       Low     Close
0   1970-01-01 00:27:18.185760  57141.92  57157.16  57141.92  57147.00
1   1970-01-01 00:27:18.185820  57145.48  57149.15  57124.62  57139.75
2   1970-01-01 00:27:18.185880  57126.75  57173.11  57126.74  57142.20
3   1970-01-01 00:27:18.185940  57163.42  57163.42  57079.10  57135.31
4   1970-01-01 00:27:18.186000  57084.42  57110.00  57084.42  57092.95

I've tried changing the format of the 'Time' column with:

frame['Time'] = pd.to_datetime(frame['Time'])

And

frame['Time'] = frame['Time'].apply(pd.to_datetime)

And I have also tried changing the types of the other columns in a similar way

frame[['Open','High','Low','Close']] = frame[['Open','High','Low','Close']].apply(frame.astype(float))

And I tried this before and after applying pd.to_datetime


EDIT

Going to give some more information because I haven't been specific enough. The code below retrieves data from an API and puts it into a DataFrame. The response from the API is a list of lists, with each sublist containing 10 elements (I think, can't remember now). I only want the data up to 'Close'.

def get_historical_futures_data(symbol, interval, lookback):
    frame = pd.DataFrame(client.futures_historical_klines(symbol, interval, lookback + ' min ago UTC'))
    frame = frame.iloc[:,:5]
    frame.columns = ['Time','Open','High','Low','Close']
    frame = frame.set_index('Time')
    frame.index = pd.to_datetime(frame.index, unit='ms')
    frame = frame.astype(float)
    print(frame)
    frames.append(frame)

                         Open      High       Low     Close
Time                                                       
2021-11-29 14:27:00  57220.49  57220.50  57185.95  57190.01
2021-11-29 14:28:00  57190.00  57209.21  57161.74  57177.28
2021-11-29 14:29:00  57177.28  57182.61  57160.26  57164.46
2021-11-29 14:30:00  57164.46  57186.99  57154.32  57155.99
2021-11-29 14:31:00  57156.00  57179.74  57154.33  57179.74

Above is the code I had previously, along with its output. However, in another part of my code I have realised that it is much easier to keep the row index numbers, so I do not want to make 'Time' the index. Instead, I want the default row index to remain, with the rest of the DataFrame after it, similar to this:

                          Time      Open      High       Low     Close
0   1970-01-01 00:27:18.185760  57141.92  57157.16  57141.92  57147.00
1   1970-01-01 00:27:18.185820  57145.48  57149.15  57124.62  57139.75
2   1970-01-01 00:27:18.185880  57126.75  57173.11  57126.74  57142.20
3   1970-01-01 00:27:18.185940  57163.42  57163.42  57079.10  57135.31
4   1970-01-01 00:27:18.186000  57084.42  57110.00  57084.42  57092.95

My issue is that I am unable to make the 'Time' column a datetime type while also making the other columns (Open, High, Low, Close) float type. I either get errors about type casting, or the Time column comes out wrong and says 1970 instead of 2021.

How do I make every column (EXCEPT FOR TIME) float type, and make the Time column DateTime type?

CodePudding user response:

I believe this might be happening because pandas cannot easily detect the format. You could try infer_datetime_format=True to help with format detection. (In pandas 2.0 and later this argument is deprecated, because consistent format inference is the default behaviour.)

Kindly try:

frame['Time'] = pd.to_datetime(frame['Time'],infer_datetime_format=True)

This outputs

                        Time
0 1970-01-01 00:27:18.185760
1 1970-01-01 00:27:18.185820
2 1970-01-01 00:27:18.185880

Using df.info(), we can check that the column has an actual datetime dtype:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Time    3 non-null      datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 152.0 bytes
None

This is the sample data used for this example:

df = pd.DataFrame({'Time':['1970-01-01 00:27:18.185760',
                           '1970-01-01 00:27:18.185820',
                           '1970-01-01 00:27:18.185880']})
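
It may also be worth checking what the 'Time' column holds before converting it. The sketch below assumes the raw values are integer epoch timestamps in milliseconds rather than strings; the sample integers are chosen so that they reproduce the 1970 timestamps shown in the question. The helper to_time_column is a hypothetical name used only for illustration. When pd.to_datetime receives integers without a unit, it reads them as nanoseconds since 1970-01-01, and format inference does not come into play.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_time_column(col, unit='ms'):
    """Hypothetical helper: numeric values are treated as epoch offsets
    in `unit`; anything else is parsed as date strings."""
    if is_numeric_dtype(col):
        return pd.to_datetime(col, unit=unit)
    return pd.to_datetime(col)


# Assumed sample data: epoch milliseconds, as an API might return them.
epoch_ms = pd.Series([1638185760000, 1638185820000])

as_default = pd.to_datetime(epoch_ms)  # integers read as nanoseconds
as_ms = to_time_column(epoch_ms)       # integers read as milliseconds

# String input is parsed as written, so no unit is involved.
parsed = to_time_column(pd.Series(['1970-01-01 00:27:18.185760']))

print(as_default.dt.year.tolist())  # [1970, 1970]
print(as_ms.dt.year.tolist())       # [2021, 2021]
```

If the column already holds strings like the sample data above, the plain pd.to_datetime call is enough; the unit only matters for numeric input.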

CodePudding user response:

So I figured out what I was doing wrong. My times were being changed from the year 2021 to 1970 because I wasn't specifying that the unit was in milliseconds. My code is very similar to what I had initially, and the solution is actually really simple:

def get_historical_futures_data(symbol, interval, lookback):
    frame = pd.DataFrame(client.futures_historical_klines(symbol, interval, lookback + ' min ago UTC'))
    frame = frame.iloc[:,:5]
    frame.columns = ['Time','Open','High','Low','Close']
    frame['Time'] = pd.to_datetime(frame['Time'], unit='ms')
    frame[['Open','High','Low','Close']] = frame[['Open','High','Low','Close']].astype(float)
    print(frame)
    frames.append(frame)

Output is:

                  Time      Open      High       Low     Close
0   2021-11-29 19:17:00  58388.41  58401.33  58357.30  58359.75
1   2021-11-29 19:18:00  58359.74  58365.33  58270.00  58290.95
2   2021-11-29 19:19:00  58290.95  58291.80  58173.28  58188.67
3   2021-11-29 19:20:00  58188.68  58317.02  58174.30  58308.70
4   2021-11-29 19:21:00  58309.32  58365.75  58309.31  58330.55
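
Another possible route, sketched here under assumptions, is to keep the original function's steps (which already produced correct 2021 timestamps in the index) and call reset_index() at the end to move 'Time' back into a regular column. The rows below are hypothetical sample data shaped like the described API response, not real output from the client.

```python
import pandas as pd

# Hypothetical rows: epoch milliseconds first, then prices as strings.
rows = [
    [1638185760000, '57141.92', '57157.16', '57141.92', '57147.00'],
    [1638185820000, '57145.48', '57149.15', '57124.62', '57139.75'],
]
frame = pd.DataFrame(rows, columns=['Time', 'Open', 'High', 'Low', 'Close'])

# Same steps as the original function: 'Time' becomes a datetime index,
# so astype(float) only touches the four price columns.
frame = frame.set_index('Time')
frame.index = pd.to_datetime(frame.index, unit='ms')
frame = frame.astype(float)

# Move 'Time' back into a column; pandas creates a default 0..n-1 index.
frame = frame.reset_index()

print(frame.dtypes)
```

This avoids listing the price columns by name, at the cost of a temporary index change.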