In my DataFrame, I set the 'Time' column as the index and then ran frame = frame.astype(float) to convert all the other data to floats. However, I now need the default integer index (0, 1, 2, etc.) while still converting the 'Time' column to a datetime format. I've tried a few different ways of doing this. They either run but corrupt the time (it says 1970 instead of 2021), or they raise TypeError: Cannot cast DatetimeArray to dtype float64
This is similar to the dataframe I want (but with the times messed up):
                        Time      Open      High       Low     Close
0 1970-01-01 00:27:18.185760  57141.92  57157.16  57141.92  57147.00
1 1970-01-01 00:27:18.185820  57145.48  57149.15  57124.62  57139.75
2 1970-01-01 00:27:18.185880  57126.75  57173.11  57126.74  57142.20
3 1970-01-01 00:27:18.185940  57163.42  57163.42  57079.10  57135.31
4 1970-01-01 00:27:18.186000  57084.42  57110.00  57084.42  57092.95
I've tried changing the format of the 'Time' column with:
frame['Time'] = pd.to_datetime(frame['Time'])
And
frame['Time'] = frame['Time'].apply(pd.to_datetime)
I have also tried changing the types of the other columns in a similar way:
frame[['Open','High','Low','Close']] = frame[['Open','High','Low','Close']].apply(frame.astype(float))
I tried this both before and after applying pd.to_datetime.
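For reference, here is a minimal sketch that should reproduce the TypeError. It assumes the 'Time' column has already been converted to datetime and that the prices arrive as strings; the two-row frame is hypothetical sample data, not real API output.

```python
import pandas as pd

# Hypothetical two-row frame; values are illustrative, not real API output.
frame = pd.DataFrame({
    "Time": pd.to_datetime([1638185760000, 1638185820000], unit="ms"),
    "Open": ["57141.92", "57145.48"],
})

try:
    # Casting the whole frame also tries to cast the datetime column to float.
    frame.astype(float)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
print(error)
```

The cast fails because astype(float) is applied to every column, including the datetime one, which suggests restricting the cast to the price columns.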
EDIT
Going to give some more information because I haven't been specific enough. The code below retrieves data from an API and puts it into a DataFrame. The response from the API is a list of lists, with each sublist containing 10 elements (I think, can't remember now). I only want the data up to 'Close'.
# Assumes pandas is imported as pd, and that client and frames are defined elsewhere.
def get_historical_futures_data(symbol, interval, lookback):
    frame = pd.DataFrame(client.futures_historical_klines(symbol, interval, lookback + ' min ago UTC'))
    frame = frame.iloc[:, :5]  # keep only the columns up to 'Close'
    frame.columns = ['Time', 'Open', 'High', 'Low', 'Close']
    frame = frame.set_index('Time')
    frame.index = pd.to_datetime(frame.index, unit='ms')
    frame = frame.astype(float)
    print(frame)
    frames.append(frame)
                         Open      High       Low     Close
Time
2021-11-29 14:27:00  57220.49  57220.50  57185.95  57190.01
2021-11-29 14:28:00  57190.00  57209.21  57161.74  57177.28
2021-11-29 14:29:00  57177.28  57182.61  57160.26  57164.46
2021-11-29 14:30:00  57164.46  57186.99  57154.32  57155.99
2021-11-29 14:31:00  57156.00  57179.74  57154.33  57179.74
Above is the code I had previously, along with its output. However, in another part of my code, I have realised that it is much easier for me to keep the default row index numbers, so I do not want to make 'Time' the index. Instead, I want the default index to remain, with 'Time' as an ordinary column followed by the rest of the data, similar to this:
                        Time      Open      High       Low     Close
0 1970-01-01 00:27:18.185760  57141.92  57157.16  57141.92  57147.00
1 1970-01-01 00:27:18.185820  57145.48  57149.15  57124.62  57139.75
2 1970-01-01 00:27:18.185880  57126.75  57173.11  57126.74  57142.20
3 1970-01-01 00:27:18.185940  57163.42  57163.42  57079.10  57135.31
4 1970-01-01 00:27:18.186000  57084.42  57110.00  57084.42  57092.95
My issue is that I am unable to convert the 'Time' column to a datetime type while also converting the other columns (Open, High, Low, Close) to float. I either get type-casting errors, or the 'Time' column is corrupted and says 1970 instead of 2021.
How do I make every column (EXCEPT FOR TIME) float type, and make the Time column DateTime type?
CodePudding user response:
I believe this might be happening because pandas cannot easily detect the datetime format. You could try infer_datetime_format=True to improve format detection (note that this parameter is deprecated as of pandas 2.0, where strict format inference is the default).
Try:
frame['Time'] = pd.to_datetime(frame['Time'], infer_datetime_format=True)
This outputs
Time
0 1970-01-01 00:27:18.185760
1 1970-01-01 00:27:18.185820
2 1970-01-01 00:27:18.185880
Using df.info(), we can check that the column has an actual datetime dtype:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Time 3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 152.0 bytes
None
This is the sample data used for this example:
df = pd.DataFrame({'Time': ['1970-01-01 00:27:18.185760',
                            '1970-01-01 00:27:18.185820',
                            '1970-01-01 00:27:18.185880']})
CodePudding user response:
So I figured out what I was doing wrong. My times were being changed from the year 2021 to 1970 because I wasn't specifying that the unit was in milliseconds. My code is very similar to what I had initially, and the solution is actually really simple:
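# Assumes pandas is imported as pd, and that client and frames are defined elsewhere.
def get_historical_futures_data(symbol, interval, lookback):
    frame = pd.DataFrame(client.futures_historical_klines(symbol, interval, lookback + ' min ago UTC'))
    frame = frame.iloc[:, :5]  # keep only the columns up to 'Close'
    frame.columns = ['Time', 'Open', 'High', 'Low', 'Close']
    # The API timestamps are in milliseconds, so the unit must be given explicitly.
    frame['Time'] = pd.to_datetime(frame['Time'], unit='ms')
    # Cast only the price columns, leaving 'Time' as datetime.
    frame[['Open', 'High', 'Low', 'Close']] = frame[['Open', 'High', 'Low', 'Close']].astype(float)
    print(frame)
    frames.append(frame)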
Output is:
                 Time      Open      High       Low     Close
0 2021-11-29 19:17:00  58388.41  58401.33  58357.30  58359.75
1 2021-11-29 19:18:00  58359.74  58365.33  58270.00  58290.95
2 2021-11-29 19:19:00  58290.95  58291.80  58173.28  58188.67
3 2021-11-29 19:20:00  58188.68  58317.02  58174.30  58308.70
4 2021-11-29 19:21:00  58309.32  58365.75  58309.31  58330.55
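To illustrate why the unit matters, here is a small self-contained sketch. It assumes the API returns integer epoch timestamps in milliseconds and prices as strings. The sample rows are hypothetical, with timestamps reconstructed from the 1970 values printed in the question. When no unit is given, pd.to_datetime reads integers as nanoseconds since the epoch, which is why the dates land a few minutes after midnight on 1970-01-01.

```python
import pandas as pd

# Hypothetical rows shaped like the API response (an assumption):
# millisecond epoch timestamps followed by prices as strings.
raw = [
    [1638185760000, "57141.92", "57157.16", "57141.92", "57147.00"],
    [1638185820000, "57145.48", "57149.15", "57124.62", "57139.75"],
]
frame = pd.DataFrame(raw, columns=["Time", "Open", "High", "Low", "Close"])

# Without a unit, the integers are interpreted as nanoseconds.
wrong = pd.to_datetime(frame["Time"])
# With unit='ms', the same integers are interpreted as milliseconds.
right = pd.to_datetime(frame["Time"], unit="ms")

print(wrong.iloc[0])  # 1970-01-01 00:27:18.185760
print(right.iloc[0])  # 2021-11-29 11:36:00

# Convert 'Time' and cast only the price columns to float.
frame["Time"] = right
price_cols = ["Open", "High", "Low", "Close"]
frame[price_cols] = frame[price_cols].astype(float)
print(frame.dtypes)
```

Because the default integer index is never replaced, there is no need for set_index here; the frame keeps rows 0, 1, 2, and so on.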