All, I am trying to read the time coordinate from the Berkeley Earth temperature file below. The time spans 1850 to 2022, and the values are in decimal years A.D. (1850.041667, 1850.125, 1850.208333, ..., 2022.708333, 2022.791667, 2022.875).
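Judging from the listed values, each time stamp appears to be the midpoint of a calendar month. This is an inference from the sample values, not something stated in the file description. With Y the year and m = 1, ..., 12 the month:

```latex
t = Y + \frac{m - \tfrac{1}{2}}{12}, \qquad
Y = \lfloor t \rfloor, \qquad
m = \lfloor 12\,(t - Y) \rfloor + 1
```

For example, 1850 + 0.5/12 = 1850.041667 (January 1850) and 2022 + 10.5/12 = 2022.875 (November 2022).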
pandas.to_datetime cannot interpret this time array correctly, I think because I need to state the origin and the unit of the time coordinate. I tried pd.to_datetime(dti, unit='D', origin='julian'), but it did not work (out of bounds). I also suspect I have to use a unit of years instead of days.
The file is located here: http://berkeleyearth.lbl.gov/auto/Global/Gridded/Land_and_Ocean_LatLong1.nc
import xarray as xr
import numpy as np
import pandas as pd
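# open the downloaded file
flname = "Land_and_Ocean_LatLong1.nc"
ds = xr.open_dataset("./" + flname)
dti = ds['time']                                  # decimal years, e.g. 1850.041667
pd.to_datetime(dti, unit='D', origin='julian')    # fails: out of bounds
np.diff(dti)                                      # spacing between consecutive time steps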
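To see where the "out of bounds" error comes from, here is a small sketch that uses only one value from the question, so no file is needed. With origin='julian', pandas reads the numbers as Julian Day numbers (day 0 falls in 4713 BC), so 1850.041667 would be a day around 4708 BC, far outside the range that nanosecond-resolution pandas timestamps can represent:

```python
import numpy as np
import pandas as pd

# datetime64[ns] only covers roughly 1677-09-21 to 2262-04-11
print(pd.Timestamp.min, pd.Timestamp.max)

# With origin="julian" the values are treated as Julian Day numbers,
# so 1850.041667 lands about five years after 4713 BC: out of range.
t = np.array([1850.041667])
try:
    pd.to_datetime(t, unit="D", origin="julian")
    out_of_bounds = False
except pd.errors.OutOfBoundsDatetime as err:
    out_of_bounds = True
    print("OutOfBoundsDatetime:", err)
```

Since the Julian epoch lies thousands of years before the earliest representable timestamp, values around 1850 cannot be used with origin='julian' at all.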
Answer:
Convert to datetime using %Y as the parsing directive to get the year only, then add the fractional part of the year as a timedelta in days. Note that you have to account for leap years (366 instead of 365 days) when calculating the timedelta. For example:
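import pandas as pd
years = ds['time'].values     # decimal years, e.g. 1850.041667
whole = years.astype(int)     # integer part, e.g. 1850
# parse only the integer year (January 1 of that year); passing it as a string keeps %Y unambiguous
dti = pd.to_datetime(whole.astype(str), format="%Y")
# 366 days in leap years, 365 otherwise
daysinyear = pd.Series([366]*dti.size).where(dti.is_leap_year, 365)
# add the fractional part of the year as a timedelta in days
dti = dti + pd.to_timedelta(daysinyear * (years - whole), unit="d")
dti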
0 1850-01-16 04:59:59.999971200
1 1850-02-15 15:00:00.000000000
2 1850-03-18 01:00:00.000028800
3 1850-04-17 10:59:59.999971200
4 1850-05-17 21:00:00.000000000
...
2070 2022-07-17 16:59:59.999971200
2071 2022-08-17 03:00:00.000000000
2072 2022-09-16 13:00:00.000028800
2073 2022-10-16 22:59:59.999971200
2074 2022-11-16 09:00:00.000000000
Length: 2075, dtype: datetime64[ns]
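The same steps can be packaged as a small helper that accepts a plain list or NumPy array, which makes the conversion easy to try without downloading the file. decimal_year_to_datetime is a hypothetical name, and the sketch assumes non-negative decimal years like those in the question:

```python
import numpy as np
import pandas as pd


def decimal_year_to_datetime(years):
    """Hypothetical helper: convert decimal years (e.g. 1850.041667) to timestamps.

    Each value becomes January 1 of its integer year plus the fractional
    part of the year, scaled by that year's length (365 or 366 days).
    """
    years = np.asarray(years, dtype=float)
    whole = np.floor(years).astype(int)  # integer year, e.g. 1850
    # January 1 of each year, parsed from the year as a string
    start = pd.to_datetime(whole.astype(str), format="%Y")
    # 366 days in leap years, 365 otherwise
    days_in_year = np.where(start.is_leap_year, 366, 365)
    # fractional part of the year expressed in days
    offset = pd.to_timedelta(days_in_year * (years - whole), unit="D")
    return pd.Series(start + offset)


# A few values taken from the question
times = decimal_year_to_datetime([1850.041667, 1850.125, 2022.875])
print(times)
```

Calling it as decimal_year_to_datetime(ds['time'].values) should give the same timestamps as the code above, just without depending on how pandas handles float input to format="%Y".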