Reading netcdf time with unit of years

All, I am trying to read the time coordinate from the following Berkeley Earth temperature file. The time spans from 1850 to 2022, and the time unit is decimal years A.D. (1850.041667, 1850.125, 1850.208333, ..., 2022.708333, 2022.791667, 2022.875).

pandas.to_datetime cannot correctly interpret the time array; I think I need to state the origin of the time coordinate and the unit. I tried pd.to_datetime(dti, unit='D', origin='julian'), but it did not work (out of bounds). Also, I think I have to use a unit of years instead of days.

The file is located here http://berkeleyearth.lbl.gov/auto/Global/Gridded/Land_and_Ocean_LatLong1.nc

import xarray as xr
import numpy as np
import pandas as pd
# read data into memory
flname = "Land_and_Ocean_LatLong1.nc"
ds = xr.open_dataset("./" + flname)
dti = ds['time']
# this attempt fails with an out-of-bounds error
pd.to_datetime(dti, unit='D', origin='julian')
np.diff(dti)

CodePudding user response：

Convert to datetime using %Y as the parsing directive to get the year only, then add the fractional year as a timedelta in days. Note that you have to account for leap years when calculating the timedelta. Example:

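import pandas as pd

# decimal years as a NumPy array, e.g. 1850.041667
t = ds['time'].values

# parse only the integer year; every timestamp starts at January 1 of its year
dti = pd.to_datetime(t.astype(int).astype(str), format="%Y")

# days in each year: 366 for leap years, 365 otherwise
daysinyear = pd.Series([366]*dti.size).where(dti.is_leap_year, 365)

# add the fractional part of the year, scaled to days
dti = dti + pd.to_timedelta(daysinyear * (t - t.astype(int)), unit="D")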

dti
0      1850-01-16 04:59:59.999971200
1      1850-02-15 15:00:00.000000000
2      1850-03-18 01:00:00.000028800
3      1850-04-17 10:59:59.999971200
4      1850-05-17 21:00:00.000000000
          ...
2070   2022-07-17 16:59:59.999971200
2071   2022-08-17 03:00:00.000000000
2072   2022-09-16 13:00:00.000028800
2073   2022-10-16 22:59:59.999971200
2074   2022-11-16 09:00:00.000000000
Length: 2075, dtype: datetime64[ns]
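
To check the approach without downloading the file, the same steps can be sketched on a few hand-picked decimal years. The helper name decimal_year_to_datetime and the sample values are made up for illustration; they are not part of pandas or xarray. The sketch assumes that the fractional part of each value is the elapsed share of that calendar year.

```python
import numpy as np
import pandas as pd


def decimal_year_to_datetime(decimal_years):
    """Convert decimal years (e.g. 1850.125) to pandas timestamps.

    Hypothetical helper for illustration; assumes the fraction is the
    elapsed share of the calendar year.
    """
    t = np.asarray(decimal_years, dtype=float)
    years = np.floor(t).astype(int)
    # January 1 of each year
    start = pd.to_datetime(years.astype(str), format="%Y")
    # 366 days in leap years, 365 otherwise
    days_in_year = np.where(start.is_leap_year, 366, 365)
    # fractional part of the year, scaled to days
    offset = pd.to_timedelta((t - years) * days_in_year, unit="D")
    return start + offset


# illustrative sample values, not taken from the data file
sample = [1850.0, 1850.5, 2020.5]
result = decimal_year_to_datetime(sample)
print(result)
```

Half of a 365-day year lands at noon on July 2, while half of the leap year 2020 lands at midnight on July 2, which is why the leap-year correction matters. The result could then be written back as the time coordinate, if needed.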