I am working on Berkeley Earth Surface Temperature data. I have a monthly NetCDF file from 1753-recent. The date axis is float64 and when I convert to DateTime format it only returns the first day and month of each year. Below is the time documentation from Berkeley Earth Surface Temperature data:
time: A list of times at which data is reported. The data format is decimal with year and fraction of year reported, with each value corresponding to the midpoint of the respective month. For example, 1981.125 indicates February 1981.
I tried casting the time values from float to int and then applying pd.to_datetime(). It raises a ValueError when I include the month in the format, and with a year-only format every value collapses to January 1st:
pd.to_datetime(dset.time.astype(int), format="%m%Y")
ValueError: time data '1850' does not match format '%m%Y' (match)
pd.to_datetime(dset.time.astype(int), format="%Y")
DatetimeIndex(['1850-01-01', '1850-01-01', '1850-01-01', '1850-01-01',
'1850-01-01', '1850-01-01', '1850-01-01', '1850-01-01',
'1850-01-01', '1850-01-01',
...
'2021-01-01', '2021-01-01', '2021-01-01', '2022-01-01',
'2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
'2022-01-01', '2022-01-01'],
dtype='datetime64[ns]', length=2071, freq=None)
I am new to xarray and NetCDF files, any help would be appreciated. Here is the link to the website - http://berkeleyearth.org/data/
Here is a description of my data:
<xarray.Dataset>
Dimensions: (longitude: 360, latitude: 180, time: 2071, month_number: 12)
Coordinates:
* longitude (longitude) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
* latitude (latitude) float32 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
* time (time) float64 1.85e+03 1.85e+03 ... 2.022e+03 2.023e+03
Dimensions without coordinates: month_number
Data variables:
land_mask (latitude, longitude) float64 ...
temperature (time, latitude, longitude) float32 ...
climatology (month_number, latitude, longitude) float32 ...
Attributes:
Conventions: Berkeley Earth Internal Convention (based on CF-1.5)
title: Native Format Berkeley Earth Surface
Temperature A...
history: 27-Aug-2022 08:16:14
institution: Berkeley Earth Surface Temperature Project
land_source_history: 05-Aug-2022 11:14:59
ocean_source_history: 27-Aug-2022 05:20:43
comment: This file contains Berkeley Earth surface temperature...
I am guessing this is what you meant by top rows:
<xarray.DataArray 'time' (time: 2071)>
array([1850.041667, 1850.125 , 1850.208333, ..., 2022.375 , 2022.458333,
2022.541667])
Coordinates:
* time (time) float64 1.85e+03 1.85e+03 1.85e+03 ... 2.022e+03 2.023e+03
Attributes:
units: year A.D.
standard_name: time
long_name: Time
CodePudding user response:
You need to convert the floating-point date into a form that pd.to_datetime can parse. Casting to int discards the fractional part, which is where the month is encoded. Based on the description you have provided, you can extract the year and month as follows (assuming fdate represents the floating-point date value):
year = int(fdate)
month = int((fdate - year) * 12) + 1
and then convert that to a string in the form mmyyyy using an f-string:
f'{month:02d}{year:04d}'
That can then be converted to a datetime using the format %m%Y. Wrapping it in a function:
def convert_date(fdate):
    # The integer part is the year; the fraction encodes the month midpoint
    year = int(fdate)
    month = int((fdate - year) * 12) + 1
    return f'{month:02d}{year:04d}'
you would then use it as:
pd.to_datetime(xr.apply_ufunc(convert_date, dset.time, vectorize=True), format='%m%Y')
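To sanity-check the arithmetic without touching the dataset, here is a minimal standalone sketch using only the standard library. The function name decimal_year_to_date and the sample values are my own choices for illustration (the samples are copied from the time array printed in the question, plus the 1981.125 example from the Berkeley Earth documentation). It assumes every value sits at a month midpoint, as the documentation states.

```python
import datetime
import math


def decimal_year_to_date(fdate):
    """Map a decimal year at a month midpoint to the first day of that month."""
    year = math.floor(fdate)
    # The midpoint of month k (1-based) is (k - 0.5) / 12 of the way through
    # the year, so scaling the fraction by 12 and truncating yields k - 1.
    month = int((fdate - year) * 12) + 1
    return datetime.date(year, month, 1)


# Hypothetical sample values taken from the question's output
samples = [1850.041667, 1850.125, 1981.125, 2022.541667]
dates = [decimal_year_to_date(value) for value in samples]

for value, date in zip(samples, dates):
    print(value, date.isoformat())
```

Returning date objects rather than mmyyyy strings avoids the string round trip; a list of such dates can be passed to pd.to_datetime without a format argument.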