How to only select winter months of daily data in xarray?

Time:03-02

I have gridded daily temperature data but am only interested in the winter months.

from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import xarray as xr

# open the Berkeley Earth gridded temperature netCDF file
df = xr.open_dataset('BerkeleyEarth.nc')

# pull out the temperature variable
air = df.temperature

# select only winter months
WinterAir = air[(air.time.dt.month >= 12) | (air.time.dt.month <= 2)]

When I try to select the months this way I get the following error message: AttributeError: 'DataArray' object has no attribute 'month'. How do I select only winter months?

[Screenshot of the netCDF file structure; image not available]

Answer:

I was able to do this by:

# select only winter months

WinterAir = air[(df.month >= 12) | (df.month <= 2)]
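One caveat with the line above: `air[...]` with a single boolean mask indexes the *first* dimension of `air` by position, so it only selects along time if time is the leading dimension (presumably the case in the asker's file). A minimal sketch that names the dimension instead, using `isel(time=...)`, on a hypothetical synthetic array whose dimensions are ordered (lat, lon, time) so that positional indexing would hit lat rather than time:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the Berkeley Earth file: "time" is a dimension
# without coordinates, and the calendar month is stored as a data variable.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
df = xr.Dataset(
    data_vars={
        "month": (("time",), dates.month.to_numpy()),
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)
air = df.temperature

# Boolean mask along "time": True for December, January and February.
is_winter = (df.month >= 12) | (df.month <= 2)

# isel(time=...) applies the mask to the "time" dimension by name,
# regardless of where "time" sits in air.dims.
WinterAir = air.isel(time=is_winter)

# time shrinks from 366 to 91 days (31 Jan + 29 Feb + 31 Dec in 2020)
print(WinterAir.sizes["time"])
```

Naming the dimension keeps the selection correct even if a file stores temperature as (lat, lon, time) instead of (time, lat, lon).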

Answer:

The reason this doesn't work for your data specifically is that you don't have a datetime coordinate named time. Instead, you have a dimension time with no coordinate data labeling it, plus several data variables holding the individual date components. Because month is one of those data variables, you can reference it directly and use it to slice your data.

Alternatively, you could construct a datetime coordinate from the day, month, and year values in your data and assign it as the time coordinate; xarray's usual time-series functionality would then work.
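A minimal sketch of that conversion, assuming the file stores integer day, month, and year variables along time (the dataset here is synthetic and hypothetical). pandas can assemble datetimes from a DataFrame with year, month, and day columns, and `assign_coords` then attaches them to the time dimension:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset mimicking the question's layout: "time" has no
# coordinate, and the date is split across day/month/year data variables.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    data_vars={
        "day": (("time",), dates.day.to_numpy()),
        "month": (("time",), dates.month.to_numpy()),
        "year": (("time",), dates.year.to_numpy()),
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)

# pd.to_datetime builds datetime64 values from year/month/day columns.
stamps = pd.to_datetime(
    pd.DataFrame({"year": ds.year.values, "month": ds.month.values, "day": ds.day.values})
)

# Attach the datetimes as the "time" coordinate and drop the now-redundant parts.
ds = ds.assign_coords(time=stamps.to_numpy()).drop_vars(["day", "month", "year"])

# With a datetime64 coordinate, the .dt accessor is available.
winter = ds.isel(time=ds.time.dt.month.isin([12, 1, 2]))
print(winter.sizes["time"])  # 91
```

For a real file you would replace the synthetic `ds` with `xr.open_dataset(...)`; whether the variables are literally named `day`, `month`, and `year` there is an assumption based on the description above.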

As an example, here's a dataset similar to yours in structure:

In [6]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
   ...:
   ...: ds = xr.Dataset(
   ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
   ...:     data_vars={
   ...:         "day": (("time",), dates.day),
   ...:         "month": (("time",), dates.month),
   ...:         "year": (("time",), dates.year),
   ...:         "temperature": (
   ...:             ("lat", "lon", "time"),
   ...:             np.random.random(size=(2, 4, len(dates))),
   ...:         ),
   ...:     },
   ...: )

In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:      (time: 366, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
    month        (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2308 0.3257 ... 0.3501 0.009162

Note that time is a special "dimension without coordinates": there are no labels on the time dimension, so xarray knows nothing about "time" except its size and that it indexes several of your data variables. Importantly, in your data, time is not a datetime type.
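To tell which of the two situations a dataset is in before reaching for `.dt`, a small check can help. A minimal sketch with a hypothetical helper, `has_datetime_coord` (not part of xarray), that returns True only when a dimension is labeled by a `datetime64` coordinate; it does not handle cftime calendars, whose coordinates have object dtype:

```python
import numpy as np
import pandas as pd
import xarray as xr


def has_datetime_coord(ds: xr.Dataset, dim: str) -> bool:
    """Hypothetical helper: True if `dim` is labeled by a datetime64 coordinate."""
    return dim in ds.coords and np.issubdtype(ds[dim].dtype, np.datetime64)


dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")

# Layout like the question: "time" is only a dimension; month is a data variable.
unlabeled = xr.Dataset(data_vars={"month": (("time",), dates.month.to_numpy())})

# Layout with a proper datetime coordinate on "time".
labeled = xr.Dataset(coords={"time": dates})

print("time" in unlabeled.sizes, has_datetime_coord(unlabeled, "time"))  # True False
print(has_datetime_coord(labeled, "time"))  # True
```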

Because time is not a datetime coordinate, the DatetimeAccessor ds.time.dt is not available; instead, reference the month data variable directly, as you found:

In [8]: ds.loc[{"time": ds.month == 2}]
Out[8]:
<xarray.Dataset>
Dimensions:      (time: 29, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
    month        (time) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2821 0.08776 0.2018 ... 0.929 0.4774

If the time dimension instead had a datetime coordinate (for example, the dates array from above assigned as the time coordinate), everything would work as you expect:

In [10]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
    ...:
    ...: ds = xr.Dataset(
    ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    ...:     data_vars={
    ...:         "temperature": (
    ...:             ("lat", "lon", "time"),
    ...:             np.random.random(size=(2, 4, len(dates))),
    ...:         ),
    ...:     },
    ...: )

In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 366)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31
Data variables:
    temperature  (lat, lon, time) float64 0.09064 0.5252 ... 0.08733 0.6283

Now the xarray datetime accessors work the way you'd expect:

In [12]: ds.loc[{"time": ds.time.dt.month == 2}]
Out[12]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 29)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-02-01 2020-02-02 ... 2020-02-29
Data variables:
    temperature  (lat, lon, time) float64 0.3407 0.6847 0.3073 ... 0.8578 0.1335
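With a datetime coordinate in place, selecting winter is a one-liner. A sketch of two equivalent options on the same kind of synthetic dataset: a month-number mask via `isin`, and xarray's `.dt.season` accessor, which labels each timestamp as "DJF", "MAM", "JJA", or "SON":

```python
import numpy as np
import pandas as pd
import xarray as xr

dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    data_vars={
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)

# Option 1: select December, January and February by month number.
winter_by_month = ds.sel(time=ds.time.dt.month.isin([12, 1, 2]))

# Option 2: use the season label computed by the .dt accessor.
winter_by_season = ds.sel(time=ds.time.dt.season == "DJF")

print(winter_by_month.sizes["time"], winter_by_season.sizes["time"])  # 91 91
```

Both return the same days. Note that within a single calendar year, "DJF" combines January and February with the December of that same year, not with the following January and February.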

See xarray's docs on Coordinates and working with time series data for more info.
